Product Title Summarization (PTS) Corpus


Each line in the corpus describes one product: a pair of titles (the original title and its short title), the product's brand, and its commodity name. Fields are separated by two consecutive tab characters, in the following format:

<original title>\t\t<short title>\t\t<brand>\t\t<commodity name>
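The format above can be read with a few lines of Python. This is a minimal sketch, not an official loader: the record field names (`original_title`, `short_title`, `brand`, `commodity_name`), the helper names `parse_line` and `read_corpus`, and the UTF-8 encoding default are assumptions made for illustration.

```python
from typing import Iterator, NamedTuple


class PTSRecord(NamedTuple):
    """One corpus line. Field names are illustrative labels, not from the release."""
    original_title: str
    short_title: str
    brand: str
    commodity_name: str


# Fields are separated by two consecutive tab characters.
SEP = "\t\t"


def parse_line(line: str) -> PTSRecord:
    """Split one corpus line into its four fields."""
    # Strip only the line terminator so that empty trailing fields survive.
    fields = line.rstrip("\r\n").split(SEP)
    if len(fields) != 4:
        raise ValueError(f"expected 4 fields, got {len(fields)}: {line!r}")
    return PTSRecord(*fields)


def read_corpus(path: str, encoding: str = "utf-8") -> Iterator[PTSRecord]:
    """Yield one record per non-blank line (assumes UTF-8 text files)."""
    with open(path, encoding=encoding) as f:
        for line in f:
            if line.strip():
                yield parse_line(line)
```

Splitting on the literal two-tab separator, rather than on any whitespace, keeps spaces inside titles intact and still yields an empty string for an empty field.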


Note: For some products, a field may contain multiple language versions of the same name, separated by "/", e.g., Nintendo/任天堂.
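If the language versions are needed separately, such a field can be split on "/". This is a hedged sketch: the helper name `language_variants` is hypothetical, and it assumes "/" is used only as the language separator in the field being split, which may not hold for every title.

```python
def language_variants(name: str) -> list[str]:
    """Split a name such as "Nintendo/任天堂" into its language versions.

    Assumption: "/" appears in this field only as the language separator.
    """
    # Drop surrounding whitespace and empty pieces (e.g. from a trailing "/").
    return [part.strip() for part in name.split("/") if part.strip()]
```

For example, `language_variants("Nintendo/任天堂")` returns `["Nintendo", "任天堂"]`, and a name without "/" comes back as a one-element list.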


Note: The dataset statistics reported in the ACM DL version of the paper contain some mistakes; these have been fixed.

Fei Sun, Peng Jiang, Hanxiao Sun, Changhua Pei, Wenwu Ou, and Xiaobo Wang. Multi-Source Pointer Network for Product Title Summarization. In Proceedings of the 2018 ACM International Conference on Information and Knowledge Management (CIKM '18).


