Product Title Summarization (PTS) Corpus


Introduction

Each line in the corpus consists of a pair of titles (original title, short title), the product's brand, and its commodity name. Fields are separated by two consecutive tab characters, in the following format:

<original title>\t\t<short title>\t\t<brand>\t\t<commodity name>
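
A minimal sketch of how such lines could be parsed in Python, assuming every line carries exactly four fields separated by two tabs as described above. The names PTSRecord, parse_pts_line, and read_pts are hypothetical helpers introduced here for illustration; they are not part of the corpus release.

```python
from typing import Iterable, Iterator, NamedTuple


class PTSRecord(NamedTuple):
    """One corpus line (hypothetical container, not part of the release)."""
    original_title: str
    short_title: str
    brand: str
    commodity_name: str


def parse_pts_line(line: str) -> PTSRecord:
    """Split one line on the double-tab delimiter into its four fields."""
    # Strip only the line terminator so that empty trailing fields survive.
    fields = line.rstrip("\r\n").split("\t\t")
    if len(fields) != 4:
        raise ValueError(f"expected 4 fields, got {len(fields)}")
    return PTSRecord(*fields)


def read_pts(lines: Iterable[str]) -> Iterator[PTSRecord]:
    """Yield a record for every non-blank line of an open file or list."""
    for line in lines:
        if line.strip():
            yield parse_pts_line(line)
```

Typical usage would be to open a corpus file with encoding="utf-8" and pass the file object to read_pts. Raising on a wrong field count, rather than skipping the line silently, makes formatting problems visible early.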

File

Note: For some products, a field may contain multi-language versions of the same name (separated by “/”), e.g., Nintendo/任天堂.
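
As a rough illustration of handling the “/” separator mentioned in the note, the sketch below splits a value into its language variants. The function name split_variants is hypothetical, and the sketch assumes that “/” never occurs inside a single name, which may not hold for every entry.

```python
from typing import List


def split_variants(value: str, sep: str = "/") -> List[str]:
    """Split a value such as 'Nintendo/任天堂' into its language variants.

    Assumption: the separator never appears inside a single name.
    """
    # Drop surrounding whitespace and ignore empty pieces.
    return [part.strip() for part in value.split(sep) if part.strip()]
```

A value without the separator comes back as a one-element list, so callers can treat both cases the same way.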

Paper

Note: There are some mistakes in the dataset statistics in the ACM DL version of the paper. These have been fixed.

Fei Sun, Peng Jiang, Hanxiao Sun, Changhua Pei, Wenwu Ou, and Xiaobo Wang. Multi-Source Pointer Network for Product Title Summarization. In Proceedings of the 2018 International Conference on Information and Knowledge Management (CIKM).

Citation

@inproceedings{Fei:Multi,
author = {Fei Sun and Peng Jiang and Hanxiao Sun and Changhua Pei and Wenwu Ou and Xiaobo Wang},
title = {Multi-Source Pointer Network for Product Title Summarization},
booktitle = {Proceedings of CIKM},
year = {2018},
publisher = {ACM},
location = {Turin, Italy}
}