Selected Publications
An up-to-date list is available on Google Scholar.
2025
- Personalized Denoising Implicit Feedback for Robust Recommender System. Kaike Zhang, Qi Cao, Yunfan Wu, Fei Sun, Huawei Shen, and Xueqi Cheng. In WebConf, May 2025
@inproceedings{zhang-2025-fall, title = {Personalized Denoising Implicit Feedback for Robust Recommender System}, author = {Zhang, Kaike and Cao, Qi and Wu, Yunfan and Sun, Fei and Shen, Huawei and Cheng, Xueqi}, booktitle = {WebConf}, month = may, year = {2025} }
WebConf
2024
- The Fall of ROME: Understanding the Collapse of LLMs in Model Editing. Wanli Yang, Fei Sun, Jiajun Tan, Xinyu Ma, Du Su, Dawei Yin, and Huawei Shen. In Findings of the Association for Computational Linguistics: EMNLP 2024, Nov 2024
Despite significant progress in model editing methods, their application in real-world scenarios remains challenging as they often cause large language models (LLMs) to collapse. Among them, ROME is particularly concerning, as it could disrupt LLMs with only a single edit. In this paper, we study the root causes of such collapse. Through extensive analysis, we identify two primary factors that contribute to the collapse: i) inconsistent handling of prefixed and unprefixed keys in the parameter update equation may result in very small denominators, causing excessively large parameter updates; ii) the subject of collapse cases is usually the first token, whose unprefixed key distribution significantly differs from the prefixed key distribution in autoregressive transformers, causing the aforementioned issue to materialize. To validate our findings, we propose a simple yet effective approach: uniformly using prefixed keys during the editing phase and adding prefixes during the testing phase to ensure consistency between training and testing. The experimental results show that the proposed solution can prevent model collapse while maintaining the effectiveness of the edits.
@inproceedings{yang-etal-2024-fall, title = {The Fall of {ROME}: Understanding the Collapse of {LLM}s in Model Editing}, author = {Yang, Wanli and Sun, Fei and Tan, Jiajun and Ma, Xinyu and Su, Du and Yin, Dawei and Shen, Huawei}, editor = {Al-Onaizan, Yaser and Bansal, Mohit and Chen, Yun-Nung}, booktitle = {Findings of the Association for Computational Linguistics: EMNLP 2024}, month = nov, year = {2024}, address = {Miami, Florida, USA}, publisher = {Association for Computational Linguistics}, url = {https://aclanthology.org/2024.findings-emnlp.236/}, doi = {10.18653/v1/2024.findings-emnlp.236}, pages = {4079--4087} }
- Blinded by Generated Contexts: How Language Models Merge Generated and Retrieved Contexts When Knowledge Conflicts? Hexiang Tan, Fei Sun, Wanli Yang, Yuanzhuo Wang, Qi Cao, and Xueqi Cheng. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Aug 2024
While auxiliary information has become a key to enhancing Large Language Models (LLMs), relatively little is known about how LLMs merge these contexts, specifically contexts generated by LLMs and those retrieved from external sources. To investigate this, we formulate a systematic framework to identify whether LLMs’ responses are attributed to either generated or retrieved contexts. To easily trace the origin of the response, we construct datasets with conflicting contexts, i.e., each question is paired with both generated and retrieved contexts, yet only one of them contains the correct answer. Our experiments reveal a significant bias in several LLMs (GPT-4/3.5 and Llama2) to favor generated contexts, even when they provide incorrect information. We further identify two key factors contributing to this bias: i) contexts generated by LLMs typically show greater similarity to the questions, increasing their likelihood of being selected; ii) the segmentation process used in retrieved contexts disrupts their completeness, thereby hindering their full utilization in LLMs. Our analysis enhances the understanding of how LLMs merge diverse contexts, offers valuable insights for advancing current LLM augmentation methods, and highlights the risk of generated misinformation for retrieval-augmented LLMs.
@inproceedings{tan-etal-2024-blinded, title = {Blinded by Generated Contexts: How Language Models Merge Generated and Retrieved Contexts When Knowledge Conflicts?}, author = {Tan, Hexiang and Sun, Fei and Yang, Wanli and Wang, Yuanzhuo and Cao, Qi and Cheng, Xueqi}, editor = {Ku, Lun-Wei and Martins, Andre and Srikumar, Vivek}, booktitle = {Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)}, month = aug, year = {2024}, address = {Bangkok, Thailand}, publisher = {Association for Computational Linguistics}, url = {https://aclanthology.org/2024.acl-long.337/}, doi = {10.18653/v1/2024.acl-long.337}, pages = {6207--6227} }
- The Butterfly Effect of Model Editing: Few Edits Can Trigger Large Language Models Collapse. Wanli Yang, Fei Sun, Xinyu Ma, Xun Liu, Dawei Yin, and Xueqi Cheng. In Findings of the Association for Computational Linguistics: ACL 2024, Aug 2024
Although model editing has shown promise in revising knowledge in Large Language Models (LLMs), its impact on the inherent capabilities of LLMs is often overlooked. In this work, we reveal a critical phenomenon: even a single edit can trigger model collapse, manifesting as significant performance degradation in various benchmark tasks. However, benchmarking LLMs after each edit, while necessary to prevent such collapses, is impractically time-consuming and resource-intensive. To mitigate this, we propose using perplexity as a surrogate metric, validated by extensive experiments demonstrating that changes in an edited model’s perplexity are strongly correlated with its downstream task performances. We further conduct an in-depth study on sequential editing, a practical setting for real-world scenarios, across various editing methods and LLMs, focusing on hard cases from our previous single-edit studies. The results indicate that nearly all examined editing methods result in model collapse after only a few edits. To facilitate further research, we have utilized GPT-3.5 to develop a new dataset, HardEdit, based on those hard cases. This dataset aims to establish the foundation for pioneering research in reliable model editing and the mechanisms underlying editing-induced model collapse. We hope this work can draw the community’s attention to the potential risks inherent in model editing practices.
@inproceedings{yang-etal-2024-butterfly, title = {The Butterfly Effect of Model Editing: Few Edits Can Trigger Large Language Models Collapse}, author = {Yang, Wanli and Sun, Fei and Ma, Xinyu and Liu, Xun and Yin, Dawei and Cheng, Xueqi}, editor = {Ku, Lun-Wei and Martins, Andre and Srikumar, Vivek}, booktitle = {Findings of the Association for Computational Linguistics: ACL 2024}, month = aug, year = {2024}, address = {Bangkok, Thailand}, publisher = {Association for Computational Linguistics}, url = {https://aclanthology.org/2024.findings-acl.322/}, doi = {10.18653/v1/2024.findings-acl.322}, pages = {5419--5437} }
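The surrogate metric in the abstract above reduces to a standard quantity: perplexity is the exponential of the mean negative log-likelihood per token. A minimal sketch of that computation, with hypothetical per-token log-probabilities standing in for a real model's outputs (the numbers below are illustrative, not from the paper):

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp of the mean negative log-likelihood per token."""
    nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(nll)

# Hypothetical per-token log-probabilities before and after an edit.
before = [-1.2, -0.8, -1.5, -0.9]
after = [-4.0, -3.5, -5.1, -4.4]

print(perplexity(before))  # modest perplexity: model still fluent
print(perplexity(after))   # sharply higher perplexity: a possible collapse signal
```

Monitoring this single scalar after each edit is far cheaper than rerunning a full benchmark suite, which is the practical point the paper makes.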
- When to Trust LLMs: Aligning Confidence with Response Quality. Shuchang Tao, Liuyi Yao, Hanxing Ding, Yuexiang Xie, Qi Cao, Fei Sun, Jinyang Gao, Huawei Shen, and Bolin Ding. In Findings of the Association for Computational Linguistics: ACL 2024, Aug 2024
Despite the success of large language models (LLMs) in natural language generation, much evidence shows that LLMs may produce incorrect or nonsensical text. This limitation highlights the importance of discerning when to trust LLMs, especially in safety-critical domains. Existing methods often express reliability as a confidence level; however, their effectiveness is limited by the lack of objective guidance. To address this, we propose the CONfidence-Quality-ORDer-preserving alignment approach (CONQORD), which leverages reinforcement learning guided by a tailored dual-component reward function. This function integrates quality reward and order-preserving alignment reward functions. Specifically, the order-preserving reward incentivizes the model to verbalize greater confidence for responses of higher quality, aligning the order of confidence and quality. Experiments demonstrate that CONQORD significantly improves the alignment between confidence and response accuracy without making the model over-cautious. Furthermore, the aligned confidence provided by CONQORD informs when to trust LLMs and acts as a determinant for initiating the retrieval of external knowledge. Aligning confidence with response quality ensures more transparent and reliable responses, providing better trustworthiness.
@inproceedings{tao-etal-2024-trust, title = {When to Trust {LLM}s: Aligning Confidence with Response Quality}, author = {Tao, Shuchang and Yao, Liuyi and Ding, Hanxing and Xie, Yuexiang and Cao, Qi and Sun, Fei and Gao, Jinyang and Shen, Huawei and Ding, Bolin}, editor = {Ku, Lun-Wei and Martins, Andre and Srikumar, Vivek}, booktitle = {Findings of the Association for Computational Linguistics: ACL 2024}, month = aug, year = {2024}, address = {Bangkok, Thailand}, publisher = {Association for Computational Linguistics}, url = {https://aclanthology.org/2024.findings-acl.357/}, doi = {10.18653/v1/2024.findings-acl.357}, pages = {5984--5996} }
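The core idea of the order-preserving reward above can be illustrated with a toy pairwise form: grant a bonus when the response of higher quality also verbalizes the higher confidence, and a penalty otherwise. This is a hedged sketch of the concept only; the function name, signature, and reward values are illustrative assumptions, not the paper's exact formulation:

```python
# Illustrative pairwise order-preserving reward (an assumption, not CONQORD's
# exact reward): encourage the confidence ranking to agree with the quality
# ranking for a pair of responses.
def order_preserving_reward(conf_a, qual_a, conf_b, qual_b, bonus=1.0):
    """Positive when confidence order matches quality order, negative otherwise."""
    agree = (conf_a - conf_b) * (qual_a - qual_b) > 0
    return bonus if agree else -bonus

# Higher-quality response A also states higher confidence: rewarded.
print(order_preserving_reward(0.9, 1.0, 0.3, 0.2))
# Higher-quality response A states lower confidence: penalized.
print(order_preserving_reward(0.2, 1.0, 0.9, 0.2))
```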
- Unlink to Unlearn: Simplifying Edge Unlearning in GNNs. Jiajun Tan, Fei Sun, Ruichen Qiu, Du Su, and Huawei Shen. In Companion Proceedings of the ACM Web Conference 2024, Singapore, 2024
As concerns over data privacy intensify, unlearning in Graph Neural Networks (GNNs) has emerged as a prominent research frontier in academia. This concept is pivotal in enforcing the right to be forgotten, which entails the selective removal of specific data from trained GNNs upon user request. Our research focuses on edge unlearning, a process of particular relevance to real-world applications. Current state-of-the-art approaches like GNNDelete can eliminate the influence of specific edges yet suffer from over-forgetting, meaning the unlearning process inadvertently removes more information than needed, leading to a significant performance decline on the remaining edges. Our analysis identifies the loss functions of GNNDelete as the primary source of over-forgetting and also suggests that loss functions may be redundant for effective edge unlearning. Building on these insights, we simplify GNNDelete to develop Unlink to Unlearn (UtU), a novel method that facilitates unlearning exclusively by unlinking the forget edges from the graph structure. Our extensive experiments demonstrate that UtU delivers privacy protection on par with that of a retrained model while preserving high accuracy in downstream tasks, upholding over 97.3% of the retrained model’s privacy protection capabilities and 99.8% of its link prediction accuracy. Meanwhile, UtU requires only constant computational demands, underscoring its advantage as a highly lightweight and practical edge unlearning solution.
@inproceedings{tan-etal-2024-Unlink, author = {Tan, Jiajun and Sun, Fei and Qiu, Ruichen and Su, Du and Shen, Huawei}, title = {Unlink to Unlearn: Simplifying Edge Unlearning in GNNs}, year = {2024}, isbn = {9798400701726}, publisher = {Association for Computing Machinery}, address = {New York, NY, USA}, url = {https://doi.org/10.1145/3589335.3651578}, doi = {10.1145/3589335.3651578}, booktitle = {Companion Proceedings of the ACM Web Conference 2024}, pages = {489–492}, numpages = {4}, keywords = {graph neural networks, machine unlearning, over-forgetting}, location = {Singapore, Singapore}, series = {WWW '24} }
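The "unlinking" operation at the heart of UtU is structurally simple: remove the forget edges from the graph before the model is used, with no loss-based fine-tuning. A minimal sketch over an undirected adjacency-set representation (the data structure and function name are illustrative assumptions, not the paper's code):

```python
# Minimal sketch: edge unlearning purely by unlinking the forget edges
# from an undirected graph stored as {node: set(neighbors)}.
def unlink_to_unlearn(adjacency, forget_edges):
    """Return a copy of the graph with the forget edges removed from both endpoints."""
    adj = {u: set(nbrs) for u, nbrs in adjacency.items()}  # leave input untouched
    for u, v in forget_edges:
        adj[u].discard(v)
        adj[v].discard(u)
    return adj

graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
print(unlink_to_unlearn(graph, [(0, 1)]))
# edge (0, 1) no longer appears in either endpoint's neighbor set
```

Because the operation touches only the edges to be forgotten, its cost is constant in the number of deletion requests, which is the efficiency claim the abstract makes.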
- Accelerating the Surrogate Retraining for Poisoning Attacks against Recommender Systems. Yunfan Wu, Qi Cao, Shuchang Tao, Kaike Zhang, Fei Sun, and Huawei Shen. In Proceedings of the 18th ACM Conference on Recommender Systems, Bari, Italy, 2024
Recent studies have demonstrated the vulnerability of recommender systems to data poisoning attacks, where adversaries inject carefully crafted fake user interactions into the training data of recommenders to promote target items. Current attack methods involve iteratively retraining a surrogate recommender on the poisoned data with the latest fake users to optimize the attack. However, this repetitive retraining is highly time-consuming, hindering the efficient assessment and optimization of fake users. To mitigate this computational bottleneck and develop a more effective attack in an affordable time, we analyze the retraining process and find that a change in the representation of one user/item will cause a cascading effect through the user-item interaction graph. Under theoretical guidance, we introduce Gradient Passing (GP), a novel technique that explicitly passes gradients between interacted user-item pairs during backpropagation, thereby approximating the cascading effect and accelerating retraining. With just a single update, GP can achieve effects comparable to multiple original training iterations. Under the same number of retraining epochs, GP enables a closer approximation of the surrogate recommender to the victim. This more accurate approximation provides better guidance for optimizing fake users, ultimately leading to enhanced data poisoning attacks. Extensive experiments on real-world datasets demonstrate the efficiency and effectiveness of our proposed GP.
@inproceedings{wu-2024-attack, author = {Wu, Yunfan and Cao, Qi and Tao, Shuchang and Zhang, Kaike and Sun, Fei and Shen, Huawei}, title = {Accelerating the Surrogate Retraining for Poisoning Attacks against Recommender Systems}, year = {2024}, isbn = {9798400705052}, publisher = {Association for Computing Machinery}, address = {New York, NY, USA}, url = {https://doi.org/10.1145/3640457.3688148}, doi = {10.1145/3640457.3688148}, booktitle = {Proceedings of the 18th ACM Conference on Recommender Systems}, pages = {701–711}, numpages = {11}, keywords = {Adversarial Learning, Poisoning Attacks, Recommender Systems}, location = {Bari, Italy}, series = {RecSys '24} }
- Improving the Shortest Plank: Vulnerability-Aware Adversarial Training for Robust Recommender System. Kaike Zhang, Qi Cao, Yunfan Wu, Fei Sun, Huawei Shen, and Xueqi Cheng. In Proceedings of the 18th ACM Conference on Recommender Systems, Bari, Italy, 2024
Recommender systems play a pivotal role in mitigating information overload in various fields. Nonetheless, the inherent openness of these systems introduces vulnerabilities, allowing attackers to insert fake users into the system’s training data to skew the exposure of certain items, known as poisoning attacks. Adversarial training has emerged as a notable defense mechanism against such poisoning attacks within recommender systems. Existing adversarial training methods apply perturbations of the same magnitude across all users to enhance system robustness against attacks. Yet, in reality, we find that attacks often affect only a subset of users who are vulnerable. Such indiscriminate perturbations make it difficult to protect vulnerable users effectively without degrading recommendation quality for those who are not affected. To address this issue, our research delves into understanding user vulnerability. Considering that poisoning attacks pollute the training data, we note that the degree to which a recommender system fits a user’s training data correlates with the likelihood of that user incorporating attack information, indicating their vulnerability. Leveraging these insights, we introduce Vulnerability-aware Adversarial Training (VAT), designed to defend against poisoning attacks in recommender systems. VAT employs a novel vulnerability-aware function to estimate users’ vulnerability based on the degree to which the system fits them. Guided by this estimation, VAT applies perturbations of adaptive magnitude to each user, not only reducing the success ratio of attacks but also preserving, and potentially enhancing, the quality of recommendations. Comprehensive experiments confirm VAT’s superior defensive capabilities across different recommendation models and against various types of attacks.
@inproceedings{zhang-2024-adv, author = {Zhang, Kaike and Cao, Qi and Wu, Yunfan and Sun, Fei and Shen, Huawei and Cheng, Xueqi}, title = {Improving the Shortest Plank: Vulnerability-Aware Adversarial Training for Robust Recommender System}, year = {2024}, isbn = {9798400705052}, publisher = {Association for Computing Machinery}, address = {New York, NY, USA}, url = {https://doi.org/10.1145/3640457.3688120}, doi = {10.1145/3640457.3688120}, booktitle = {Proceedings of the 18th ACM Conference on Recommender Systems}, pages = {680–689}, numpages = {10}, keywords = {Adversarial Training, Poisoning Attack, Robust Recommender System}, location = {Bari, Italy}, series = {RecSys '24} }
- LoRec: Combating Poisons with Large Language Model for Robust Sequential Recommendation. Kaike Zhang, Qi Cao, Yunfan Wu, Fei Sun, Huawei Shen, and Xueqi Cheng. In Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, Washington DC, USA, 2024
Sequential recommender systems stand out for their ability to capture users’ dynamic interests and the patterns of item transitions. However, the inherent openness of sequential recommender systems renders them vulnerable to poisoning attacks, where fraudsters are injected into the training data to manipulate learned patterns. Traditional defense methods predominantly depend on predefined assumptions or rules extracted from specific known attacks, limiting their generalizability to unknown attacks. To solve the above problems, considering the rich open-world knowledge encapsulated in Large Language Models (LLMs), we attempt to introduce LLMs into defense methods to broaden the knowledge beyond limited known attacks. We propose LoRec, an innovative framework that employs LLM-Enhanced Calibration to strengthen the robustness of sequential Recommender systems against poisoning attacks. LoRec integrates an LLM-enhanced CalibraTor (LCT) that refines the training process of sequential recommender systems with knowledge derived from LLMs, applying a user-wise reweighting to diminish the impact of attacks. Incorporating LLMs’ open-world knowledge, the LCT effectively converts the limited, specific priors or rules into a more general pattern of fraudsters, offering improved defenses against poisons. Our comprehensive experiments validate that LoRec, as a general framework, significantly strengthens the robustness of sequential recommender systems.
@inproceedings{zhang-2024-lorec, author = {Zhang, Kaike and Cao, Qi and Wu, Yunfan and Sun, Fei and Shen, Huawei and Cheng, Xueqi}, title = {LoRec: Combating Poisons with Large Language Model for Robust Sequential Recommendation}, year = {2024}, isbn = {9798400704314}, publisher = {Association for Computing Machinery}, address = {New York, NY, USA}, url = {https://doi.org/10.1145/3626772.3657684}, doi = {10.1145/3626772.3657684}, booktitle = {Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval}, pages = {1733–1742}, numpages = {10}, keywords = {large language model, poisoning attack, robust sequential recommendation}, location = {Washington DC, USA}, series = {SIGIR '24} }
- Understanding and Improving Adversarial Collaborative Filtering for Robust Recommendation. Kaike Zhang, Qi Cao, Yunfan Wu, Fei Sun, Huawei Shen, and Xueqi Cheng. In The Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024
@inproceedings{zhang2024understanding, title = {Understanding and Improving Adversarial Collaborative Filtering for Robust Recommendation}, author = {Zhang, Kaike and Cao, Qi and Wu, Yunfan and Sun, Fei and Shen, Huawei and Cheng, Xueqi}, booktitle = {The Thirty-eighth Annual Conference on Neural Information Processing Systems}, year = {2024}, url = {https://openreview.net/forum?id=k8AYft5ED1} }
2023
- Adversarial camouflage for node injection attack on graphs. Shuchang Tao, Qi Cao, Huawei Shen, Yunfan Wu, Liang Hou, Fei Sun, and Xueqi Cheng. Information Sciences, 2023
Node injection attacks on Graph Neural Networks (GNNs) have received increasing attention recently, due to their ability to degrade GNN performance with high attack success rates. However, our study indicates that these attacks often fail in practical scenarios, since defense/detection methods can easily identify and remove the injected nodes. To address this, we focus on camouflaging node injection attacks, making injected nodes appear normal and imperceptible to defense/detection methods. Unfortunately, the non-Euclidean structure of graph data and the lack of intuitive priors present great challenges to the formalization, implementation, and evaluation of camouflage. In this paper, we first propose and define camouflage as distribution similarity between the ego networks of injected nodes and those of normal nodes. For implementation, we then propose an adversarial CAmouflage framework for Node injection Attack, namely CANA, to improve attack performance under defense/detection methods in practical scenarios. A novel camouflage metric is further designed under the guidance of distribution similarity. Extensive experiments demonstrate that CANA can significantly improve attack performance under defense/detection methods with higher camouflage or imperceptibility. This work urges us to raise awareness of the security vulnerabilities of GNNs in practical applications.
@article{tao-2023-adv, title = {Adversarial camouflage for node injection attack on graphs}, journal = {Information Sciences}, volume = {649}, pages = {119611}, year = {2023}, issn = {0020-0255}, doi = {10.1016/j.ins.2023.119611}, url = {https://www.sciencedirect.com/science/article/pii/S0020025523011969}, author = {Tao, Shuchang and Cao, Qi and Shen, Huawei and Wu, Yunfan and Hou, Liang and Sun, Fei and Cheng, Xueqi}, keywords = {Adversarial camouflage, Node injection attack, Adversarial attack, Graph neural networks} }
- Studying the Impact of Data Disclosure Mechanism in Recommender Systems via Simulation. Ziqian Chen, Fei Sun, Yifan Tang, Haokun Chen, Jinyang Gao, and Bolin Ding. ACM Trans. Inf. Syst., Feb 2023
Recently, privacy issues in web services that rely on users’ personal data have raised great attention. Despite that recent regulations force companies to offer choices for each user to opt-in or opt-out of data disclosure, real-world applications usually only provide an “all or nothing” binary option for users to either disclose all their data or preserve all data with the cost of no personalized service. In this article, we argue that such a binary mechanism is not optimal for both consumers and platforms. To study how different privacy mechanisms affect users’ decisions on information disclosure and how users’ decisions affect the platform’s revenue, we propose a privacy-aware recommendation framework that gives users fine control over their data. In this new framework, users can proactively control which data to disclose based on the tradeoff between anticipated privacy risks and potential utilities. Then we study the impact of different data disclosure mechanisms via simulation with reinforcement learning due to the high cost of real-world experiments. The results show that the platform mechanisms with finer split granularity and more unrestrained disclosure strategy can bring better results for both consumers and platforms than the “all or nothing” mechanism adopted by most real-world applications.
@article{Chen-2023-privacy, author = {Chen, Ziqian and Sun, Fei and Tang, Yifan and Chen, Haokun and Gao, Jinyang and Ding, Bolin}, title = {Studying the Impact of Data Disclosure Mechanism in Recommender Systems via Simulation}, year = {2023}, issue_date = {July 2023}, publisher = {Association for Computing Machinery}, address = {New York, NY, USA}, volume = {41}, number = {3}, issn = {1046-8188}, url = {https://doi.org/10.1145/3569452}, doi = {10.1145/3569452}, journal = {ACM Trans. Inf. Syst.}, month = feb, articleno = {60}, numpages = {26}, keywords = {GDPR, privacy, Recommender system} }
2022
- Debiasing Learning for Membership Inference Attacks Against Recommender Systems. Zihan Wang, Na Huang, Fei Sun, Pengjie Ren, Zhumin Chen, Hengliang Luo, Maarten de Rijke, and Zhaochun Ren. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Washington DC, USA, 2022
Learned recommender systems may inadvertently leak information about their training data, leading to privacy violations. We investigate privacy threats faced by recommender systems through the lens of membership inference. In such attacks, an adversary aims to infer whether a user’s data is used to train the target recommender. To achieve this, previous work has used a shadow recommender to derive training data for the attack model, and then predicts membership by calculating difference vectors between users’ historical interactions and recommended items. State-of-the-art methods face two challenging problems: (i) training data for the attack model is biased due to the gap between shadow and target recommenders, and (ii) hidden states in recommenders are not observable, resulting in inaccurate estimations of difference vectors. To address the above limitations, we propose a Debiasing Learning for Membership Inference Attacks against recommender systems (DL-MIA) framework that has four main components: (i) a difference vector generator, (ii) a disentangled encoder, (iii) a weight estimator, and (iv) an attack model. To mitigate the gap between recommenders, a variational auto-encoder (VAE) based disentangled encoder is devised to identify recommender-invariant and recommender-specific features. To reduce the estimation bias, we design a weight estimator, assigning a truth-level score to each difference vector to indicate estimation accuracy. We evaluate DL-MIA against both general recommenders and sequential recommenders on three real-world datasets. Experimental results show that DL-MIA effectively alleviates training and estimation biases simultaneously, and achieves state-of-the-art attack performance.
@inproceedings{wang-2022-mia, author = {Wang, Zihan and Huang, Na and Sun, Fei and Ren, Pengjie and Chen, Zhumin and Luo, Hengliang and de Rijke, Maarten and Ren, Zhaochun}, title = {Debiasing Learning for Membership Inference Attacks Against Recommender Systems}, year = {2022}, isbn = {9781450393850}, publisher = {Association for Computing Machinery}, address = {New York, NY, USA}, url = {https://doi.org/10.1145/3534678.3539392}, doi = {10.1145/3534678.3539392}, booktitle = {Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining}, pages = {1959–1968}, numpages = {10}, keywords = {debiasing, membership inference attack, recommender system}, location = {Washington DC, USA}, series = {KDD '22} }
- Neural Re-ranking in Multi-stage Recommender Systems: A Review. Weiwen Liu, Yunjia Xi, Jiarui Qin, Fei Sun, Bo Chen, Weinan Zhang, Rui Zhang, and Ruiming Tang. In Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, IJCAI-22, Jul 2022 (Survey Track)
@inproceedings{liu-2022-rank, title = {Neural Re-ranking in Multi-stage Recommender Systems: A Review}, author = {Liu, Weiwen and Xi, Yunjia and Qin, Jiarui and Sun, Fei and Chen, Bo and Zhang, Weinan and Zhang, Rui and Tang, Ruiming}, booktitle = {Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, {IJCAI-22}}, publisher = {International Joint Conferences on Artificial Intelligence Organization}, editor = {De Raedt, Luc}, pages = {5512--5520}, year = {2022}, month = jul, note = {Survey Track}, doi = {10.24963/ijcai.2022/771}, url = {https://doi.org/10.24963/ijcai.2022/771} }
- Graph Neural Networks in Recommender Systems: A Survey. Shiwen Wu, Fei Sun, Wentao Zhang, Xu Xie, and Bin Cui. ACM Comput. Surv., Dec 2022
With the explosive growth of online information, recommender systems play a key role in alleviating such information overload. Due to the important application value of recommender systems, new works continually emerge in this field. In recommender systems, the main challenge is to learn effective user/item representations from their interactions and side information (if any). Recently, graph neural network (GNN) techniques have been widely utilized in recommender systems, since most of the information in recommender systems essentially has a graph structure and GNNs excel at graph representation learning. This article aims to provide a comprehensive review of recent research efforts on GNN-based recommender systems. Specifically, we provide a taxonomy of GNN-based recommendation models according to the types of information used and the recommendation tasks. Moreover, we systematically analyze the challenges of applying GNNs to different types of data and discuss how existing works in this field address these challenges. Furthermore, we state new perspectives pertaining to the development of this field. We collect the representative papers along with their open-source implementations.
@article{Wu-2022-GNN, author = {Wu, Shiwen and Sun, Fei and Zhang, Wentao and Xie, Xu and Cui, Bin}, title = {Graph Neural Networks in Recommender Systems: A Survey}, year = {2022}, issue_date = {May 2023}, publisher = {Association for Computing Machinery}, address = {New York, NY, USA}, volume = {55}, number = {5}, issn = {0360-0300}, url = {https://doi.org/10.1145/3535101}, doi = {10.1145/3535101}, journal = {ACM Comput. Surv.}, month = dec, articleno = {97}, numpages = {37}, keywords = {survey, graph neural network, Recommender system} }
- Contrastive Learning for Sequential Recommendation. Xu Xie, Fei Sun, Zhaoyang Liu, Shiwen Wu, Jinyang Gao, Jiandong Zhang, Bolin Ding, and Bin Cui. In 2022 IEEE 38th International Conference on Data Engineering (ICDE), 2022
@inproceedings{xie-2022-Contrastive, author = {Xie, Xu and Sun, Fei and Liu, Zhaoyang and Wu, Shiwen and Gao, Jinyang and Zhang, Jiandong and Ding, Bolin and Cui, Bin}, booktitle = {2022 IEEE 38th International Conference on Data Engineering (ICDE)}, title = {Contrastive Learning for Sequential Recommendation}, year = {2022}, pages = {1259-1273}, keywords = {Contrastive Learning, Deep Learning, Recommender Systems}, doi = {10.1109/ICDE53745.2022.00099} }
- Recommendation Unlearning. Chong Chen, Fei Sun, Min Zhang, and Bolin Ding. In Proceedings of the ACM Web Conference 2022, Virtual Event, Lyon, France, 2022
Recommender systems provide essential web services by learning users’ personal preferences from collected data. However, in many cases, systems also need to forget some training data. From the perspective of privacy, users desire a tool to erase the impact of their sensitive data from the trained models. From the perspective of utility, if a system’s utility is damaged by some bad data, the system needs to forget such data to regain utility. While unlearning is very important, it has not been well considered in existing recommender systems. Although some research has studied the problem of machine unlearning, existing methods cannot be directly applied to recommendation as they are unable to consider the collaborative information. In this paper, we propose RecEraser, a general and efficient machine unlearning framework tailored to recommendation tasks. The main idea of RecEraser is to divide the training set into multiple shards and train a submodel on each shard. Specifically, to preserve the collaborative information in the data, we first design three novel data partition algorithms to divide the training data into balanced groups. We then further propose an adaptive aggregation method to improve the global model utility. Experimental results on three public benchmarks show that RecEraser can not only achieve efficient unlearning but also outperform state-of-the-art unlearning methods in terms of model utility. The source code can be found at https://github.com/chenchongthu/Recommendation-Unlearning
@inproceedings{Chen-2022-unlearn, author = {Chen, Chong and Sun, Fei and Zhang, Min and Ding, Bolin}, title = {Recommendation Unlearning}, year = {2022}, isbn = {9781450390965}, publisher = {Association for Computing Machinery}, address = {New York, NY, USA}, url = {https://doi.org/10.1145/3485447.3511997}, doi = {10.1145/3485447.3511997}, booktitle = {Proceedings of the ACM Web Conference 2022}, pages = {2768–2777}, numpages = {10}, keywords = {Collaborative Filtering, Machine Unlearning, Recommender Systems, Selective Deletion}, location = {Virtual Event, Lyon, France}, series = {WWW '22} }
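The shard-and-aggregate recipe above can be sketched in a few lines. This is a hedged toy, not RecEraser itself: the "submodel" is just a per-shard average, the partition is round-robin rather than the paper's collaborative-aware partitioning, and all names are illustrative.

```python
# Toy sketch of shard-based unlearning in the spirit of RecEraser (names and
# the round-robin partition are illustrative; the paper uses collaborative
# partitioning, real recommender submodels, and adaptive aggregation).

def partition(interactions, num_shards):
    """Round-robin balanced partition of (user, item, rating) triples."""
    shards = [[] for _ in range(num_shards)]
    for idx, inter in enumerate(interactions):
        shards[idx % num_shards].append(inter)
    return shards

def train_submodel(shard):
    """Stand-in 'model': the shard's average rating."""
    return sum(r for _, _, r in shard) / len(shard) if shard else 0.0

def unlearn(shards, models, interaction):
    """Forget one interaction by retraining only the shard that holds it."""
    for i, shard in enumerate(shards):
        if interaction in shard:
            shard.remove(interaction)
            models[i] = train_submodel(shard)
            break
    return models

interactions = [(u, i, float(u + i)) for u in range(4) for i in range(5)]
shards = partition(interactions, num_shards=4)
models = [train_submodel(s) for s in shards]
models = unlearn(shards, models, (0, 0, 0.0))  # erase one user-item record
```

The point of the design is cost: forgetting touches one shard instead of the full training set.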
2021
- Factual Consistency Evaluation for Text Summarization via Counterfactual EstimationYuexiang Xie , Fei Sun , Yang Deng , Yaliang Li , and Bolin DingIn Findings of the Association for Computational Linguistics: EMNLP 2021, Nov 2021
Although significant progress has been achieved in text summarization, factual inconsistency in generated summaries still severely limits its practical applications. Among the key factors to ensure factual consistency, a reliable automatic evaluation metric is the first and the most crucial one. However, existing metrics either neglect the intrinsic cause of the factual inconsistency or rely on auxiliary tasks, leading to an unsatisfactory correlation with human judgments or increasing the inconvenience of usage in practice. In light of these challenges, we propose a novel metric to evaluate the factual consistency in text summarization via counterfactual estimation, which formulates the causal relationship among the source document, the generated summary, and the language prior. We remove the effect of language prior, which can cause factual inconsistency, from the total causal effect on the generated summary, which provides a simple yet effective way to evaluate consistency without relying on other auxiliary tasks. We conduct a series of experiments on three public abstractive text summarization datasets, and demonstrate the advantages of the proposed metric in both improving the correlation with human judgments and the convenience of usage. The source code is available at https://github.com/xieyxclack/factual_coco
@inproceedings{xie-etal-2021-factual-consistency, title = {Factual Consistency Evaluation for Text Summarization via Counterfactual Estimation}, author = {Xie, Yuexiang and Sun, Fei and Deng, Yang and Li, Yaliang and Ding, Bolin}, editor = {Moens, Marie-Francine and Huang, Xuanjing and Specia, Lucia and Yih, Scott Wen-tau}, booktitle = {Findings of the Association for Computational Linguistics: EMNLP 2021}, month = nov, year = {2021}, address = {Punta Cana, Dominican Republic}, publisher = {Association for Computational Linguistics}, url = {https://aclanthology.org/2021.findings-emnlp.10/}, doi = {10.18653/v1/2021.findings-emnlp.10}, pages = {100--110} }
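The counterfactual intuition above can be sketched as a difference of scores. This is a hedged toy, not the paper's full metric: the log-probabilities are made-up stand-ins for a summarization model's token scores.

```python
# Hedged sketch of the counterfactual idea: score the summary with and without
# the source document; subtracting the document-free ("language prior") score
# keeps only the document's causal contribution.
def factual_score(logp_with_doc, logp_without_doc):
    """Average token log-prob conditioned on the document, minus the prior."""
    total = sum(logp_with_doc) / len(logp_with_doc)
    prior = sum(logp_without_doc) / len(logp_without_doc)
    return total - prior

score = factual_score([-0.1, -0.2, -0.3], [-1.0, -1.2, -1.4])
```

A summary that is likely *only because* the language model finds it fluent gets a low score; one that the document itself makes likely gets a high score.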
- CausCF: Causal Collaborative Filtering for Recommendation Effect EstimationXu Xie , Zhaoyang Liu , Shiwen Wu , Fei Sun , Cihang Liu , Jiawei Chen , Jinyang Gao , Bin Cui, and 1 more authorIn Proceedings of the 30th ACM International Conference on Information & Knowledge Management, Virtual Event, Queensland, Australia, Nov 2021
To improve user experience and profits of corporations, modern industrial recommender systems usually aim to select the items that are most likely to be interacted with (e.g., clicks and purchases). However, they overlook the fact that users may purchase the items even without recommendations. The real effective items are the ones that can contribute to purchase probability uplift. To select these effective items, it is essential to estimate the causal effect of recommendations. Nevertheless, it is difficult to obtain the real causal effect since we can only recommend or not recommend an item to a user at one time. Furthermore, previous works usually rely on the randomized controlled trial (RCT) experiment to evaluate their performance. However, it is usually not practicable in the recommendation scenario due to its expensive experimental cost. To tackle these problems, in this paper, we propose a causal collaborative filtering (CausCF) method inspired by the widely adopted collaborative filtering (CF) technique. It is based on the idea that similar users not only have a similar taste on items but also have similar treatment effects under recommendations. CausCF extends the classical matrix factorization to the tensor factorization with three dimensions—user, item, and treatment. Furthermore, we also employ regression discontinuity design (RDD) to evaluate the precision of the estimated causal effects from different models. With the testable assumptions, RDD analysis can provide an unbiased causal conclusion without RCT experiments. Through dedicated offline and online experiments, we demonstrate the effectiveness of our proposed CausCF on the causal effect estimation and ranking performance improvement.
@inproceedings{Xie-2022-causcf, author = {Xie, Xu and Liu, Zhaoyang and Wu, Shiwen and Sun, Fei and Liu, Cihang and Chen, Jiawei and Gao, Jinyang and Cui, Bin and Ding, Bolin}, title = {CausCF: Causal Collaborative Filtering for Recommendation Effect Estimation}, year = {2021}, isbn = {9781450384469}, publisher = {Association for Computing Machinery}, address = {New York, NY, USA}, url = {https://doi.org/10.1145/3459637.3481901}, doi = {10.1145/3459637.3481901}, booktitle = {Proceedings of the 30th ACM International Conference on Information \& Knowledge Management}, pages = {4253–4263}, numpages = {11}, keywords = {causal collaborative filtering, recommender system, regression discontinuity design}, location = {Virtual Event, Queensland, Australia}, series = {CIKM '21} }
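The three-dimensional factorization can be sketched directly. This is a hedged illustration: the CP-style interaction form, shapes, and random initialization are stand-ins, and no training loop is shown.

```python
import numpy as np

# Sketch of CausCF's idea: give the treatment its own embedding, so the
# uplift of recommending an item is the treated-minus-control prediction.
rng = np.random.default_rng(0)
n_users, n_items, dim = 8, 6, 4

U = rng.normal(size=(n_users, dim))    # user factors
V = rng.normal(size=(n_items, dim))    # item factors
T = rng.normal(size=(2, dim))          # treatment factors: 0 = not shown, 1 = shown

def predict(u, i, t):
    """CP-style three-way interaction score."""
    return float(np.sum(U[u] * V[i] * T[t]))

def uplift(u, i):
    """Estimated causal effect of recommending item i to user u."""
    return predict(u, i, 1) - predict(u, i, 0)
```

Ranking by `uplift` rather than by `predict(u, i, 1)` is exactly the shift the abstract argues for: select items whose purchase probability the recommendation actually raises.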
- Variation Control and Evaluation for Generative Slate RecommendationsShuchang Liu , Fei Sun , Yingqiang Ge , Changhua Pei , and Yongfeng ZhangIn Proceedings of the Web Conference 2021, Ljubljana, Slovenia, Apr 2021
Slate recommendation generates a list of items as a whole instead of ranking each item individually, so as to better model the intra-list positional biases and item relations. In order to deal with the enormous combinatorial space of slates, recent work considers a generative solution so that a slate distribution can be directly modeled. However, we observe that such approaches—despite their proven effectiveness in computer vision—suffer from a trade-off dilemma in recommender systems: when focusing on reconstruction, they easily over-fit the data and hardly generate satisfactory recommendations; on the other hand, when focusing on satisfying the user interests, they get trapped in a few items and fail to cover the item variation in slates. In this paper, we propose to enhance the accuracy-based evaluation with slate variation metrics to estimate the stochastic behavior of generative models. We illustrate that instead of reaching one of the two undesirable extreme cases in the dilemma, a valid generative solution resides in a narrow “elbow” region in between. And we show that item perturbation can enforce slate variation and mitigate the over-concentration of generated slates, which expands the “elbow” performance to an easy-to-find region. We further propose to separate a pivot selection phase from the generation process so that the model can apply perturbation before generation. Empirical results show that this simple modification can provide even better variance with the same level of accuracy compared to post-generation perturbation methods.
@inproceedings{Liu-2021-Slate, author = {Liu, Shuchang and Sun, Fei and Ge, Yingqiang and Pei, Changhua and Zhang, Yongfeng}, title = {Variation Control and Evaluation for Generative Slate Recommendations}, year = {2021}, isbn = {9781450383127}, publisher = {Association for Computing Machinery}, address = {New York, NY, USA}, url = {https://doi.org/10.1145/3442381.3449864}, doi = {10.1145/3442381.3449864}, booktitle = {Proceedings of the Web Conference 2021}, pages = {436–448}, numpages = {13}, keywords = {Conditional Variational Auto-Encoder, Generative Recommendation, Slate Recommendation}, location = {Ljubljana, Slovenia}, series = {WWW '21} }
- Multi-interest Diversification for End-to-end Sequential RecommendationWanyu Chen , Pengjie Ren , Fei Cai , Fei Sun , and Maarten De RijkeACM Trans. Inf. Syst., Sep 2021
Sequential recommenders capture dynamic aspects of users’ interests by modeling sequential behavior. Previous studies on sequential recommendations mostly aim to identify users’ main recent interests to optimize the recommendation accuracy; they often neglect the fact that users display multiple interests over extended periods of time, which could be used to improve the diversity of lists of recommended items. Existing work related to diversified recommendation typically assumes that users’ preferences are static and depend on post-processing the candidate list of recommended items. However, those conditions are not suitable when applied to sequential recommendations. We tackle sequential recommendation as a list generation process and propose a unified approach to take accuracy as well as diversity into consideration, called multi-interest, diversified, sequential recommendation. Particularly, an implicit interest mining module is first used to mine users’ multiple interests, which are reflected in users’ sequential behavior. Then an interest-aware, diversity promoting decoder is designed to produce recommendations that cover those interests. For training, we introduce an interest-aware, diversity promoting loss function that can supervise the model to learn to recommend accurate as well as diversified items. We conduct comprehensive experiments on four public datasets and the results show that our proposal outperforms state-of-the-art methods regarding diversity while producing comparable or better accuracy for sequential recommendation.
@article{Chen-2021-multi, author = {Chen, Wanyu and Ren, Pengjie and Cai, Fei and Sun, Fei and De Rijke, Maarten}, title = {Multi-interest Diversification for End-to-end Sequential Recommendation}, year = {2021}, issue_date = {January 2022}, publisher = {Association for Computing Machinery}, address = {New York, NY, USA}, volume = {40}, number = {1}, issn = {1046-8188}, url = {https://doi.org/10.1145/3475768}, doi = {10.1145/3475768}, journal = {ACM Trans. Inf. Syst.}, month = sep, articleno = {20}, numpages = {30}, keywords = {Sequential recommendation, diversified recommendation} }
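For contrast with the end-to-end approach the paper takes, the accuracy-vs-diversity trade-off it targets is often illustrated with a greedy re-ranker. The sketch below is generic MMR-style selection with made-up scores, not the paper's interest-aware decoder:

```python
# Generic MMR-style greedy re-ranking: pick items that score well but differ
# from already-chosen ones. The paper replaces this kind of post-processing
# with an end-to-end, interest-aware generation model.
def diversified_topk(scores, sim, k, trade_off=0.5):
    """Greedily balance relevance against similarity to items picked so far."""
    chosen = []
    while len(chosen) < k:
        def gain(i):
            redundancy = max((sim[i][j] for j in chosen), default=0.0)
            return trade_off * scores[i] - (1 - trade_off) * redundancy
        best = max((i for i in range(len(scores)) if i not in chosen), key=gain)
        chosen.append(best)
    return chosen

scores = [1.0, 0.9, 0.2]                      # items 0 and 1 are near-duplicates
sim = [[1.0, 1.0, 0.0], [1.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
chosen = diversified_topk(scores, sim, k=2)   # prefers the distinct item over the duplicate
```

The abstract's critique applies here: this heuristic assumes static preferences and only post-processes a candidate list, which is what the proposed end-to-end model avoids.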
- Unified Conversational Recommendation Policy Learning via Graph-based Reinforcement LearningYang Deng , Yaliang Li , Fei Sun , Bolin Ding , and Wai LamIn Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, Virtual Event, Canada, Jul 2021
Conversational recommender systems (CRS) enable the traditional recommender systems to explicitly acquire user preferences towards items and attributes through interactive conversations. Reinforcement learning (RL) is widely adopted to learn conversational recommendation policies to decide what attributes to ask, which items to recommend, and when to ask or recommend, at each conversation turn. However, existing methods mainly target solving one or two of these three decision-making problems in CRS with separated conversation and recommendation components, which restricts the scalability and generality of CRS and falls short of preserving a stable training procedure. In light of these challenges, we propose to formulate these three decision-making problems in CRS as a unified policy learning task. In order to systematically integrate conversation and recommendation components, we develop a dynamic weighted graph based RL method to learn a policy to select the action at each conversation turn, either asking an attribute or recommending items. Further, to deal with the sample efficiency issue, we propose two action selection strategies for reducing the candidate action space according to the preference and entropy information. Experimental results on two benchmark CRS datasets and a real-world E-Commerce application show that the proposed method not only significantly outperforms state-of-the-art methods but also enhances the scalability and stability of CRS.
@inproceedings{deng-2021-conv, author = {Deng, Yang and Li, Yaliang and Sun, Fei and Ding, Bolin and Lam, Wai}, title = {Unified Conversational Recommendation Policy Learning via Graph-based Reinforcement Learning}, year = {2021}, isbn = {9781450380379}, publisher = {Association for Computing Machinery}, address = {New York, NY, USA}, url = {https://doi.org/10.1145/3404835.3462913}, doi = {10.1145/3404835.3462913}, booktitle = {Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval}, pages = {1431–1441}, numpages = {11}, keywords = {conversational recommendation, graph representation learning, reinforcement learning}, location = {Virtual Event, Canada}, series = {SIGIR '21} }
- Towards Long-term Fairness in RecommendationYingqiang Ge , Shuchang Liu , Ruoyuan Gao , Yikun Xian , Yunqi Li , Xiangyu Zhao , Changhua Pei , Fei Sun, and 3 more authorsIn Proceedings of the 14th ACM International Conference on Web Search and Data Mining, Virtual Event, Israel, Mar 2021
As Recommender Systems (RS) influence more and more people in their daily life, the issue of fairness in recommendation is becoming more and more important. Most of the prior approaches to fairness-aware recommendation have been situated in a static or one-shot setting, where the protected groups of items are fixed, and the model provides a one-time fairness solution based on fairness-constrained optimization. This fails to consider the dynamic nature of the recommender systems, where attributes such as item popularity may change over time due to the recommendation policy and user engagement. For example, products that were once popular may become no longer popular, and vice versa. As a result, the system that aims to maintain long-term fairness on the item exposure in different popularity groups must accommodate this change in a timely fashion. Novel to this work, we explore the problem of long-term fairness in recommendation and address it through dynamic fairness learning. We focus on the fairness of exposure of items in different groups, while the division of the groups is based on item popularity, which dynamically changes over time in the recommendation process. We tackle this problem by proposing a fairness-constrained reinforcement learning algorithm for recommendation, which models the recommendation problem as a Constrained Markov Decision Process (CMDP), so that the model can dynamically adjust its recommendation policy to make sure the fairness requirement is always satisfied when the environment changes. Experiments on several real-world datasets verify our framework’s superiority in terms of recommendation performance, short-term fairness, and long-term fairness.
@inproceedings{ge-2021-fairness, author = {Ge, Yingqiang and Liu, Shuchang and Gao, Ruoyuan and Xian, Yikun and Li, Yunqi and Zhao, Xiangyu and Pei, Changhua and Sun, Fei and Ge, Junfeng and Ou, Wenwu and Zhang, Yongfeng}, title = {Towards Long-term Fairness in Recommendation}, year = {2021}, isbn = {9781450382977}, publisher = {Association for Computing Machinery}, address = {New York, NY, USA}, url = {https://doi.org/10.1145/3437963.3441824}, doi = {10.1145/3437963.3441824}, booktitle = {Proceedings of the 14th ACM International Conference on Web Search and Data Mining}, pages = {445–453}, numpages = {9}, keywords = {unbiased recommendation, reinforcement learning, recommender system, long-term fairness, constrained policy optimization}, location = {Virtual Event, Israel}, series = {WSDM '21} }
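The exposure constraint inside the CMDP can be illustrated with a toy hard cap. This is not the paper's constrained policy optimization; the scores, groups, and budget `alpha` are all made up:

```python
# Toy illustration of a fairness-of-exposure constraint: stop recommending
# popular-group items once their exposure share reaches alpha (the paper
# instead learns a policy that satisfies such a constraint in expectation).
def constrained_pick(scores, is_popular, counts, alpha):
    total = counts["popular"] + counts["tail"]
    over_budget = total > 0 and counts["popular"] / total >= alpha
    candidates = [i for i in range(len(scores))
                  if not (over_budget and is_popular[i])]
    best = max(candidates, key=lambda i: scores[i])
    counts["popular" if is_popular[best] else "tail"] += 1
    return best

scores = [0.9, 0.8, 0.3, 0.2]
is_popular = [True, True, False, False]
counts = {"popular": 0, "tail": 0}
picks = [constrained_pick(scores, is_popular, counts, alpha=0.5) for _ in range(4)]
```

Because popularity labels drift over time, the `is_popular` map would itself change between steps in the long-term setting the paper studies.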
- Explore User Neighborhood for Real-time E-commerce RecommendationXu Xie , Fei Sun , Xiaoyong Yang , Zhao Yang , Jinyang Gao , Wenwu Ou , and Bin CuiIn 2021 IEEE 37th International Conference on Data Engineering (ICDE), Apr 2021
@inproceedings{Xie-2021-user, author = {Xie, Xu and Sun, Fei and Yang, Xiaoyong and Yang, Zhao and Gao, Jinyang and Ou, Wenwu and Cui, Bin}, booktitle = {2021 IEEE 37th International Conference on Data Engineering (ICDE)}, title = {Explore User Neighborhood for Real-time E-commerce Recommendation}, year = {2021}, volume = {}, number = {}, pages = {2464-2475}, keywords = {Conferences;Computational modeling;Collaborative filtering;Data engineering;Real-time systems;Recommender systems;Candidate generation;User neighborhood;Real time}, doi = {10.1109/ICDE51399.2021.00279} }
2020
- Intent Preference Decoupling for User Representation on Online Recommender SystemZhaoyang Liu , Haokun Chen , Fei Sun , Xu Xie , Jinyang Gao , Bolin Ding , and Yanyan ShenIn Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, IJCAI-20, Jul 2020. Main track
@inproceedings{liu-2020-Intent, title = {Intent Preference Decoupling for User Representation on Online Recommender System}, author = {Liu, Zhaoyang and Chen, Haokun and Sun, Fei and Xie, Xu and Gao, Jinyang and Ding, Bolin and Shen, Yanyan}, booktitle = {Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, {IJCAI-20}}, publisher = {International Joint Conferences on Artificial Intelligence Organization}, editor = {Bessiere, Christian}, pages = {2575--2582}, year = {2020}, month = jul, note = {Main track}, doi = {10.24963/ijcai.2020/357}, url = {https://doi.org/10.24963/ijcai.2020/357} }
- Privileged Features Distillation at Taobao RecommendationsChen Xu , Quan Li , Junfeng Ge , Jinyang Gao , Xiaoyong Yang , Changhua Pei , Fei Sun , Jian Wu, and 2 more authorsIn Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Virtual Event, CA, USA, Aug 2020
Features play an important role in the prediction tasks of e-commerce recommendations. To guarantee the consistency of off-line training and on-line serving, we usually utilize only the features that are available in both environments. However, this consistency in turn neglects some discriminative features. For example, when estimating the conversion rate (CVR), i.e., the probability that a user would purchase the item if she clicked it, features like dwell time on the item detailed page are informative. However, CVR prediction should be conducted for on-line ranking before the click happens. Thus we cannot get such post-event features during serving. We define the features that are discriminative but only available during training as the privileged features. Inspired by the distillation techniques which bridge the gap between training and inference, in this work, we propose privileged features distillation (PFD). We train two models, i.e., a student model that is the same as the original one and a teacher model that additionally utilizes the privileged features. Knowledge distilled from the more accurate teacher is transferred to the student, which helps to improve its prediction accuracy. During serving, only the student part is extracted and it relies on no privileged features. We conduct experiments on two fundamental prediction tasks at Taobao recommendations, i.e., click-through rate (CTR) at coarse-grained ranking and CVR at fine-grained ranking. By distilling the interacted features that are prohibited during serving for CTR and the post-event features for CVR, we achieve significant improvements over their strong baselines. During the on-line A/B tests, the click metric is improved by +5.0% in the CTR task. And the conversion metric is improved by +2.3% in the CVR task. Besides, by addressing several issues of training PFD, we obtain a training speed comparable to that of the baselines without any distillation.
@inproceedings{Chen-2020-Distillation, author = {Xu, Chen and Li, Quan and Ge, Junfeng and Gao, Jinyang and Yang, Xiaoyong and Pei, Changhua and Sun, Fei and Wu, Jian and Sun, Hanxiao and Ou, Wenwu}, title = {Privileged Features Distillation at Taobao Recommendations}, year = {2020}, isbn = {9781450379984}, publisher = {Association for Computing Machinery}, address = {New York, NY, USA}, url = {https://doi.org/10.1145/3394486.3403309}, doi = {10.1145/3394486.3403309}, booktitle = {Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery \& Data Mining}, pages = {2590–2598}, numpages = {9}, keywords = {ctr, cvr, distillation, e-commerce recommendations, privileged features}, location = {Virtual Event, CA, USA}, series = {KDD '20} }
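The teacher/student split can be sketched minimally. The logistic models, shapes, and squared-error distillation term below are stand-ins chosen for brevity, not the paper's networks:

```python
import numpy as np

# Minimal sketch of privileged features distillation (PFD): the teacher sees
# privileged features x_priv that the student cannot use at serving time; the
# student's loss adds a term pulling its prediction toward the teacher's.
rng = np.random.default_rng(1)
x, x_priv, y = rng.normal(size=8), rng.normal(size=3), 1.0
w_s = np.zeros(8)      # student weights (regular features only)
w_t = np.zeros(11)     # teacher weights (regular + privileged features)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def pfd_loss(w_s, w_t, lam=0.5):
    p_student = sigmoid(w_s @ x)
    p_teacher = sigmoid(w_t @ np.concatenate([x, x_priv]))
    ce = -(y * np.log(p_student) + (1 - y) * np.log(1 - p_student))
    distill = (p_student - p_teacher) ** 2   # one common choice of distillation term
    return ce + lam * distill
```

At serving time only the student branch is kept, so no privileged feature is ever needed on-line, which is the whole point of the scheme.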
- Understanding Echo Chambers in E-commerce Recommender SystemsYingqiang Ge , Shuya Zhao , Honglu Zhou , Changhua Pei , Fei Sun , Wenwu Ou , and Yongfeng ZhangIn Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, Virtual Event, China, Jul 2020
Personalized recommendation benefits users in accessing content of interest effectively. Current research on recommender systems mostly focuses on matching users with proper items based on user interests. However, significant efforts are missing to understand how the recommendations influence user preferences and behaviors, e.g., if and how recommendations result in echo chambers. Extensive efforts have been made in examining the phenomenon in online media and social network systems. Meanwhile, there are growing concerns that recommender systems might lead to the self-reinforcing of user’s interests due to narrowed exposure of items, which may be the potential cause of echo chamber. In this paper, we aim to analyze the echo chamber phenomenon in Alibaba Taobao — one of the largest e-commerce platforms in the world. Echo chamber refers to the effect of user interests being reinforced through repeated exposure to similar contents. Based on the definition, we examine the presence of echo chamber in two steps. First, we explore whether user interests have been reinforced. Second, we check whether the reinforcement results from the exposure of similar contents. Our evaluations are enhanced with robust metrics, including cluster validity and statistical significance. Experiments are performed on extensive collections of real-world data consisting of user clicks, purchases, and browse logs from Alibaba Taobao. Evidence suggests the tendency of echo chamber in user click behaviors, while it is relatively mitigated in user purchase behaviors. Insights from the results guide the refinement of recommendation algorithms in real-world e-commerce systems.
@inproceedings{ge-2020-echo, author = {Ge, Yingqiang and Zhao, Shuya and Zhou, Honglu and Pei, Changhua and Sun, Fei and Ou, Wenwu and Zhang, Yongfeng}, title = {Understanding Echo Chambers in E-commerce Recommender Systems}, year = {2020}, isbn = {9781450380164}, publisher = {Association for Computing Machinery}, address = {New York, NY, USA}, url = {https://doi.org/10.1145/3397271.3401431}, doi = {10.1145/3397271.3401431}, booktitle = {Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval}, pages = {2261–2270}, numpages = {10}, keywords = {e-commerce, echo chamber, filter bubble, recommender systems}, location = {Virtual Event, China}, series = {SIGIR '20} }
- Graph-based Regularization on Embedding Layers for RecommendationYuan Zhang , Fei Sun , Xiaoyong Yang , Chen Xu , Wenwu Ou , and Yan ZhangACM Trans. Inf. Syst., Sep 2020
Neural networks have been extensively used in recommender systems. Embedding layers are not only necessary but also crucial for neural recommendation models, since recommendation is a typical discrete task. In this article, we argue that the widely used l2 regularization for normal neural layers (e.g., fully connected layers) is not ideal for embedding layers from the perspective of regularization theory in Reproducing Kernel Hilbert Space. More specifically, the l2 regularization corresponds to the inner product and the distance in the Euclidean space where correlations between discrete objects (e.g., items) are not well captured. Inspired by this observation, we propose a graph-based regularization approach to serve as a counterpart of the l2 regularization for embedding layers. The proposed regularization incurs almost no extra computational overhead especially when being trained with mini-batches. We also discuss its relationships to other approaches (namely, data augmentation, graph convolution, and joint learning) theoretically. We conducted extensive experiments on five publicly available datasets from various domains with two state-of-the-art recommendation models. Results show that given a kNN (k-nearest neighbor) graph constructed directly from training data without external information, the proposed approach significantly outperforms the l2 regularization on all the datasets and achieves more notable improvements for long-tail users and items.
@article{Zhang-2020-graph, author = {Zhang, Yuan and Sun, Fei and Yang, Xiaoyong and Xu, Chen and Ou, Wenwu and Zhang, Yan}, title = {Graph-based Regularization on Embedding Layers for Recommendation}, year = {2020}, issue_date = {January 2021}, publisher = {Association for Computing Machinery}, address = {New York, NY, USA}, volume = {39}, number = {1}, issn = {1046-8188}, url = {https://doi.org/10.1145/3414067}, doi = {10.1145/3414067}, journal = {ACM Trans. Inf. Syst.}, month = sep, articleno = {2}, numpages = {27}, keywords = {neural recommender system, graph-based regularization, Embedding} }
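The regularizer itself is short enough to write down. The graph and shapes below are toy assumptions; the identity between the edge-wise sum and the Laplacian trace form is standard:

```python
import numpy as np

# Sketch of graph-based embedding regularization: penalize embedding distances
# along kNN-graph edges, R(E) = sum_{(i,j)} a_ij * ||e_i - e_j||^2, which
# equals tr(E^T L E) for the graph Laplacian L = D - A.
rng = np.random.default_rng(2)
E = rng.normal(size=(5, 3))                       # 5 item embeddings, dim 3
edges = [(0, 1, 1.0), (1, 2, 0.5), (3, 4, 2.0)]  # (i, j, weight) from a kNN graph

def graph_reg(E, edges):
    """Edge-wise form of the regularizer."""
    return sum(a * float(np.sum((E[i] - E[j]) ** 2)) for i, j, a in edges)

# Laplacian form, to check the identity above.
A = np.zeros((5, 5))
for i, j, a in edges:
    A[i, j] = A[j, i] = a
L = np.diag(A.sum(axis=1)) - A
```

Unlike plain l2, which shrinks every embedding toward the origin independently, this penalty pulls *related* items toward each other, which is the article's core argument.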
2019
- A Pareto-Efficient Algorithm for Multiple Objective Optimization in E-commerce RecommendationXiao Lin , Hongjie Chen , Changhua Pei , Fei Sun , Xuanji Xiao , Hanxiao Sun , Yongfeng Zhang , Wenwu Ou, and 1 more authorIn Proceedings of the 13th ACM Conference on Recommender Systems, Copenhagen, Denmark, Sep 2019
Recommendation with multiple objectives is an important but difficult problem, where the coherent difficulty lies in the possible conflicts between objectives. In this case, multi-objective optimization is expected to be Pareto efficient, where no single objective can be further improved without hurting the others. However, existing approaches to Pareto efficient multi-objective recommendation still lack good theoretical guarantees. In this paper, we propose a general framework for generating Pareto efficient recommendations. Assuming that there are formal differentiable formulations for the objectives, we coordinate these objectives with a weighted aggregation. Then we propose a condition ensuring Pareto efficiency theoretically and a two-step Pareto efficient optimization algorithm. Meanwhile, the algorithm can be easily adapted for Pareto Frontier generation and fair recommendation selection. We specifically apply the proposed framework on E-Commerce recommendation to optimize GMV and CTR simultaneously. Extensive online and offline experiments are conducted on the real-world E-Commerce recommender system and the results validate the Pareto efficiency of the framework. To the best of our knowledge, this work is among the first to provide a Pareto efficient framework for multi-objective recommendation with theoretical guarantees. Moreover, the framework can be applied to any other objectives with differentiable formulations and any model with gradients, which shows its strong scalability.
@inproceedings{Lin-2019-pareto, author = {Lin, Xiao and Chen, Hongjie and Pei, Changhua and Sun, Fei and Xiao, Xuanji and Sun, Hanxiao and Zhang, Yongfeng and Ou, Wenwu and Jiang, Peng}, title = {A pareto-efficient algorithm for multiple objective optimization in e-commerce recommendation}, year = {2019}, isbn = {9781450362436}, publisher = {Association for Computing Machinery}, address = {New York, NY, USA}, url = {https://doi.org/10.1145/3298689.3346998}, doi = {10.1145/3298689.3346998}, booktitle = {Proceedings of the 13th ACM Conference on Recommender Systems}, pages = {20–28}, numpages = {9}, keywords = {learning to rank, multiple objective optimization, pareto efficiency, recommendation}, location = {Copenhagen, Denmark}, series = {RecSys '19} }
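The weighted-aggregation idea can be illustrated for two objectives with a classic closed form. This is a hedged sketch, not the paper's exact two-step algorithm, and the GMV/CTR gradients are placeholders:

```python
import numpy as np

# A standard way to get a direction that improves both objectives: the
# minimum-norm convex combination of their gradients (closed form for two
# objectives). The paper derives its own Pareto-efficiency condition; this
# merely illustrates gradient weighting.
def min_norm_weights(g1, g2):
    """argmin over w in [0, 1] of ||w*g1 + (1 - w)*g2||^2 (closed form)."""
    diff = g1 - g2
    denom = float(diff @ diff)
    if denom == 0.0:
        return 0.5
    return min(1.0, max(0.0, float((g2 - g1) @ g2) / denom))

g_gmv = np.array([1.0, 0.0])   # placeholder gradient of the GMV objective
g_ctr = np.array([0.0, 1.0])   # placeholder gradient of the CTR objective
w = min_norm_weights(g_gmv, g_ctr)
d = w * g_gmv + (1 - w) * g_ctr        # shared update direction
```

When the combined direction `d` has positive inner product with both gradients, a step along it improves both objectives at once, which is the sense in which such weightings avoid sacrificing one objective for the other.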
- BERT4Rec: Sequential Recommendation with Bidirectional Encoder Representations from TransformerFei Sun , Jun Liu , Jian Wu , Changhua Pei , Xiao Lin , Wenwu Ou , and Peng JiangIn Proceedings of the 28th ACM International Conference on Information and Knowledge Management, Beijing, China, Nov 2019
Modeling users’ dynamic preferences from their historical behaviors is challenging and crucial for recommendation systems. Previous methods employ sequential neural networks to encode users’ historical interactions from left to right into hidden representations for making recommendations. Despite their effectiveness, we argue that such left-to-right unidirectional models are sub-optimal due to the following limitations: a) unidirectional architectures restrict the power of hidden representation in users’ behavior sequences; b) they often assume a rigidly ordered sequence, which is not always practical. To address these limitations, we propose a sequential recommendation model called BERT4Rec, which employs deep bidirectional self-attention to model user behavior sequences. To avoid information leakage and efficiently train the bidirectional model, we adopt the Cloze objective for sequential recommendation, predicting the randomly masked items in the sequence by jointly conditioning on their left and right context. In this way, we learn a bidirectional representation model to make recommendations by allowing each item in user historical behaviors to fuse information from both left and right sides. Extensive experiments on four benchmark datasets show that our model outperforms various state-of-the-art sequential models consistently.
@inproceedings{fei-2019-BERT4REC, author = {Sun, Fei and Liu, Jun and Wu, Jian and Pei, Changhua and Lin, Xiao and Ou, Wenwu and Jiang, Peng}, title = {BERT4Rec: Sequential Recommendation with Bidirectional Encoder Representations from Transformer}, year = {2019}, isbn = {9781450369763}, publisher = {Association for Computing Machinery}, address = {New York, NY, USA}, url = {https://doi.org/10.1145/3357384.3357895}, doi = {10.1145/3357384.3357895}, booktitle = {Proceedings of the 28th ACM International Conference on Information and Knowledge Management}, pages = {1441–1450}, numpages = {10}, keywords = {sequential recommendation, cloze, bidirectional sequential model}, location = {Beijing, China}, series = {CIKM '19} }
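The Cloze-style training step can be sketched in isolation. Mask ratio, seed, and item ids are illustrative; the real model then predicts each masked item from bidirectional self-attention over the rest of the sequence:

```python
import random

# Sketch of Cloze masking as used for BERT4Rec-style training: randomly mask
# items in a behavior sequence and keep labels only at masked positions.
MASK = "[mask]"

def cloze_mask(sequence, mask_prob=0.2, seed=42):
    rng = random.Random(seed)
    masked, labels = [], []
    for item in sequence:
        if rng.random() < mask_prob:
            masked.append(MASK)
            labels.append(item)      # position the model must recover
        else:
            masked.append(item)
            labels.append(None)      # position not trained on
    return masked, labels

seq = ["i1", "i2", "i3", "i4", "i5", "i6", "i7", "i8"]
masked, labels = cloze_mask(seq)
```

Because targets are only the masked positions, the model never sees the item it must predict, which is how the objective avoids the information leakage a naive bidirectional model would suffer.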
2018
- Multi-Source Pointer Network for Product Title SummarizationFei Sun , Peng Jiang , Hanxiao Sun , Changhua Pei , Wenwu Ou , and Xiaobo WangIn Proceedings of the 27th ACM International Conference on Information and Knowledge Management, Torino, Italy, Oct 2018
In this paper, we study the product title summarization problem in E-commerce applications for display on mobile devices. Compared with conventional sentence summarization, product title summarization has some extra and essential constraints. For example, factual errors or loss of the key information are intolerable for E-commerce applications. Therefore, we abstract two more constraints for product title summarization: (i) do not introduce irrelevant information; (ii) retain the key information (e.g., brand name and commodity name). To address these issues, we propose a novel multi-source pointer network by adding a new knowledge encoder for the pointer network. The first constraint is handled by the pointer mechanism. For the second constraint, we restore the key information by copying words from the knowledge encoder with the help of the soft gating mechanism. For evaluation, we build a large collection of real-world product titles along with human-written short titles. Experimental results demonstrate that our model significantly outperforms the other baselines. Finally, online deployment of our proposed model has yielded a significant business impact, as measured by the click-through rate.
@inproceedings{sun-2018-point, author = {Sun, Fei and Jiang, Peng and Sun, Hanxiao and Pei, Changhua and Ou, Wenwu and Wang, Xiaobo}, title = {Multi-Source Pointer Network for Product Title Summarization}, year = {2018}, isbn = {9781450360142}, publisher = {Association for Computing Machinery}, address = {New York, NY, USA}, url = {https://doi.org/10.1145/3269206.3271722}, doi = {10.1145/3269206.3271722}, booktitle = {Proceedings of the 27th ACM International Conference on Information and Knowledge Management}, pages = {7–16}, numpages = {10}, keywords = {extractive summarization, pointer network, title summarization}, location = {Torino, Italy}, series = {CIKM '18} }
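The soft-gated copy mechanism described in the abstract can be sketched in a few lines of NumPy (an illustrative toy, not the paper's implementation; all variable names here are made up): a gate decides how much probability mass goes to copying from the source-title encoder versus the knowledge encoder.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def copy_distribution(dec_state, src_keys, kb_keys, gate_w):
    """Combine two copy distributions -- one over the source-title
    encoder, one over the knowledge encoder -- with a soft gate."""
    a_src = softmax(src_keys @ dec_state)  # attention over source tokens
    a_kb = softmax(kb_keys @ dec_state)    # attention over knowledge tokens
    g = sigmoid(gate_w @ dec_state)        # gate in (0, 1)
    # final output distribution over the concatenated token positions
    return np.concatenate([g * a_src, (1.0 - g) * a_kb])

rng = np.random.default_rng(0)
d = 8
p = copy_distribution(rng.normal(size=d),
                      rng.normal(size=(5, d)),  # 5 source-title tokens
                      rng.normal(size=(3, d)),  # 3 knowledge tokens
                      rng.normal(size=d))
```

Because both attention distributions sum to one and the gate weights are complementary, the combined vector `p` is itself a valid probability distribution over all copyable tokens.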
2016
- Inside Out: Two Jointly Predictive Models for Word Representations and Phrase RepresentationsFei Sun , Jiafeng Guo , Yanyan Lan , Jun Xu , and Xueqi ChengProceedings of the AAAI Conference on Artificial Intelligence, Mar 2016
@article{sun-2016-aaai, title = {Inside Out: Two Jointly Predictive Models for Word Representations and Phrase Representations}, volume = {30}, url = {https://ojs.aaai.org/index.php/AAAI/article/view/10338}, doi = {10.1609/aaai.v30i1.10338}, abstractnote = {The distributional hypothesis lies at the root of most existing word representation models, which infer a word's meaning from its external contexts. However, distributional models cannot handle rare and morphologically complex words very well, and fail to identify fine-grained linguistic regularities because they ignore word forms. On the contrary, morphology points out that words are built from basic units, i.e., morphemes. Therefore, the meaning and function of such rare words can be inferred from words sharing the same morphemes, and many syntactic relations can be identified directly from word forms. However, the limitation of morphology is that it cannot infer the relationship between two words that do not share any morphemes. Considering the advantages and limitations of both approaches, we propose two novel models, called BEING and SEING, that build better word representations by modeling both external contexts and internal morphemes in a jointly predictive way. These two models can also be extended to learn phrase representations according to the distributed morphology theory. We evaluate the proposed models on similarity tasks and analogy tasks. The results demonstrate that the proposed models significantly outperform state-of-the-art models on both word and phrase representation learning.}, number = {1}, journal = {Proceedings of the AAAI Conference on Artificial Intelligence}, author = {Sun, Fei and Guo, Jiafeng and Lan, Yanyan and Xu, Jun and Cheng, Xueqi}, year = {2016}, month = mar }
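The morphological intuition above — a rare word inherits meaning from words it shares morphemes with — can be illustrated with a toy composition sketch (purely illustrative; BEING and SEING are predictive models trained on corpora, and every name and vector below is made up):

```python
import numpy as np

rng = np.random.default_rng(1)
# toy morpheme embeddings (in the real models these are learned jointly
# with context prediction)
morph_emb = {m: rng.normal(size=4) for m in ["un", "happy", "ness", "kind"]}

def compose(morphemes):
    """Compose a word vector as the sum of its morpheme vectors, so a
    rare word gets a representation even with few corpus occurrences."""
    return np.sum([morph_emb[m] for m in morphemes], axis=0)

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# "unhappiness" and the rare "unkindness" share two of three morphemes,
# so their composed vectors overlap by construction
v_unhappiness = compose(["un", "happy", "ness"])
v_unkindness = compose(["un", "kind", "ness"])
```

The limitation the abstract notes also falls out of this sketch: two words with no shared morphemes get unrelated vectors, which is why the models combine morphemes with external contexts.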
- Sparse word embeddings using ℓ1 regularized online learningFei Sun , Jiafeng Guo , Yanyan Lan , Jun Xu , and Xueqi ChengIn Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, New York, New York, USA, Jul 2016
Recently, the Word2Vec tool has attracted a lot of interest for its promising performance in a variety of natural language processing (NLP) tasks. However, a critical issue is that the dense word representations learned by Word2Vec lack interpretability. It is natural to ask whether one could improve their interpretability while keeping their performance. Inspired by the success of sparse models in enhancing interpretability, we propose to introduce a sparsity constraint into Word2Vec. Specifically, we take the Continuous Bag of Words (CBOW) model as an example in our study and add the ℓ1 regularizer to its learning objective. One challenge of optimization lies in that stochastic gradient descent (SGD) cannot directly produce sparse solutions with the ℓ1 regularizer in online training. To solve this problem, we employ the Regularized Dual Averaging (RDA) method, an online optimization algorithm for regularized stochastic learning. In this way, the learning process is very efficient and our model can scale up to very large corpora to derive sparse word representations. The proposed model is evaluated on both expressive power and interpretability. The results show that, compared with the original CBOW model, the proposed model can obtain state-of-the-art results with better interpretability using less than 10% non-zero elements.
@inproceedings{sun-2016-sparse, author = {Sun, Fei and Guo, Jiafeng and Lan, Yanyan and Xu, Jun and Cheng, Xueqi}, title = {Sparse word embeddings using l1 regularized online learning}, year = {2016}, isbn = {9781577357704}, publisher = {AAAI Press}, booktitle = {Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence}, pages = {2915–2921}, numpages = {7}, location = {New York, New York, USA}, series = {IJCAI'16} }
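The reason RDA yields sparsity where SGD does not is that the ℓ1-RDA update has a closed form: keep a running average of past gradients and soft-threshold it, which produces exact zeros. A minimal NumPy sketch of one such step, following the standard ℓ1-RDA closed form (parameter names are illustrative, not from the paper's code):

```python
import numpy as np

def rda_l1_weights(g_bar, t, lam, gamma=1.0):
    """Closed-form ℓ1-RDA solution at step t, given the running average
    gradient g_bar: soft-threshold each coordinate by lam, then scale
    by sqrt(t)/gamma. Coordinates with |g_bar| <= lam become exactly 0,
    which is where the sparsity of the learned embeddings comes from."""
    shrunk = np.maximum(np.abs(g_bar) - lam, 0.0)  # soft-thresholding
    return -(np.sqrt(t) / gamma) * np.sign(g_bar) * shrunk

# small averaged gradients are zeroed out; larger ones survive, shrunk
w = rda_l1_weights(np.array([0.05, -0.5, 0.2]), t=4, lam=0.1)
# w == [0.0, 0.8, -0.2]
```

In contrast, an SGD step with an ℓ1 subgradient merely nudges weights toward zero and almost never lands exactly on it, so the resulting embeddings stay dense.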
2015
- Learning Word Representations by Jointly Modeling Syntagmatic and Paradigmatic RelationsFei Sun , Jiafeng Guo , Yanyan Lan , Jun Xu , and Xueqi ChengIn Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Jul 2015
@inproceedings{sun-2015-learning, title = {Learning Word Representations by Jointly Modeling Syntagmatic and Paradigmatic Relations}, author = {Sun, Fei and Guo, Jiafeng and Lan, Yanyan and Xu, Jun and Cheng, Xueqi}, editor = {Zong, Chengqing and Strube, Michael}, booktitle = {Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)}, month = jul, year = {2015}, address = {Beijing, China}, publisher = {Association for Computational Linguistics}, url = {https://aclanthology.org/P15-1014/}, doi = {10.3115/v1/P15-1014}, pages = {136--145} }
2011
- DOM based content extraction via text densityFei Sun , Dandan Song , and Lejian LiaoIn Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval, Beijing, China, Jul 2011
In addition to the main content, most web pages also contain navigation panels, advertisements, and copyright and disclaimer notices. This additional content, also known as noise, is typically unrelated to the main subject and may hamper the performance of web data mining, and hence needs to be removed properly. In this paper, we present Content Extraction via Text Density (CETD), a fast, accurate and general method for extracting content from diverse web pages, which uses the text density of DOM (Document Object Model) nodes and preserves the original structure. For this purpose, we introduce two concepts to measure the importance of nodes: Text Density and Composite Text Density. In order to extract content intact, we propose a technique called DensitySum to replace data smoothing. The approach was evaluated on the CleanEval benchmark and on randomly selected pages from well-known websites, covering various web domains and styles. The average F1-scores with our method were 8.79% higher than the best scores among several alternative methods.
@inproceedings{sun-2011-dom, author = {Sun, Fei and Song, Dandan and Liao, Lejian}, title = {DOM based content extraction via text density}, year = {2011}, isbn = {9781450307574}, publisher = {Association for Computing Machinery}, address = {New York, NY, USA}, url = {https://doi.org/10.1145/2009916.2009952}, doi = {10.1145/2009916.2009952}, booktitle = {Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval}, pages = {245–254}, numpages = {10}, keywords = {composite text density, content extraction, densitysum, text density}, location = {Beijing, China}, series = {SIGIR '11} }
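The text-density intuition — content nodes pack many characters into few tags, while navigation and boilerplate do the opposite — can be sketched with the standard library's HTML parser (a flat, simplified version of the per-node measure for illustration only; CETD itself scores every DOM subtree and additionally uses Composite Text Density):

```python
from html.parser import HTMLParser

class DensityCounter(HTMLParser):
    """Count characters and tags to compute a simple text density
    (characters per tag) over a fragment of markup."""
    def __init__(self):
        super().__init__()
        self.chars = 0
        self.tags = 0

    def handle_starttag(self, tag, attrs):
        self.tags += 1

    def handle_data(self, data):
        self.chars += len(data.strip())

    def density(self):
        return self.chars / max(self.tags, 1)

def text_density(html):
    parser = DensityCounter()
    parser.feed(html)
    return parser.density()

# content-heavy markup scores far higher than link-heavy navigation
content = "<div><p>" + "Real article text. " * 10 + "</p></div>"
nav = "<ul><li><a>Home</a></li><li><a>About</a></li><li><a>Login</a></li></ul>"
```

Thresholding such densities over DOM subtrees, rather than a whole page as here, is what lets the method separate the main content from surrounding noise.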