Despite significant progress in model editing methods, their application in real-world scenarios remains challenging as they often cause large language models (LLMs) to collapse. Among them, ROME is particularly concerning, as it could disrupt LLMs with only a single edit. In this paper, we study the root causes of such collapse. Through extensive analysis, we identify two primary factors that contribute to the collapse: i) inconsistent handling of prefixed and unprefixed keys in the parameter update equation may result in very small denominators, causing excessively large parameter updates; ii) the subject of collapse cases is usually the first token, whose unprefixed key distribution significantly differs from the prefixed key distribution in autoregressive transformers, causing the aforementioned issue to materialize. To validate our findings, we propose a simple yet effective approach: uniformly using prefixed keys during editing phase and adding prefixes during testing phase to ensure the consistency between training and testing. The experimental results show that the proposed solution can prevent model collapse while maintaining the effectiveness of the edits.
While auxiliary information has become a key to enhancing Large Language Models (LLMs), relatively little is known about how LLMs merge these contexts, specifically contexts generated by LLMs and those retrieved from external sources.To investigate this, we formulate a systematic framework to identify whether LLMs’ responses are attributed to either generated or retrieved contexts.To easily trace the origin of the response, we construct datasets with conflicting contexts, i.e., each question is paired with both generated and retrieved contexts, yet only one of them contains the correct answer.Our experiments reveal a significant bias in several LLMs (GPT-4/3.5 and Llama2) to favor generated contexts, even when they provide incorrect information.We further identify two key factors contributing to this bias: i) contexts generated by LLMs typically show greater similarity to the questions, increasing their likelihood of being selected; ii) the segmentation process used in retrieved contexts disrupts their completeness, thereby hindering their full utilization in LLMs.Our analysis enhances the understanding of how LLMs merge diverse contexts, offers valuable insights for advancing current LLM augmentation methods, and highlights the risk of generated misinformation for retrieval-augmented LLMs.
Although model editing has shown promise in revising knowledge in Large Language Models (LLMs), its impact on the inherent capabilities of LLMs is often overlooked. In this work, we reveal a critical phenomenon: even a single edit can trigger model collapse, manifesting as significant performance degradation in various benchmark tasks. However, benchmarking LLMs after each edit, while necessary to prevent such collapses, is impractically time-consuming and resource-intensive. To mitigate this, we propose using perplexity as a surrogate metric, validated by extensive experiments demonstrating changes in an edited model‘s perplexity are strongly correlated with its downstream task performances. We further conduct an in-depth study on sequential editing, a practical setting for real-world scenarios, across various editing methods and LLMs, focusing on hard cases from our previous single edit studies. The results indicate that nearly all examined editing methods result in model collapse after only few edits. To facilitate further research, we have utilized GPT-3.5 to develop a new dataset, HardEdit, based on those hard cases. This dataset aims to establish the foundation for pioneering research in reliable model editing and the mechanisms underlying editing-induced model collapse. We hope this work can draw the community‘s attention to the potential risks inherent in model editing practices.
Despite the success of large language models (LLMs) in natural language generation, much evidence shows that LLMs may produce incorrect or nonsensical text. This limitation highlights the importance of discerning when to trust LLMs, especially in safety-critical domains. Existing methods often express reliability by confidence level, however, their effectiveness is limited by the lack of objective guidance. To address this, we propose CONfidence-Quality-ORDer-preserving alignment approach (CONQORD), which leverages reinforcement learning guided by a tailored dual-component reward function. This function integrates quality reward and order-preserving alignment reward functions. Specifically, the order-preserving reward incentivizes the model to verbalize greater confidence for responses of higher quality to align the order of confidence and quality. Experiments demonstrate that CONQORD significantly improves the alignment performance between confidence and response accuracy, without causing over-cautious. Furthermore, the aligned confidence provided by CONQORD informs when to trust LLMs, and acts as a determinant for initiating the retrieval process of external knowledge. Aligning confidence with response quality ensures more transparent and reliable responses, providing better trustworthiness.
As concerns over data privacy intensify, unlearning in Graph Neural Networks (GNNs) has emerged as a prominent research frontier in academia. This concept is pivotal in enforcing the right to be forgotten, which entails the selective removal of specific data from trained GNNs upon user request. Our research focuses on edge unlearning, a process of particular relevance to real-world applications. Current state-of-the-art approaches like GNNDelete can eliminate the influence of specific edges yet suffer from over-forgetting, which means the unlearning process inadvertently removes excessive information beyond needed, leading to a significant performance decline for remaining edges. Our analysis identifies the loss functions of GNNDelete as the primary source of over-forgetting and also suggests that loss functions may be redundant for effective edge unlearning. Building on these insights, we simplify GNNDelete to develop Unlink to Unlearn (UtU), a novel method that facilitates unlearning exclusively through unlinking the forget edges from graph structure. Our extensive experiments demonstrate that UtU delivers privacy protection on par with that of a retrained model while preserving high accuracy in downstream tasks, by upholding over 97.3% of the retrained model’s privacy protection capabilities and 99.8% of its link prediction accuracy. Meanwhile, UtU requires only constant computational demands, underscoring its advantage as a highly lightweight and practical edge unlearning solution.
WebConf (Short)
Accelerating the Surrogate Retraining for Poisoning Attacks against Recommender Systems
Yunfan Wu , Qi Cao , Shuchang Tao , Kaike Zhang , Fei Sun , and Huawei Shen
In Proceedings of the 18th ACM Conference on Recommender Systems, Bari, Italy, Aug 2024
Recent studies have demonstrated the vulnerability of recommender systems to data poisoning attacks, where adversaries inject carefully crafted fake user interactions into the training data of recommenders to promote target items. Current attack methods involve iteratively retraining a surrogate recommender on the poisoned data with the latest fake users to optimize the attack. However, this repetitive retraining is highly time-consuming, hindering the efficient assessment and optimization of fake users. To mitigate this computational bottleneck and develop a more effective attack in an affordable time, we analyze the retraining process and find that a change in the representation of one user/item will cause a cascading effect through the user-item interaction graph. Under theoretical guidance, we introduce Gradient Passing (GP), a novel technique that explicitly passes gradients between interacted user-item pairs during backpropagation, thereby approximating the cascading effect and accelerating retraining. With just a single update, GP can achieve effects comparable to multiple original training iterations. Under the same number of retraining epochs, GP enables a closer approximation of the surrogate recommender to the victim. This more accurate approximation provides better guidance for optimizing fake users, ultimately leading to enhanced data poisoning attacks. Extensive experiments on real-world datasets demonstrate the efficiency and effectiveness of our proposed GP.
Recommender systems play a pivotal role in mitigating information overload in various fields. Nonetheless, the inherent openness of these systems introduces vulnerabilities, allowing attackers to insert fake users into the system’s training data to skew the exposure of certain items, known as poisoning attacks. Adversarial training has emerged as a notable defense mechanism against such poisoning attacks within recommender systems. Existing adversarial training methods apply perturbations of the same magnitude across all users to enhance system robustness against attacks. Yet, in reality, we find that attacks often affect only a subset of users who are vulnerable. These perturbations of indiscriminate magnitude make it difficult to balance effective protection for vulnerable users without degrading recommendation quality for those who are not affected. To address this issue, our research delves into understanding user vulnerability. Considering that poisoning attacks pollute the training data, we note that the higher degree to which a recommender system fits users’ training data correlates with an increased likelihood of users incorporating attack information, indicating their vulnerability. Leveraging these insights, we introduce the Vulnerability-aware Adversarial Training (VAT), designed to defend against poisoning attacks in recommender systems. VAT employs a novel vulnerability-aware function to estimate users’ vulnerability based on the degree to which the system fits them. Guided by this estimation, VAT applies perturbations of adaptive magnitude to each user, not only reducing the success ratio of attacks but also preserving, and potentially enhancing, the quality of recommendations. Comprehensive experiments confirm VAT’s superior defensive capabilities across different recommendation models and against various types of attacks.
Sequential recommender systems stand out for their ability to capture users’ dynamic interests and the patterns of item transitions. However, the inherent openness of sequential recommender systems renders them vulnerable to poisoning attacks, where fraudsters are injected into the training data to manipulate learned patterns. Traditional defense methods predominantly depend on predefined assumptions or rules extracted from specific known attacks, limiting their generalizability to unknown attacks. To solve the above problems, considering the rich open-world knowledge encapsulated in Large Language Models (LLMs), we attempt to introduce LLMs into defense methods to broaden the knowledge beyond limited known attacks. We propose LoRec, an innovative framework that employs LLM-Enhanced Calibration to strengthen the robustness of sequential Recommender systems against poisoning attacks. LoRec integrates an LLM-enhanced CalibraTor (LCT) that refines the training process of sequential recommender systems with knowledge derived from LLMs, applying a user-wise reweighting to diminish the impact of attacks. Incorporating LLMs’ open-world knowledge, the LCT effectively converts the limited, specific priors or rules into a more general pattern of fraudsters, offering improved defenses against poisons. Our comprehensive experiments validate that LoRec, as a general framework, significantly strengthens the robustness of sequential recommender systems.
Node injection attacks on Graph Neural Networks (GNNs) have received increasing attention recently, due to their ability to degrade GNN performance with high attack success rates. However, our study indicates that these attacks often fail in practical scenarios, since defense/detection methods can easily identify and remove the injected nodes. To address this, we devote to camouflage node injection attack, making injected nodes appear normal and imperceptible to defense/detection methods. Unfortunately, the non-Euclidean structure of graph data and the lack of intuitive prior present great challenges to the formalization, implementation, and evaluation of camouflage. In this paper, we first propose and define camouflage as distribution similarity between ego networks of injected nodes and normal nodes. Then for implementation, we propose an adversarial CAmouflage framework for Node injection Attack, namely CANA, to improve attack performance under defense/detection methods in practical scenarios. A novel camouflage metric is further designed under the guide of distribution similarity. Extensive experiments demonstrate that CANA can significantly improve the attack performance under defense/detection methods with higher camouflage or imperceptibility. This work urges us to raise awareness of the security vulnerabilities of GNNs in practical applications.
Information Sciences
Studying the Impact of Data Disclosure Mechanism in Recommender Systems via Simulation
Ziqian Chen , Fei Sun , Yifan Tang , Haokun Chen , Jinyang Gao , and Bolin Ding
Recently, privacy issues in web services that rely on users’ personal data have raised great attention. Despite that recent regulations force companies to offer choices for each user to opt-in or opt-out of data disclosure, real-world applications usually only provide an “all or nothing” binary option for users to either disclose all their data or preserve all data with the cost of no personalized service. In this article, we argue that such a binary mechanism is not optimal for both consumers and platforms. To study how different privacy mechanisms affect users’ decisions on information disclosure and how users’ decisions affect the platform’s revenue, we propose a privacy-aware recommendation framework that gives users fine control over their data. In this new framework, users can proactively control which data to disclose based on the tradeoff between anticipated privacy risks and potential utilities. Then we study the impact of different data disclosure mechanisms via simulation with reinforcement learning due to the high cost of real-world experiments. The results show that the platform mechanisms with finer split granularity and more unrestrained disclosure strategy can bring better results for both consumers and platforms than the “all or nothing” mechanism adopted by most real-world applications.
Learned recommender systems may inadvertently leak information about their training data, leading to privacy violations. We investigate privacy threats faced by recommender systems through the lens of membership inference. In such attacks, an adversary aims to infer whether a user’s data is used to train the target recommender. To achieve this, previous work has used a shadow recommender to derive training data for the attack model, and then predicts the membership by calculating difference vectors between users’ historical interactions and recommended items. State-of-the-art methods face two challenging problems: (i) training data for the attack model is biased due to the gap between shadow and target recommenders, and (ii) hidden states in recommenders are not observational, resulting in inaccurate estimations of difference vectors.To address the above limitations, we propose a Debiasing Learning for Membership Inference Attacks against recommender systems (DL-MIA) framework that has four main components: (i) a difference vector generator, (ii) a disentangled encoder, (iii) a weight estimator, and (iv) an attack model. To mitigate the gap between recommenders, a variational auto-encoder (VAE) based disentangled encoder is devised to identify recommender invariant and specific features. To reduce the estimation bias, we design a weight estimator, assigning a truth-level score for each difference vector to indicate estimation accuracy. We evaluate DL-MIA against both general recommenders and sequential recommenders on three real-world datasets. Experimental results show that DL-MIA effectively alleviates training and estimation biases simultaneously, and Íachieves state-of-the-art attack performance.
With the explosive growth of online information, recommender systems play a key role to alleviate such information overload. Due to the important application value of recommender systems, there have always been emerging works in this field. In recommender systems, the main challenge is to learn the effective user/item representations from their interactions and side information (if any). Recently, graph neural network (GNN) techniques have been widely utilized in recommender systems since most of the information in recommender systems essentially has graph structure and GNN has superiority in graph representation learning. This article aims to provide a comprehensive review of recent research efforts on GNN-based recommender systems. Specifically, we provide a taxonomy of GNN-based recommendation models according to the types of information used and recommendation tasks. Moreover, we systematically analyze the challenges of applying GNN on different types of data and discuss how existing works in this field address these challenges. Furthermore, we state new perspectives pertaining to the development of this field. We collect the representative papers along with their open-source implementations in .
Recommender systems provide essential web services by learning users’ personal preferences from collected data. However, in many cases, systems also need to forget some training data. From the perspective of privacy, users desire a tool to erase the impacts of their sensitive data from the trained models. From the perspective of utility, if a system’s utility is damaged by some bad data, the system needs to forget such data to regain utility. While unlearning is very important, it has not been well-considered in existing recommender systems. Although there are some researches have studied the problem of machine unlearning, existing methods can not be directly applied to recommendation as they are unable to consider the collaborative information. In this paper, we propose RecEraser, a general and efficient machine unlearning framework tailored to recommendation tasks. The main idea of RecEraser is to divide the training set into multiple shards and train submodels with these shards. Specifically, to keep the collaborative information of the data, we first design three novel data partition algorithms to divide training data into balanced groups. We then further propose an adaptive aggregation method to improve the global model utility. Experimental results on three public benchmarks show that RecEraser can not only achieve efficient unlearning but also outperform the state-of-the-art unlearning methods in terms of model utility. The source code can be found at https://github.com/chenchongthu/Recommendation-Unlearning
WebConf
2021
Factual Consistency Evaluation for Text Summarization via Counterfactual Estimation
Yuexiang Xie , Fei Sun , Yang Deng , Yaliang Li , and Bolin Ding
In Findings of the Association for Computational Linguistics: EMNLP 2021, Nov 2021
Despite significant progress has been achieved in text summarization, factual inconsistency in generated summaries still severely limits its practical applications. Among the key factors to ensure factual consistency, a reliable automatic evaluation metric is the first and the most crucial one. However, existing metrics either neglect the intrinsic cause of the factual inconsistency or rely on auxiliary tasks, leading to an unsatisfied correlation with human judgments or increasing the inconvenience of usage in practice. In light of these challenges, we propose a novel metric to evaluate the factual consistency in text summarization via counterfactual estimation, which formulates the causal relationship among the source document, the generated summary, and the language prior. We remove the effect of language prior, which can cause factual inconsistency, from the total causal effect on the generated summary, and provides a simple yet effective way to evaluate consistency without relying on other auxiliary tasks. We conduct a series of experiments on three public abstractive text summarization datasets, and demonstrate the advantages of the proposed metric in both improving the correlation with human judgments and the convenience of usage. The source code is available at \urlhttps://github.com/xieyxclack/factual_coco.
To improve user experience and profits of corporations, modern industrial recommender systems usually aim to select the items that are most likely to be interacted with (e.g., clicks and purchases). However, they overlook the fact that users may purchase the items even without recommendations. The real effective items are the ones that can contribute to purchase probability uplift. To select these effective items, it is essential to estimate the causal effect of recommendations. Nevertheless, it is difficult to obtain the real causal effect since we can only recommend or not recommend an item to a user at one time. Furthermore, previous works usually rely on the randomized controlled trial (RCT) experiment to evaluate their performance. However, it is usually not practicable in the recommendation scenario due to its expensive experimental cost. To tackle these problems, in this paper, we propose a causal collaborative filtering (CausCF) method inspired by the widely adopted collaborative filtering (CF) technique. It is based on the idea that similar users not only have a similar taste on items but also have similar treatment effects under recommendations. CausCF extends the classical matrix factorization to the tensor factorization with three dimensions—user, item, and treatment. Furthermore, we also employ regression discontinuity design (RDD) to evaluate the precision of the estimated causal effects from different models. With the testable assumptions, RDD analysis can provide an unbiased causal conclusion without RCT experiments. Through dedicated experiments on both offline and online experiments, we demonstrate the effectiveness of our proposed CausCF on the causal effect estimation and ranking performance improvement.
Slate recommendation generates a list of items as a whole instead of ranking each item individually, so as to better model the intra-list positional biases and item relations. In order to deal with the enormous combinatorial space of slates, recent work considers a generative solution so that a slate distribution can be directly modeled. However, we observe that such approaches—despite their proved effectiveness in computer vision—suffer from a trade-off dilemma in recommender systems: when focusing on reconstruction, they easily over-fit the data and hardly generate satisfactory recommendations; on the other hand, when focusing on satisfying the user interests, they get trapped in a few items and fail to cover the item variation in slates. In this paper, we propose to enhance the accuracy-based evaluation with slate variation metrics to estimate the stochastic behavior of generative models. We illustrate that instead of reaching to one of the two undesirable extreme cases in the dilemma, a valid generative solution resides in a narrow “elbow” region in between. And we show that item perturbation can enforce slate variation and mitigate the over-concentration of generated slates, which expand the “elbow” performance to an easy-to-find region. We further propose to separate a pivot selection phase from the generation process so that the model can apply perturbation before generation. Empirical results show that this simple modification can provide even better variance with the same level of accuracy compared to post-generation perturbation methods.
WebConf
Multi-interest Diversification for End-to-end Sequential Recommendation
Wanyu Chen , Pengjie Ren , Fei Cai , Fei Sun , and Maarten De Rijke
Sequential recommenders capture dynamic aspects of users’ interests by modeling sequential behavior. Previous studies on sequential recommendations mostly aim to identify users’ main recent interests to optimize the recommendation accuracy; they often neglect the fact that users display multiple interests over extended periods of time, which could be used to improve the diversity of lists of recommended items. Existing work related to diversified recommendation typically assumes that users’ preferences are static and depend on post-processing the candidate list of recommended items. However, those conditions are not suitable when applied to sequential recommendations. We tackle sequential recommendation as a list generation process and propose a unified approach to take accuracy as well as diversity into consideration, called multi-interest, diversified, sequential recommendation. Particularly, an implicit interest mining module is first used to mine users’ multiple interests, which are reflected in users’ sequential behavior. Then an interest-aware, diversity promoting decoder is designed to produce recommendations that cover those interests. For training, we introduce an interest-aware, diversity promoting loss function that can supervise the model to learn to recommend accurate as well as diversified items. We conduct comprehensive experiments on four public datasets and the results show that our proposal outperforms state-of-the-art methods regarding diversity while producing comparable or better accuracy for sequential recommendation.
Conversational recommender systems (CRS) enable the traditional recommender systems to explicitly acquire user preferences towards items and attributes through interactive conversations. Reinforcement learning (RL) is widely adopted to learn conversational recommendation policies to decide what attributes to ask, which items to recommend, and when to ask or recommend, at each conversation turn. However, existing methods mainly target at solving one or two of these three decision-making problems in CRS with separated conversation and recommendation components, which restrict the scalability and generality of CRS and fall short of preserving a stable training procedure. In the light of these challenges, we propose to formulate these three decision-making problems in CRS as a unified policy learning task. In order to systematically integrate conversation and recommendation components, we develop a dynamic weighted graph based RL method to learn a policy to select the action at each conversation turn, either asking an attribute or recommending items. Further, to deal with the sample efficiency issue, we propose two action selection strategies for reducing the candidate action space according to the preference and entropy information. Experimental results on two benchmark CRS datasets and a real-world E-Commerce application show that the proposed method not only significantly outperforms state-of-the-art methods but also enhances the scalability and stability of CRS.
As Recommender Systems (RS) influence more and more people in their daily life, the issue of fairness in recommendation is becoming more and more important. Most of the prior approaches to fairness-aware recommendation have been situated in a static or one-shot setting, where the protected groups of items are fixed, and the model provides a one-time fairness solution based on fairness-constrained optimization. This fails to consider the dynamic nature of the recommender systems, where attributes such as item popularity may change over time due to the recommendation policy and user engagement. For example, products that were once popular may become no longer popular, and vice versa. As a result, the system that aims to maintain long-term fairness on the item exposure in different popularity groups must accommodate this change in a timely fashion.Novel to this work, we explore the problem of long-term fairness in recommendation and accomplish the problem through dynamic fairness learning. We focus on the fairness of exposure of items in different groups, while the division of the groups is based on item popularity, which dynamically changes over time in the recommendation process. We tackle this problem by proposing a fairness-constrained reinforcement learning algorithm for recommendation, which models the recommendation problem as a Constrained Markov Decision Process (CMDP), so that the model can dynamically adjust its recommendation policy to make sure the fairness requirement is always satisfied when the environment changes. Experiments on several real-world datasets verify our framework’s superiority in terms of recommendation performance, short-term fairness, and long-term fairness.
Features play an important role in the prediction tasks of e-commerce recommendations. To guarantee the consistency of off-line training and on-line serving, we usually utilize the same features that are both available. However, the consistency in turn neglects some discriminative features. For example, when estimating the conversion rate (CVR), i.e., the probability that a user would purchase the item if she clicked it, features like dwell time on the item detailed page are informative. However, CVR prediction should be conducted for on-line ranking before the click happens. Thus we cannot get such post-event features during serving.We define the features that are discriminative but only available during training as the privileged features. Inspired by the distillation techniques which bridge the gap between training and inference, in this work, we propose privileged features distillation (PFD). We train two models, i.e., a student model that is the same as the original one and a teacher model that additionally utilizes the privileged features. Knowledge distilled from the more accurate teacher is transferred to the student, which helps to improve its prediction accuracy. During serving, only the student part is extracted and it relies on no privileged features. We conduct experiments on two fundamental prediction tasks at Taobao recommendations, i.e., click-through rate (CTR) at coarse-grained ranking and CVR at fine-grained ranking. By distilling the interacted features that are prohibited during serving for CTR and the post-event features for CVR, we achieve significant improvements over their strong baselines. During the on-line A/B tests, the click metric is improved by +5.0% in the CTR task. And the conversion metric is improved by +2.3% in the CVR task. Besides, by addressing several issues of training PFD, we obtain comparable training speed as the baselines without any distillation.
Personalized recommendation benefits users in accessing contents of interests effectively. Current research on recommender systems mostly focuses on matching users with proper items based on user interests. However, significant efforts are missing to understand how the recommendations influence user preferences and behaviors, e.g., if and how recommendations result in echo chambers. Extensive efforts have been made in examining the phenomenon in online media and social network systems. Meanwhile, there are growing concerns that recommender systems might lead to the self-reinforcing of user’s interests due to narrowed exposure of items, which may be the potential cause of echo chamber. In this paper, we aim to analyze the echo chamber phenomenon in Alibaba Taobao — one of the largest e-commerce platforms in the world.Echo chamber means the effect of user interests being reinforced through repeated exposure to similar contents. Based on the definition, we examine the presence of echo chamber in two steps. First, we explore whether user interests have been reinforced. Second, we check whether the reinforcement results from the exposure of similar contents. Our evaluations are enhanced with robust metrics, including cluster validity and statistical significance. Experiments are performed on extensive collections of real-world data consisting of user clicks, purchases, and browse logs from Alibaba Taobao. Evidence suggests the tendency of echo chamber in user click behaviors, while it is relatively mitigated in user purchase behaviors. Insights from the results guide the refinement of recommendation algorithms in real-world e-commerce systems.
Neural networks have been extensively used in recommender systems. Embedding layers are not only necessary but also crucial for neural models in recommendation as a typical discrete task. In this article, we argue that the widely used l2 regularization for normal neural layers (e.g., fully connected layers) is not ideal for embedding layers from the perspective of regularization theory in Reproducing Kernel Hilbert Space. More specifically, the l2 regularization corresponds to the inner product and the distance in the Euclidean space where correlations between discrete objects (e.g., items) are not well captured. Inspired by this observation, we propose a graph-based regularization approach to serve as a counterpart of the l2 regularization for embedding layers. The proposed regularization incurs almost no extra computational overhead especially when being trained with mini-batches. We also discuss its relationships to other approaches (namely, data augmentation, graph convolution, and joint learning) theoretically. We conducted extensive experiments on five publicly available datasets from various domains with two state-of-the-art recommendation models. Results show that given a kNN (k-nearest neighbor) graph constructed directly from training data without external information, the proposed approach significantly outperforms the l2 regularization on all the datasets and achieves more notable improvements for long-tail users and items.
Recommendation with multiple objectives is an important but difficult problem, where the coherent difficulty lies in the possible conflicts between objectives. In this case, multi-objective optimization is expected to be Pareto efficient, where no single objective can be further improved without hurting the others. However existing approaches to Pareto efficient multi-objective recommendation still lack good theoretical guarantees.In this paper, we propose a general framework for generating Pareto efficient recommendations. Assuming that there are formal differentiable formulations for the objectives, we coordinate these objectives with a weighted aggregation. Then we propose a condition ensuring Pareto efficiency theoretically and a two-step Pareto efficient optimization algorithm. Meanwhile the algorithm can be easily adapted for Pareto Frontier generation and fair recommendation selection. We specifically apply the proposed framework on E-Commerce recommendation to optimize GMV and CTR simultaneously. Extensive online and offline experiments are conducted on the real-world E-Commerce recommender system and the results validate the Pareto efficiency of the framework.To the best of our knowledge, this work is among the first to provide a Pareto efficient framework for multi-objective recommendation with theoretical guarantees. Moreover, the framework can be applied to any other objectives with differentiable formulations and any model with gradients, which shows its strong scalability.
Modeling users’ dynamic preferences from their historical behaviors is challenging and crucial for recommendation systems. Previous methods employ sequential neural networks to encode users’ historical interactions from left to right into hidden representations for making recommendations. Despite their effectiveness, we argue that such left-to-right unidirectional models are sub-optimal due to the limitations including: begin enumerate* [label=seriesitshapealph*upshape)] item unidirectional architectures restrict the power of hidden representation in users’ behavior sequences; item they often assume a rigidly ordered sequence which is not always practical. end enumerate* To address these limitations, we proposed a sequential recommendation model called BERT4Rec, which employs the deep bidirectional self-attention to model user behavior sequences. To avoid the information leakage and efficiently train the bidirectional model, we adopt the Cloze objective to sequential recommendation, predicting the random masked items in the sequence by jointly conditioning on their left and right context. In this way, we learn a bidirectional representation model to make recommendations by allowing each item in user historical behaviors to fuse information from both left and right sides. Extensive experiments on four benchmark datasets show that our model outperforms various state-of-the-art sequential models consistently.
In this paper, we study the product title summarization problem in E-commerce applications for display on mobile devices. Comparing with conventional sentence summarization, product title summarization has some extra and essential constraints. For example, factual errors or loss of the key information are intolerable for E-commerce applications. Therefore, we abstract two more constraints for product title summarization: (i) do not introduce irrelevant information; (ii) retain the key information (e.g., brand name and commodity name). To address these issues, we propose a novel multi-source pointer network by adding a new knowledge encoder for pointer network. The first constraint is handled by pointer mechanism. For the second constraint, we restore the key information by copying words from the knowledge encoder with the help of the soft gating mechanism. For evaluation, we build a large collection of real-world product titles along with human-written short titles. Experimental results demonstrate that our model significantly outperforms the other baselines. Finally, online deployment of our proposed model has yielded a significant business impact, as measured by the click-through rate.
Recently, Word2Vec tool has attracted a lot of interest for its promising performances in a variety of natural language processing (NLP) tasks. However, a critical issue is that the dense word representations learned in Word2Vec are lacking of interpretability. It is natural to ask if one could improve their interpretability while keeping their performances. Inspired by the success of sparse models in enhancing interpretability, we propose to introduce sparse constraint into Word2Vec. Specifically, we take the Continuous Bag of Words (CBOW) model as an example in our study and add the l l regularizer into its learning objective. One challenge of optimization lies in that stochastic gradient descent (SGD) cannot directly produce sparse solutions with l 1 regularizer in online training. To solve this problem, we employ the Regularized Dual Averaging (RDA) method, an online optimization algorithm for regularized stochastic learning. In this way, the learning process is very efficient and our model can scale up to very large corpus to derive sparse word representations. The proposed model is evaluated on both expressive power and interpretability. The results show that, compared with the original CBOW model, the proposed model can obtain state-of-the-art results with better interpretability using less than 10% non-zero elements.
Learning Word Representations by Jointly Modeling Syntagmatic and Paradigmatic Relations
Fei Sun , Jiafeng Guo , Yanyan Lan , Jun Xu , and Xueqi Cheng
In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Jul 2015
In addition to the main content, most web pages also contain navigation panels, advertisements and copyright and disclaimer notices. This additional content, which is also known as noise, is typically not related to the main subject and may hamper the performance of web data mining, and hence needs to be removed properly. In this paper, we present Content Extraction via Text Density (CETD) a fast, accurate and general method for extracting content from diverse web pages, and using DOM (Document Object Model) node text density to preserve the original structure. For this purpose, we introduce two concepts to measure the importance of nodes: Text Density and Composite Text Density. In order to extract content intact, we propose a technique called DensitySum to replace Data Smoothing. The approach was evaluated with the CleanEval benchmark and with randomly selected pages from well-known websites, where various web domains and styles are tested. The average F1-scores with our method were 8.79% higher than the best scores among several alternative methods.