Personalized Denoising Implicit Feedback for Robust Recommender System

Kaike Zhang; Qi Cao; Yunfan Wu; Fei Sun; Huawei Shen; Xueqi Cheng

Personalized Denoising Implicit Feedback for Robust Recommender System

Kaike Zhang, Qi Cao, Yunfan Wu, Fei Sun, Huawei Shen, Xueqi Cheng

05 Oct 2024 (modified: 30 Oct 2024)ACM TheWebConf 2025 Conference SubmissionConference, Senior Area Chairs, Area Chairs, Reviewers, Authors

Track: User modeling, personalization and recommendation

Serve As Reviewer: Kaike Zhang

Keywords: Robust Recommender System, Denoising Recommendation, Implicit Feedback

Confirmation: I certify that my OpenReview profile is up to date (including accurate first and last names, a valid preferred email, and current and past affiliations) and that this paper adheres to the guidelines in the Call for Papers, including policy on human participants, authorship, and limits on maximum authorship.

TL;DR: We introduce PLD, a resampling strategy that enhances the robustness of recommender systems by denoising implicit feedback using users' personal loss distributions.

Abstract:

While implicit feedback is foundational to modern recommender systems, factors such as human error, uncertainty, and ambiguity in user behavior inevitably introduce significant noise into this feedback, adversely affecting the accuracy and robustness of recommendations. To address this issue, existing methods typically aim to reduce the training weight of noisy feedback or discard it entirely, based on the observation that noisy interactions often exhibit higher losses in the overall loss distribution. However, we identify two key issues: (1) there is a significant overlap between normal and noisy interactions in the overall loss distribution, and (2) this overlap becomes even more pronounced when transitioning from pointwise loss functions (e.g., BCE loss) to pairwise loss functions (e.g., BPR loss). This overlap leads traditional methods to misclassify noisy interactions as normal, and vice versa. To tackle these challenges, we further investigate the loss overlap and find that for a given user, there is a clear distinction between normal and noisy interactions in the user's personal loss distribution. Based on this insight, we propose a resampling strategy to Denoise using the user's Personal Loss distribution, named PLD, which aims to reduce the probability of noisy interactions being optimized. Specifically, during each optimization iteration, we create a candidate item pool for each user and resample the items from this pool based on the user's personal loss distribution, prioritizing normal interactions. Additionally, we conduct a theoretical analysis to validate PLD's effectiveness and suggest ways to further enhance its performance. Extensive experiments conducted on three datasets with varying noise ratios demonstrate PLD's efficacy and robustness.

Submission Number: 877

Paper Decision

Decisionby Program Chairs20 Jan 2025, 20:24 (modified: 21 Jan 2025, 02:35)Program Chairs, Authors

Decision: Accept (Poster)

Summary

Official Comment11 Dec 2024, 18:00Program Chairs, Senior Area Chairs, Area Chairs, Reviewers Submitted, Authors

Comment:

We sincerely thank all the reviewers for their valuable feedback and time spent evaluating our work. During the rebuttal period, we are pleased that our responses effectively addressed the reviewers' concerns, resulting in a 3-point increase in Novelty and a 6-point increase in Technical Quality.

Novel Approach and Perspective (reviewers #1, #2, #4, #5)
Rigorous Theoretical Analysis (reviewers #1, #2, #3, #5)
Comprehensive and Extensive Experiments (reviewers #1, #2, #3, #5)
Strong Performance (reviewers #1, #2, #5)
Convincing Data Analysis (reviewers #2, #5)
Clear and Reasonable Motivation (reviewers #4, #3)
Practical Utility (reviewer #1)
Sufficient Work Survey (reviewer #4)
Well-Structured and Written (reviewers #3, #4)

During the discussion period, our responses effectively addressed concerns such as:

Relaxing the theoretical assumptions and generalizing the Gaussian distribution to arbitrary distributions.
Providing further theoretical support for the overlap phenomenon mentioned in the paper.
Offering additional experiments to show that PLD not only improves the robustness of various complex models but also addresses different types of noise.

We also clarified misunderstandings, including:

The concept of Personalized Denoising, where personal loss distributions is used to perform distinct denoising processes tailored to each user.
The rationality of our experimental setup, which follows commonly establised protocols and compares with state-of-the-art models.
The validity of our observations, backed by evidence in our paper and supporting references.

We are very much looking forward to this paper being considered for presentation at the conference. It provides a fresh perspective on denoising methods and could significantly impact the community by sparking discussions on improving robust recommender systems.

Once again, we extend our heartfelt gratitude to all the reviewers for their constructive suggestions and time spent evaluating our work.

Best regards,
The Authors

Concerns about the Quality of Reviewer #1’s Reviews

Official Comment11 Dec 2024, 17:34Program Chairs, Senior Area Chairs, Area Chairs, Authors

Comment:

Dear Area Chair,

I hope this message finds you well. Reviewer #1’s reviews contain numerous flaws and substantial misunderstandings, leading to a biased evaluation of our work. Furthermore, the reviewer's dismissive response during the discussion phase contradicted their initial comments. We respectfully request the exclusion of Reviewer #1 from the evaluation of our submission.

Reviewer #1 overlooked many aspects discussed in our manuscript, leading to several misleading questions. Specifically,

Reviewer #1 neglected our explanation of personalized denoising, erroneously assumed our experimental scenarios, and even questioned the evaluation frameworks established by numerous works in the field.
Furthermore, Reviewer #1 ignored our theoretical explanations of the parameters (lines 502-522), mistakenly believing that we arbitrarily introduced parameters.
Reviewer #1 also overlooked our experimental results corresponding to the latest baselines (SIGKDD23 and SIGKDD24), incorrectly assuming that we did not select the most recent baselines.

Despite our detailed rebuttal addressing these concerns, the reviewer shifted their criticism to points that directly contradict the reviewer's initial review.

Initially, #1 praised the novelty of our approach and the innovation of leveraging personal loss distributions in the Pros section:

Reviewer #1: "Novel Approach: The introduction of personal loss distributions for denoising interactions is innovative and addresses the limitations of global loss distributions effectively."

However, they later arbitrarily claimed that our method lacks novelty in the most recent response:

Reviewer #1: "I still concern about the novelty of this paper, since there are already many denoising papers based on the loss scale."

This argument assumes that subsequent loss-based methods automatically lack novelty because some prior works employed loss-based approaches. This reasoning is baseless and ignores our work's novel perspective—leveraging personal loss distributions for personlized denoising—backed by rigorous theory and experiments, as recognized by other reviewers (#2, #4, #5).

Such unfounded critiques contradict standard conference review guidelines [1-3]. Moreover, equating innovation solely with model complexity and dismissing simple yet effective methods is an immature evaluation approach [4,5], undermining fairness and integrity.

Given these misunderstandings and #1’s inconsistent reasoning, we believe the scores given by Reviewer #1 are unwarranted. We respectfully request excluding #1’s feedback to ensure a fair and impartial review.

Thank you for considering our request.

Sincerely,
The Authors

References:
[1] ICLR Reviewer Guide 2025
[2] ACL 2023 Review Guidelines
[3] NeurIPS 2024 Reviewer Guidelines
[4] Log Conference Reviews
[5] Michael J. Black on Novelty in Science

Response to all reviewers

Official Comment08 Dec 2024, 11:07Program Chairs, Senior Area Chairs, Area Chairs, Reviewers Submitted, Authors

Comment:

We sincerely thank all reviewers for your valuable feedback and time in evaluating our work. We are encouraged that our main contributions were acknowledged by all reviewers. We have carefully addressed the concerns raised by providing detailed clarifications and additional support. We hope our efforts are evident and look forward to your continued attention to our paper.

To summarize the strengths recognized by reviewers:

Novel Approach and Perspective (reviewers #1, #2, #4, #5)
Rigorous Theoretical Analysis (reviewers #1, #2, #3, #5)
Comprehensive and Extensive Experiments (reviewers #1, #2, #3, #5)
Strong Performance (reviewers #1, #2, #5)
Convincing Data Analysis (reviewers #2, #5)
Clear and Reasonable Motivation (reviewers #4, #3)
Practical Utility (reviewer #1)
Sufficient Work Survey (reviewer #4)
Well-Structured and Written (reviewers #3, #4)

In the past two days, we had an in-depth discussion with reviewer #3. We are pleased that our novelty and technical quality scores have increased by 1 and 2 points, respectively. We believe our paper makes meaningful contributions to robust recommender systems and benefits the research community. We appreciate the constructive comments that helped improve our work. We kindly request reviewers to reassess our paper based on our responses. We look forward to your feedback.

Best regards,
Authors of paper 877

Official Review of Submission877 by Reviewer #1

Official Reviewby Reviewer #103 Dec 2024, 01:19 (modified: 10 Dec 2024, 22:57)Program Chairs, Senior Area Chairs, Area Chairs, Reviewers Submitted, Reviewer #1, Authors

Review:

The paper introduces a Personal Loss Distribution-based Denoising (PLD) strategy, which shifts the denoising focus from the overall loss distribution to the personal loss distributions of individual users. The method:

Analyze users' personal loss distributions to distinguish noisy and normal interactions.
Implements a resampling strategy where candidate interactions are prioritized based on their likelihood of being normal, reducing the probability of optimizing noisy interactions.

Pros

Novel Approach: The introduction of personal loss distributions for denoising interactions is innovative and addresses the limitations of global loss distributions effectively.
Comprehensive Experiments: Extensive evaluations on diverse datasets (e.g., Gowalla, Yelp2018, MIND) demonstrate the robustness and applicability of the proposed method under varying noise conditions.
Theoretical Rigor: The method is supported by a strong theoretical foundation, with mathematical proofs and analysis validating its effectiveness.
Robustness to Noise: The method’s robustness across different noise ratios and its consistent performance highlight its practical utility in noisy environments.

Cons:

The contribution of this work feels somewhat incremental. The biggest innovation lies in the shift from using global loss distributions, as in previous methods, to employing personal loss distributions. While this approach refines the existing methods, it doesn’t seem to represent a significant leap forward. In addition, what is the overlap ratio in the training? The R-CE and T-CE methods drop noisy samples at the very beginning of the training phase.
Lack of Consideration for Noise Distribution Variations Across Users:

While the paper emphasizes personalized denoising, it fails to account for variations in noise levels across different users in its experimental design. All experiments assume a uniform noise ratio for all users, which contradicts the personalized premise of PLD.

This omission is particularly concerning given the results in Figure 5, where the performance of baseline methods like LightGCN is very close to PLD at various noise levels. The stability of the performance gap across noise levels suggests that a uniform noise ratio may disproportionately benefit PLD, undermining its real-world applicability.

Limited Innovation Beyond Training Optimization:

The primary innovation of PLD lies in modifying the training process through personalized resampling, without making fundamental changes to the model architectures or addressing core challenges in representation learning.

The method remains reliant on backbone models like MF and LightGCN. This approach blurs the distinction between PLD and existing methods like DCF and self-supervised learning techniques, especially since these also involve data augmentation or weighting strategies for denoising.

In Figure 5, as the noise level increases, the performance of PLD converges to that of DCF, particularly with the MIND dataset and LightGCN as the backbone, raising concerns about the novelty and effectiveness of the proposed resampling strategy.

Unjustified Addition of Temperature Coefficient $τ$ :

The paper introduces a temperature coefficient $τ$ to the resampling probability $P_{u, v}$ without sufficiently justifying its impact on the theoretical formulation of $ξ$ or its influence on the practical implementation of Theorem 1. While $τ$ is analyzed in the hyperparameter section, its inclusion lacks a clear mathematical or conceptual explanation in the methodology.

This omission weakens the theoretical foundation of PLD, leaving the reader questioning whether the performance improvements are due to better theoretical grounding or merely empirical tuning of hyperparameters.

Baseline Comparisons Lack Cutting-Edge Methods:

The baseline methods selected for comparison (e.g., R-CE, T-CE, DCF, and DeCA) are standard but do not represent the most advanced or recent denoising techniques in recommendation systems. Incorporating more sophisticated approaches, such as those leveraging graph-based denoising or advanced self-supervised learning paradigms, would provide a stronger benchmark for evaluating PLD's efficacy.

Questions:

Noise Distribution Variation Across Users:

How would the performance of PLD change if the noise level varied across users rather than being uniformly distributed?

Does the stability of PLD's performance gap with LightGCN across noise levels indicate that a uniform noise assumption is overly favorable to PLD?

Would introducing personalized noise levels better reflect the real-world applicability of PLD and align more closely with the personalized denoising concept?

Innovation Scope: Is PLD fundamentally different from data augmentation strategies like those in DCF or self-supervised learning methods?

Given that PLD relies on backbone models (MF, LightGCN), how does it address the limitations inherent to those models?

Does the convergence of PLD's performance with DCF at higher noise levels (e.g., on MIND with LightGCN) suggest that the resampling strategy adds minimal value beyond existing approaches?

Temperature Coefficient $τ$ Justification:

How does the introduction of $τ$ alter the mathematical formulation of $ξ$ in the resampling probability $P_{u, v}$ ?

Can the authors provide a theoretical justification for how (\tau) interacts with other hyperparameters to influence denoising performance?

Baseline Methods:

Why were cutting-edge denoising methods not included as baselines in the experiments?

How does PLD compare to the latest advancements in graph-based or self-supervised denoising techniques for recommendation systems?

Would the inclusion of these methods alter the conclusions about PLD’s efficacy?

Ethics Review Flag: No

Scope: 3: The work is somewhat relevant to the Web and to the track, and is of narrow interest to a sub-community

Novelty: 3

Technical Quality: 4

Reviewer Confidence: 4: The reviewer is certain that the evaluation is correct and very familiar with the relevant literature

Rebuttal by Authors

Rebuttal06 Dec 2024, 17:16 (modified: 06 Dec 2024, 17:21)Program Chairs, Senior Area Chairs, Area Chairs, Reviewers Submitted, Authors

Rebuttal:

Thank you for your comments. We appreciate your recognition of our approach's novelty, practical utility, comprehensive experiments, and rigorous theoretical analysis. However, we regret the misunderstandings you have regarding our paper.

Due to space limitations, we first clarify several fundamental misunderstandings to help you better assess our contributions. More detailed explanations and supporting materials will follow in the point-by-point replies.

Personalized Denoising:
- "Personalized denoising" refers to using each user's personal loss distribution to perform distinct denoising processes for each user, rather than handling personalized noise ratios.
- We adopt a uniform noise ratio in our experiments to align with existing work, ensuring fair comparison.
- Conversely, utilizing personalized noise ratios would better benefit PLD, as described in lines 445-449 of the paper. We provide experimental results in subsequent reply A3.

Dependence on Backbone Models: PLD only adjusts the sampling probability of positive samples during training and does not rely on specific backbone models.
- PLD can be applied to classical recommendation models like MF, LightGCN, and NeuMF.
- It is also applicable to self-supervised models such as DCCF.
- Results are in subsequent reply A5 and A6.

Temperature Coefficient $τ$ : We introduced $τ$ based on rigorous theoretical analysis (lines 502-522 of the paper.).
- With $τ$ , only the mean and variance in Theorem 1 need adjustment to obtain the final expression, without affecting the conclusions of Theorem 1.
- Detailed explanations are in subsequent replies A7 and A11.

Baseline Selection: We indeed selected cutting-edge denoising methods, including BOD (SIGKDD23) and DCF (SIGKDD24).
- Regarding the self-supervised methods you mentioned, they do not conflict with PLD and can be combined as the backbone model. PLD can further enhance their performance. Results are in subsequent reply A6.

We will provide detailed explanations and supporting data in subsequent replies. We sincerely thank you for your time and effort in reviewing our paper. We hope our clarifications help you evaluate our work and kindly request you reconsider your overall assessment. We believe our research offers valuable insights and advances the state-of-the-art in robust recommender systems, contributing to the research community.

Detailed Explanations for Q1-Q3 (1/4)

Official Comment06 Dec 2024, 17:23 (modified: 06 Dec 2024, 17:59)Program Chairs, Senior Area Chairs, Area Chairs, Reviewers Submitted, Authors

Comment:

Q1: Employing personal loss distributions doesn’t seem to represent a significant leap forward.

A1: Transitioning from global loss distributions to personal loss distributions is highly meaningful.

Most existing denoising methods are based on global loss distributions, which present numerous issues, such as significant overlap, thereby limiting denoising performance.
This problem is particularly severe in BPR-based recommender systems, causing many existing methods to fail (as mentioned in lines 108-127 of the paper).

For example, in Tables 4 and 6, T-CE is effective at denoising in BCE-based recommender systems but significantly hampers recommendation performance in BPR-based systems. Our method is not constrained by the loss function, and we believe this will contribute significantly to the field.

Q2: What is the R-CE and T-CE's overlap ratio in the training?

A2: We provide the overlap ratios during the training process in the table below.

We found that R-CE and T-CE even lead to increased overlap ratios in global loss distribution on the MIND dataset within LightGCN (consistent with Table 1 in the paper).
- This is because the initial overlap causes misclassifications in R-CE and T-CE.
- Leading to some normal interactions not being involved in training while some noise interactions are, thereby enlarging the overlap region.
PLD, through personal loss distribution, effectively distinguishes between normal and noisy interactions.
- Thus, PLD decreases overlap ratios, reflecting better denoising performance.

Epoch - Global Loss Distribution	20	40	60	80	100
Standard Training	18.40	21.32	23.19	25.55	24.32
R-CE	18.79	24.85	27.88	30.62	33.43
T-CE	20.31	26.02	30.18	31.92	35.61
PLD	14.11	15.06	15.33	15.96	16.61

Additionally, we provide the overlap ratios of PLD in the personal loss distribution. We observe that the overlap ratio in the personal loss distribution is significantly lower than that in the global loss distribution, further confirming that considering personal loss distributions is highly meaningful.

Epoch - Personal Loss Distribution	20	40	60	80	100
Standard Training	5.13	5.40	5.29	6.38	6.03
PLD	5.00	5.39	5.32	5.99	6.11

Q3: While the paper emphasizes personalized denoising, it fails to account for variations in noise levels across different users in its experimental design. All experiments assume a uniform noise ratio for all users, which contradicts the personalized premise of PLD.

A3: We would like to emphasize that:

"Personalized denoising" refers to our use of each user's personal loss distribution to perform distinct denoising processes for each user, rather than handling personalized noise ratios. This does not conflict with the premise of PLD.
Additionally, our current experimental setup follows the common settings of previous denoising work [1][2], where noise is proportionally added.
Therefore, we follow the established settings of previous methods to ensure a fair comparison.

Furthermore, we would like to clarify that:

In scenarios where different users have varying noise ratios, PLD has a greater advantage over existing methods.
This is because PLD does not compromise the performance of users with lower noise ratios while still providing sufficient denoising effects (as described in lines 445-449).

To address your concerns, we provide experimental results in scenarios where different users have varying noise ratios (randomly introducing a certain number of noises):

Gowalla	Recall@20
MF	0.1107
+ R-CE	0.1182
+ T-Ce	0.1024
+ DeCA	0.1125
+ BOD	0.1159
+ DCF	0.1110
+ PLD	0.1367
Gain	15.65%

We can observe that, under such conditions, PLD achieves a greater improvement compared to proportionally adding noise.

[1] Learning to Denoise Unreliable Interactions for Graph Collaborative Filtering. SIGIR 2022.
[2] Efficient Bi-Level Optimization for Recommendation Denoising. SIGKDD 2023.

Detailed Explanations for Q4-Q8 (2/4)

Official Comment06 Dec 2024, 17:23Program Chairs, Senior Area Chairs, Area Chairs, Reviewers Submitted, Authors

Comment:

Q4: The performance of baseline methods like DCF is very close to PLD at various noise levels on LightGCN.

A4: Our method, PLD, significantly outperforms the baseline methods.

To address your concerns, we provide the p-values from T-tests (comparing with the performance of the runner-up) below, demonstrating that PLD significantly outperforms the baseline methods. When the p-value is less than 0.1, PLD has a SIGNIFICANT IMPROVEMENT!

Noise Ratio	0.0	0.1	0.2	0.3	0.4
Gowalla	<1e-7	<1e-7	<1e-7	<1e-5	0.012
Yelp	<1e-6	<1e-5	<1e-5	<1e-5	<1e-5
MIND	<1e-7	<1e-5	<1e-4	0.005	0.09

We can see that all the above results passed the t-test, indicating that PLD significantly outperforms existing methods.

Q5: The method remains reliant on backbone models like MF and LightGCN.

A5: Our method does not depend on specific backbone models. PLD merely needs to adjust the sampling probability of positive samples during the model training process, exhibiting strong generalization and scalability.

We selected MF and LightGCN merely because these two methods represent two categories of classical models.
To address your concern, we have additionally provided experimental results with NeuMF.

Gowalla	Recall@20
NeuMF	0.1203
+ R-CE	0.1295
+ T-Ce	0.0979
+ DeCA	0.1251
+ BOD	0.1267
+ DCF	0.1294
+ PLD	0.1345
Gain	3.94%

We can see that PLD also achieves improvements on different types of backbone models.

Q6: PLD blurs the distinction between PLD and existing methods like DCF and self-supervised learning techniques.

A6: PLD is distinctly different from existing methods.

PLD reduces the probability of noise interactions being optimized through resampling.
Whereas DCF denoises by adjusting weights.
And self-supervised learning techniques denoise by optimizing different data augmentation views.

We would like to state that PLD is not an exclusive method: it only adjusts the sampling probability of benign samples and is not limited to specific architectures or scenarios.

Therefore, PLD can be combined with existing self-supervised models to achieve better results, as shown in the table below.

Gowalla	Recall@20	Recall@50	NDCG@20	NDCG@50
DCCF (SIGIR23)	0.1649	0.2217	0.1118	0.1329
+ PLD	0.1710	0.2392	0.1151	0.1428
Gain	3.70%	7.90%	2.95%	7.45%

Q7: Unjustified Addition of Temperature Coefficient $τ$ .

A7: We did not arbitrarily introduce the temperature coefficient $τ$ . We would like to emphasize that:

We introduced the temperature coefficient $τ$ through theoretical analysis (lines 502-522 of the paper), specifically:
- After introducing $τ$ , we can increase $E [Λ_{normal} - Λ_{noise}]$ to achieve better performance.
Additionally, the introduction of $τ$ does not affect the conclusion of Theorem 1; it only requires substituting $μ_{1}^{'} = μ_{1} / τ$ , $μ_{2}^{'} = μ_{2} / τ$ , and $σ^{' 2} = σ^{2} / τ^{2}$ into Equation 2.

Q8: Baseline Comparisons Lack Cutting-Edge Methods.

A8: The methods we compared against (e.g., R-CE, T-CE, DeCA, BOD - SIGKDD23, DCF - SIGKDD24) include the latest state-of-the-art methods up to 2024.

These comparative methods are directly relevant to PLD.
Regarding the "self-supervised learning techniques and graph-based denoising" you mentioned, such methods do not conflict with the aforementioned denoising methods and can serve as backbone models in combination with these methods to achieve better results, as demonstrated in the table in A6 with DCCF+PLD.

Detailed Explanations for Q9-Q10 (3/4)

Official Comment06 Dec 2024, 17:25 (modified: 06 Dec 2024, 18:01)Program Chairs, Senior Area Chairs, Area Chairs, Reviewers Submitted, Authors

Comment:

Q9: Noise Distribution Variation Across Users.

A9: We would like to emphasize that our experimental setup follows the common settings of existing work [1][2]. In response to your questions, we provide the following answers:

Q9-1: How would the performance of PLD change if the noise level varied across users rather than being uniformly distributed?

A9-1: When noise levels vary across users, PLD still achieves consistently good performance and even shows more significant improvements, as demonstrated in the table in A3.
- This is because PLD does not compromise the performance of users with lower noise ratios while still providing sufficient denoising effects (as described in lines 445-449 of the paper).
- Therefore, following the established settings of uniform noise ratio in previous works can ensure a fair comparison.

Q9-2: Does the stability of PLD's performance gap with LightGCN across noise levels indicate that a uniform noise assumption is overly favorable to PLD?

A9-2: Not at all. In fact, scenarios with varying noise levels across users are more favorable to PLD (as described in lines 445-449 of the paper).
- As shown in the table in A3, PLD performs well when noise levels vary among users. The results indicate that this setup is more advantageous for PLD rather than the uniform noise assumption.

Q9-3: Would introducing personalized noise levels better reflect the real-world applicability of PLD and align more closely with the personal denoising concept?

A9-3:

Firstly, we agree that introducing different noise levels for different users may be more consistent with real-world scenarios.
- However, we set the same noise level in our experiments to provide a fairer comparison, even though this scenario is more favorable to previous methods.
- We will add the results for different noise levels to the revised paper to provide a more comprehensive comparison.
Secondly, we would like to reiterate that "personalized denoising" refers to our use of each user's personal loss distribution to perform distinct denoising processes for each user, rather than handling personalized noise ratios.

Q10: Is PLD fundamentally different from data augmentation strategies like those in DCF or self-supervised learning methods?

A10: PLD is distinctly different from existing methods.

PLD reduces the probability of noise interactions being optimized through resampling.
Whereas DCF denoises by adjusting weights.
Self-supervised learning techniques denoise by optimizing different data augmentation views.
Additionally, PLD can be integrated with existing self-supervised learning models to provide better performance, as demonstrated in A6.

Q10-1: Given that PLD relies on backbone models (MF, LightGCN), how does it address the limitations inherent to those models?

A10-1: PLD does not depend on specific backbone models.
- It is a resampling approach that improves the training process of the model without relying on specific model architectures.
- PLD can be added to traditional recommendation models (MF, LightGCN, NeuMF as shown in A5) and to self-supervised models (DCCF as shown in A6).

Q10-2: Does the convergence of PLD's performance with DCF at higher noise levels suggest that the resampling strategy adds minimal value beyond existing approaches?

A10-2:

Firstly, we would like to clarify that we adopt 40% and 50% noise levels, which are already far beyond realistic noise levels, solely to verify robustness in extreme scenarios. Such high noise ratios are generally unlikely to occur in real-world applications.
- We set such high noise ratios solely to demonstrate the robustness of different methods against noise.
- When the noise ratio is excessively high, it severely degrades backbone model performance, resulting in smaller differences between methods.
Secondly, in addition to PLD's superior performance improvements, we also demonstrate in Figure 9 and Appendix A.2 that PLD has lower time and space complexity. Overall, PLD provides substantial value.

[1] Learning to Denoise Unreliable Interactions for Graph Collaborative Filtering. SIGIR 2022.
[2] Efficient Bi-Level Optimization for Recommendation Denoising. SIGKDD 2023.

Detailed Explanations for Q11-Q12 (4/4)

Official Comment06 Dec 2024, 17:25Program Chairs, Senior Area Chairs, Area Chairs, Reviewers Submitted, Authors

Comment:

Q11: Temperature Coefficient $τ$ :

Q11-1: How does the introduction of $τ$ alter the mathematical formulation of $ξ$ in the resampling probability?

A11-1: After introducing $τ$ , we can express $ξ$ as $ξ = \exp ((μ_{1} - μ_{2}) / τ)$ , as shown in line 529 of the paper.

By decreasing $τ$ , we can reduce $ξ$ , thereby actively increasing $\frac{n α - m β}{(m + n) η}$ to enhance the sampling probability of normal interactions, as shown in lines 530-531 of the paper.

Q11-2: Can the authors provide a theoretical justification for how $τ$ interacts with other hyperparameters to influence denoising performance?

A11-2: Firstly, The introduction of $τ$ does not affect the conclusions of Theorem 1.
- We only need to substitute $μ_{1}^{'} = μ_{1} / τ$ , $μ_{2}^{'} = μ_{2} / τ$ , and $σ^{' 2} = σ^{2} / τ^{2}$ into $α$ , $β$ , and $γ$ in Theorem 1 without modifying other expressions.
- Additionally, after introducing $τ$ , we still have $Γ > \frac{χ}{C^{2}} \frac{k^{2}}{(k - 1)^{2}}$ , so it does not affect the subsequent analysis with $k$ .
Secondly, in conjunction with the response in A11-1, by decreasing $τ$ , we can actively increase $\frac{n α - m β}{(m + n) η}$ to enhance the sampling probability of normal interactions, while reducing the sampling probability of noisy interactions, as shown in lines 529-531 of the paper.

Q12: Baseline Methods:

A12: The methods we compared against (e.g., R-CE, T-CE, DeCA, BOD - SIGKDD23, DCF - SIGKDD24) include the latest state-of-the-art methods up to 2024.

These comparative methods are directly relevant to PLD.
Regarding the "self-supervised learning techniques and graph-based denoising" you mentioned, such methods do not conflict with the aforementioned denoising methods and can be combined to achieve better results, as shown in A8.

Thank you for your detailed comments and suggestions, as well as your appreciation of the importance of this work. We hope that our clarifications address your concerns. Based on our responses, we kindly hope that you can reconsider the overall assessment of this paper. We believe that our work offers valuable insights and advances the state-of-the-art in robust recommender systems, making a meaningful contribution to the research community.

Replying to Detailed Explanations for Q11-Q12 (4/4)

Follow up

Official Comment08 Dec 2024, 21:33Program Chairs, Senior Area Chairs, Area Chairs, Reviewers Submitted, Authors

Comment:

Thank you for taking the time and effort to review our paper. We have carefully considered your comments and responded to each point with detailed clarifications and supporting evidence. As the deadline for the discussion period between authors and reviewers is approaching, we kindly ask whether our responses have addressed your concerns. We look forward to your feedback.

Thank you once again for your time and effort.

Sincerely,
The Authors

Replying to Follow up

Kindly Reminder

Official Comment09 Dec 2024, 20:02Program Chairs, Senior Area Chairs, Area Chairs, Reviewers Submitted, Authors

Comment:

We sincerely thank you for your time and effort in reviewing our paper and supporting our work. With the discussion period deadline (Dec. 10th) approaching, we are following up on our earlier response.

We understand your schedule may be busy, and we appreciate the time you've already invested in reviewing our paper. We have carefully considered your comments and provided detailed clarifications and additional support for each concern in our response.

We would be grateful for your feedback on whether our response has addressed your concerns. If your concerns have been addressed, we kindly request that you consider re-evaluating our paper and increasing our scores. Your continued support is immensely valuable, and we truly appreciate your feedback.

Once again, thank you for your insights and ongoing support.

Sincerely,
The Authors

Replying to Kindly Reminder

Thanks for the rebuttal

Official Commentby Reviewer #110 Dec 2024, 22:57Program Chairs, Senior Area Chairs, Area Chairs, Reviewers Submitted, Authors

Comment:

Thanks for the rebuttal, which addresses many technical concerns. However, I still concern about the novelty of this paper, since there are already many denoising papers based on the loss scale. I admit that the finding that using the user-based loss is important and theoretical guarantees exist, Thus, I will increase the rating of Technical Quality to 4, and maintain the Novelty rating as 3.

Replying to Thanks for the rebuttal

Official Comment by Authors

Official Comment11 Dec 2024, 00:37Program Chairs, Senior Area Chairs, Area Chairs, Reviewers Submitted, Authors

Comment:

Thank you for your response and for acknowledging the theoretical contributions of our work.

However, we would like to emphasize an important point. While some prior works have employed loss-based approaches for denoising, these methods suffer from inherent limitations, such as significant overlap issues and challenges in applying them to BPR loss-based recommender systems.

In contrast, we propose a novel perspective by leveraging personal loss distribution, which we have theoretically demonstrated to be superior. Moreover, our experimental results clearly show that our method consistently outperforms traditional approaches across various scenarios.

Thus, it is unreasonable to dismiss the novelty of subsequent loss-based denoising methods solely on the grounds that "prior works have explored loss-based approaches."

It is also worth noting that Michael J. Black emphasized in Novelty in Science: “Taking an existing network and replacing one thing is better science than concocting a whole new network just to make it look more complex” [https://perceiving-systems.blog/en/news/novelty-in-science]. This principle has also been incorporated into the review guidelines of many conferences, such as [https://logconference.org/reviews/#best-practices]. We believe that our simple yet effective approach embodies this principle and offers genuine novelty.

We belive that our work provides a innovative perspective and a novel approach to denoising, as also recognized by other reviewers (reviewers #2, #4, #5). Furthermore, in your previous review, you also acknowledged the novelty of our method and recognized the innovation of applying personal loss distribution.

In light of this, we sincerely hope you will reconsider your evaluation and assign a higher score to our work.

Sincerely,
The Authors

Official Review of Submission877 by Reviewer #2

Official Reviewby Reviewer #202 Dec 2024, 22:04 (modified: 03 Dec 2024, 21:23)Program Chairs, Senior Area Chairs, Area Chairs, Reviewers Submitted, Reviewer #2, Authors

Review:

This paper addresses noise in implicit feedback for recommender systems, highlighting limitations in loss-based denoising due to overlap between normal and noisy interactions. The authors propose PLD (Personal Loss Distribution), a resampling strategy leveraging user-specific loss distributions to better distinguish interactions.

Pros:

This paper convincingly demonstrates, through data analysis, the limitations of traditional methods that address collaborative filtering noise by globally assigning loss weights.
The introduction of the PLD method, which resamples based on personal loss distributions, is a novel and potentially effective solution to the problem. The theoretical analysis supporting the method is also a strength.
Extensive experiments show PLD improves accuracy and robustness over existing methods.

Cons:

The paper lacks a case study with real user interactions to showcase the denoising effects before and after applying the method.
The theoretical conclusions presented in the paper could be more concise.

Questions:

Pls see the comments above.

Ethics Review Flag: No

Ethics Review Description: N/A

Scope: 4: The work is relevant to the Web and to the track, and is of broad interest to the community

Novelty: 4

Technical Quality: 5

Reviewer Confidence: 3: The reviewer is confident but not certain that the evaluation is correct

Rebuttal by Authors

Rebuttal06 Dec 2024, 17:27 (modified: 06 Dec 2024, 18:03)Program Chairs, Senior Area Chairs, Area Chairs, Reviewers Submitted, Authors

Rebuttal:

Thank you for your positive comments. We greatly appreciate your recognition of our work's convincingly robust data analysis, novel methodology, comprehensive theoretical analysis, and extensive experiments. Below, we address each of your questions:

Q1: The paper lacks a case study with real user interactions to showcase the denoising effects before and after applying the method.

A1: We have included user cases in Section 5.3 of the main text.

Figure 6 illustrates the sampling probabilities of noisy interactions before and after denoising for six users.
Figure 7 presents statistical data on sample numbers for PLD.
These results show that PLD significantly decreases the probability of noisy interactions being optimized.

To further address your concern, we provide the table below, showing how often normal interactions (Items 1-8) and noisy interactions (Items 9-10) are optimized w/ and w/o PLD for a user during 100 epochs. We also include results from the R-CE method, normalizing its weights to reflect the numbers.

Our findings indicate that PLD significantly reduces noisy interactions' impact on model training. We will include these results in the revised paper to demonstrate our method's denoising effectiveness.

Item No.	1	2	3	4	5	6	7	8	9 (Noise)	10 (Noise)
Standard Training	9	9	13	12	11	8	6	7	12	13
R-CE	13	11	11	7	15	12	8	9	6	8
PLD	15	12	17	8	16	8	11	9	1	3

Q2: The theoretical conclusions presented in the paper could be more concise.

A2: Thank you for your suggestion. We will streamline the theoretical conclusions in the main text and move detailed derivations to the appendix to enhance readability. Specifically:

Describe $α$ and $β$ as the expectations of normal and noisy interaction losses and move their specific expressions to the appendix.
Integrate $γ$ and $η$ into $Γ$ and $χ$ and simplify the expression for $E [Λ]$ .

Thank you for your detailed comments and useful suggestions, as well as your appreciation of the importance of this work. We hope that our clarifications address your concerns. We look forward to your reconsideration and reevaluation of the overall assessment of our work.

Follow up

Official Comment08 Dec 2024, 21:33Program Chairs, Senior Area Chairs, Area Chairs, Reviewers Submitted, Authors

Comment:

Thank you for taking the time and effort to review our paper. We have carefully considered your comments and responded to each point with detailed clarifications and supporting evidence. As the deadline for the discussion period between authors and reviewers is approaching, we kindly ask whether our responses have addressed your concerns. We look forward to your feedback.

Thank you once again for your time and effort.

Sincerely,
The Authors

Replying to Follow up

Kindly Reminder

Official Comment09 Dec 2024, 20:02Program Chairs, Senior Area Chairs, Area Chairs, Reviewers Submitted, Authors

Comment:

We sincerely thank you for your time and effort in reviewing our paper and supporting our work. With the discussion period deadline (Dec. 10th) approaching, we are following up on our earlier response.

We understand your schedule may be busy, and we appreciate the time you've already invested in reviewing our paper. We have carefully considered your comments and provided detailed clarifications and additional support for each concern in our response.

We would be grateful for your feedback on whether our response has addressed your concerns. If your concerns have been addressed, we kindly request that you consider re-evaluating our paper and increasing our scores. Your continued support is immensely valuable, and we truly appreciate your feedback.

Once again, thank you for your insights and ongoing support.

Sincerely,
The Authors

Replying to Kindly Reminder

Official Comment by Reviewer #2

Official Commentby Reviewer #211 Dec 2024, 11:05Program Chairs, Senior Area Chairs, Area Chairs, Reviewers Submitted, Authors

Comment:

Thanks to the author for the response, which addresses most of my concerns. Since my scores are relatively high, I tend to maintain the score. I’ll discuss further with the other reviewers and the AC during the discussion stage.

Replying to Official Comment by Reviewer #2

Thank you for your reply

Official Comment11 Dec 2024, 11:30Program Chairs, Senior Area Chairs, Area Chairs, Reviewers Submitted, Authors

Comment:

Thank you for reviewing our response. We are pleased to hear that our response has addressed your concerns. Please don’t hesitate to reach out if you have any further questions.

Sincerely,
Authors

Official Review of Submission877 by Reviewer #3

Official Reviewby Reviewer #302 Dec 2024, 16:26 (modified: 08 Dec 2024, 04:00)Program Chairs, Senior Area Chairs, Area Chairs, Reviewers Submitted, Reviewer #3, Authors

Review:

Summary

This paper discusses identifying noisy samples based on training loss to reduce the impact of noisy samples during training. The authors observe that there is significant overlap between the loss of noisy and clean samples, making it difficult to distinguish between them based solely on the magnitude of the loss. Based on a synthetic experiment, they find that, from a user-level perspective, the loss of the two types of samples exhibits a greater difference. Based on this observation, the authors propose a user-level denoising method, providing theoretical analysis and experimental validation to support their approach.

Pros

The author provides some theoretical analysis of the proposed method.
The author conducts a comprehensive set of experiments on the proposed method and theory.
This paper is well-written and easy to follow.

Cons

Insufficient Motivation: The paper lacks an in-depth explanation of why the person-level loss has less overlap than the overall loss. Section 4.1 only discusses this phenomenon in a synthetic dataset (where the noise is independently and randomly generated), but there is a lack of further theoretical or intuitive explanation. Moreover, it remains unknown whether the noise in the real-world scenario follows such a characteristic.
Strong Assumption: Theorem 1 assumes that the loss follows a Gaussian distribution, but there is no further explanation. Generally speaking, this assumption holds only when the loss is based on MSE, while the losses chosen in this paper are BPR and BCE. I understand that this assumption makes the calculations easier, but such a strong assumption undermines the generalizability of the theoretical conclusions.

After the rebuttal

The authors claim that the Theorem 1 can disgard the strong Gaussian assumption. Additionally, the authors provide an intuitive explanation and interesting theoretical analysis to support the motivation. As most of my concerns are addressed, I decided to increase my scores from (4,4) to (5,6).

Questions:

See weaknesses. Additionally, I would like to know:

Is it possible to provide a theorem for proving the person-level loss distribution has less overlap than the overall loss?
Is it possible to remove the Gaussian assumption in Theorem 1? For example, only assume that the loss has a mean of $μ$ and the variance $σ^{2}$ , instead of assuming it follows $N (μ, σ^{2})$ ?

Ethics Review Flag: No

Scope: 4: The work is relevant to the Web and to the track, and is of broad interest to the community

Novelty: 5

Technical Quality: 6

Reviewer Confidence: 3: The reviewer is confident but not certain that the evaluation is correct

Rebuttal by Authors

Rebuttal06 Dec 2024, 17:28Program Chairs, Senior Area Chairs, Area Chairs, Reviewers Submitted, Authors

Rebuttal:

Thank you for your comments. We greatly appreciate your recognition of our work's theoretical analysis, extensive experiments, and clear presentation. Below, we address each of your questions with clarifications and further support.

Due to space limitations, we first summarize our responses and clarify some misunderstandings. More detailed explanations and supporting materials will be provided in the subsequent point-by-point replies.

Why person-level loss has less overlap than overall loss:
- We have indeed conducted a detailed analysis in lines 367-378 of the manuscript, specifically:
  - The data includes hard-to-train (high-loss) and easy-to-train (low-loss) users, as shown in Figure 3 of the paper.
  - The normal interaction loss of high-loss users may exceed the noisy interaction loss of low-loss users.
  - Thus, in the global loss distribution, more normal interactions have higher losses than noisy interactions.
  - This leads to greater overlap in the global loss distribution.
- To further address your concern, we have provided additional experimental evidence in the subsequent reply A1.
- Additionally, we have included the corresponding theoretical framework in reply A3.

Assumption of Gaussian distribution:
- Firstly, the personal loss distribution can be approximated by a Gaussian, as shown in Figure 3 of the main text.
  - Additionally, We conduct the Shapiro-Wilk test on the personal loss distributions.
  - We verified that the most users' personal loss distributions conform to Gaussian. Detailed results are in the subsequent reply A2.
- Secondly, the Gaussian assumption in Theorem 1 primarily enhances clarity of expressions.
  - This assumption does not affect the conclusion of the theorem.
  - We will explain in subsequent reply A2 how Theorem 1 can be adjusted for any distribution.

We will provide more detailed explanations and supporting data for the above points in the subsequent replies. We sincerely thank you for your time and effort in reviewing our paper, as well as your valuable suggestions. We hope our clarifications assist in evaluating our work and kindly request you reconsider your overall assessment of this paper. We believe our research offers valuable insights and advances the state-of-the-art in robust recommender systems, contributing meaningfully to the research community.

Detailed Explanations for Q1-Q2 (1/2)

Official Comment06 Dec 2024, 17:36Program Chairs, Senior Area Chairs, Area Chairs, Reviewers Submitted, Authors

Comment:

Q1: The paper lacks an in-depth explanation of why the person-level loss has less overlap than the overall loss.

A1: As mentioned in our previous reply, we have conducted a detailed analysis in lines 367-378 of the manuscript, specifically:

The data contains some hard-to-train users (high-loss users) and some easy-to-train users (low-loss users), as shown in Figure 3 of the paper.
This causes the loss of normal interactions for high-loss users to potentially exceed the loss of noisy interactions for low-loss users.
Consequently, in the Global Loss Distribution, a higher number of normal interactions have higher losses than noisy interactions.
In contrast, personal loss distributions mitigate the issue of varying losses among users, thereby reducing the overlap.

To further substantiate this conclusion, we introduced varying levels of noise for different users, which amplifies the differences in losses among users. Under these conditions, in the Gowalla dataset:

The proportion of Normal Interaction within Overlap in the Global Loss Distribution increased from 15.33% to 26.07%.
The proportion of Normal Interaction within Overlap in the Personal Loss Distribution remained nearly unchanged (4.11% and 4.3%).

These results confirm the stronger advantage of Personal Loss Distributions.

Additionally, we present the corresponding performance in the table below: We observed that under these conditions, PLD achieved greater improvements, further demonstrating the advantages of personalized loss distributions.

Gowalla	Recall@20
MF	0.1107
+ R-CE	0.1182
+ T-CE	0.1024
+ DeCA	0.1125
+ BOD	0.1159
+ DCF	0.1110
+ PLD	0.1367
Gain	15.65%

Q2: Theorem 1 assumes that the loss follows a Gaussian distribution, but there is no further explanation. Is it possible to remove the Gaussian assumption in Theorem 1?

A2: We would like to clarify that we verified that personal loss distribution can be approximated by a Gaussian distribution by the Shapiro-Wilk test; furthermore, we can remove the Gaussian assumption in Theorem 1.

Firstly, the personal loss distribution can be approximated by a Gaussian distribution, as illustrated in Figure 3 of the main text.

To address your concern, we conducted the Shapiro-Wilk test on the normal interactions of the personal loss distributions (a higher p-value indicates better conformity to a Gaussian distribution).
- At a significance level of 0.05, 65.89% of users' data accepted the normal distribution assumption.
- At a more stringent significance level of 0.005, 78.98% of users' data did so.

Secondly, the assumption of a Gaussian distribution in Theorem 1 primarily facilitates the derivation of specific probability expressions and does not affect the conclusions.

Specifically, as detailed in the proof in the appendix (lines 1198-1199), the derived expressions do not rely on any particular distributional assumptions:
- $E [Λ_{normal}] = E [S_{x}] \cdot E [\frac{1}{S_{x} + S_{y}}] - \frac{k n}{n + m} \cdot \frac{Var [\exp (- x_{i})]}{(k - 1)^{2} C^{2}},$
- where $E [\frac{1}{S_{x} + S_{y}}] \approx \frac{1}{E [S_{x}] + E [S_{y}]} (1 + \frac{Var [S_{x}] + Var [S_{y}]}{{(E [S_{x}] + E [S_{y}])}^{2}}) .$
- Above expressions do not rely on any particular distributional assumptions.
Similarly, in Equation (2), different distributions can be accommodated by substituting the corresponding means and variances into the parameters $α$ , $β$ , and $γ$ , without altering the conclusion of Theorem 1.

Due to space constraints, we will discuss this point in more detail in the revised manuscript.

Detailed Explanations for Q3 (2/2)

Official Comment06 Dec 2024, 17:39 (modified: 06 Dec 2024, 17:40)Program Chairs, Senior Area Chairs, Area Chairs, Reviewers Submitted, Authors

Comment:

Q3: Is it possible to provide a theorem for proving the person-level loss distribution has less overlap than the overall loss?

A3: Certainly. To maintain brevity during the rebuttal phase, we provide a concise statement of the theorem below:

Assumptions:

The means of all users' losses follow a distribution with mean $μ_{o}$ and variance $σ^{2}$ .
For a given user, the mean loss is $μ_{u}$ . We assume:
- The loss of normal interactions follows a distribution with mean $μ_{u}^{+} = μ_{u} + ϵ^{+}$ and variance $σ_{u}^{+}$ .
- The loss of noisy interactions follows a distribution with mean $μ_{u}^{-} = μ_{u} - ϵ^{-}$ and variance $σ_{u}^{-}$ .
- Here, $ϵ^{+}, ϵ^{-} \sim N (0, 1)$ serves as an adjustment term.

Conclusion: When $σ^{2} > σ_{u}^{+}, σ_{u}^{-}$ for $u \in U$ (which is obviously satisfied), by calculating the sum of the expectation of overlap for each user's personalized loss distribution and comparing it to the expectation of the overlap in the global loss distribution, we demonstrate that the overlap in personalized loss distributions is smaller than that in the global loss distribution.

This theorem underscores that personalized loss distributions effectively reduce overlap, thereby enhancing the discriminative power between normal and noisy interactions.

Thank you for your detailed comments and useful suggestions, as well as your appreciation of the importance of this work. We hope that our clarifications address your concerns. We look forward to your reconsideration and reevaluation of the overall assessment of our work.

Replying to Detailed Explanations for Q3 (2/2)

Official Comment by Reviewer #3

Official Commentby Reviewer #306 Dec 2024, 18:54Program Chairs, Senior Area Chairs, Area Chairs, Reviewers Submitted, Authors

Comment:

Thanks for your quick response.

To Gaussian Assumption

I believe you may have misunderstood the results of the Shapiro-Wilk test. As far as I know, a high p-value only indicates that we cannot reject the Gaussian hypothesis (e.g., the sample count may be too small to draw a conclusion, which is likely the case). It does not mean that we can accept the Gaussian hypothesis. Your analysis demonstrates that even with a very small sample count (shown in your Figure 3), as high as 34.11% of user losses can reject the Gaussian hypothesis, which actually disproves your assumption.
However, I appreciate your claim that Theorem 1 can apply to various distributions. I suggest removing the assumption that the loss follows a Gaussian distribution from the paper and revising Theorem 1 accordingly.

To Overlap Assumption

I appreciate your intuitive explanation of the overlap phenomenon. I believe emphasizing these key explanations (e.g., the presence of high-loss users and low-loss users) in the Introduction and Motivation sections could significantly enhance the persuasiveness of the paper.
Although I appreciate the new theorem you introduced, I have concerns regarding your assumption about the adjustment term. You assume that the variance of epsilon is 1, which is independent of the scale of $μ_{u}$ and counterintuitive. For example, in an extreme case, if $μ_{u}$ ranges on the order of 10^2, the impact of adding epsilon would be negligible; however, if $μ_{u}$ is around 0.01, adding epsilon would significantly affect the results. Moreover, why do these mathematical assumptions fail to reflect the statement "presence of high-loss users and low-loss users"? This seems to contradict your intuitive explanation.

Replying to Official Comment by Reviewer #3

Official Comment by Authors

Official Comment06 Dec 2024, 20:09Program Chairs, Senior Area Chairs, Area Chairs, Reviewers Submitted, Authors

Comment:

Thank you for your insightful feedback. Indeed, a subset of users reject the assumption of a Gaussian distribution. Due to the extreme sparsity of interactions for each user, we are unable to perform further analysis under the Gaussian distribution framework. Consequently, we will remove the assumption that the loss follows a Gaussian distribution from the paper and revise Theorem 1 accordingly. We greatly appreciate your suggestion, as it enhances the theoretical rigor of our work.

Regarding the new theorem provided in A3, we assumed $ϵ \sim N (0, 1)$ for the sake of simplifying the derivation. We apologize for not explicitly stating the condition $μ_{o} ≫ 1$ that underpins our analysis:

Under the condition $μ_{o} ≫ 1$ , the new theorem holds true.
Should this condition need to be relaxed, we could consider:
- Setting $ϵ \sim N (0, 1 \times 10^{- 4} μ_{o}^{2})$ .
- Or, defining $μ_{u}^{+} = μ_{o} (1 + ϵ^{+})$ and similarly adjusting $μ_{u}^{-}$ .

Furthermore, the distinction between "high-loss users" and "low-loss users" is valid. For example:

Low-loss users: A subset of users have a loss mean $μ_{u 1} < μ_{o} - 2 σ$ (with 2 serving as an illustrative value).
High-loss users: Another subset of users have a loss mean $μ_{u 2} > μ_{o} + 2 σ$ .

In such scenarios, the loss from noisy interactions of low-loss users exceeds that of normal interactions from high-loss users. Specifically, when $σ > σ_{u}^{+}$ and $σ > σ_{u}^{-}$ (the conditions we provide in A3), this phenomenon is further ensured:

That is, for low-loss and high-loss users, the expectation of loss from noisy interactions of low-loss users is guaranteed to be higher than the expectation of loss from normal interactions of high-loss users.

Once again, thank you for your valuable feedback. We hope that the clarifications provided address your concerns. We kindly request you to reconsider your overall assessment of our paper. We believe our research offers significant insights and advances the state-of-the-art in robust recommender systems, thereby making a meaningful contribution to the research community.

Replying to Official Comment by Authors

Further concerns about your new assumption

Official Commentby Reviewer #306 Dec 2024, 21:29Program Chairs, Senior Area Chairs, Area Chairs, Reviewers Submitted, Authors

Comment:

Thank you for your clarification. However, assuming $μ_{o} ≫ 1$ is undoubtedly not realistic, since most of the loss values in your Figure 1 are less than 1. Also, Setting $ϵ \sim N (0, 10^{- 4} μ_{o}^{2})$ or defining $μ_{u} = μ_{o} (1 + ϵ^{+})$ also seems to too strong and unnatured.

What about using $ϵ \sim N (0, k)$ where k is some constant? Under this condition, can you prove your theorem when $k$ is sufficiently large?

Replying to Further concerns about your new assumption

Official Comment by Authors

Official Comment07 Dec 2024, 15:27Program Chairs, Senior Area Chairs, Area Chairs, Reviewers Submitted, Authors

Comment:

Thank you for your response and engaging discussion. Firstly, I would like to correct the relaxed condition I provided in my previous reply: it should be $μ_{u}^{+} = μ_{u} (1 + ϵ^{+})$ instead of $μ_{o} (1 + ϵ^{+})$ . This condition implies that $μ_{u}^{+}$ and $μ_{u}^{-}$ are obtained by fine-tuning $μ_{u}$ .

Next, we will explore the implications of the theorem under the assumption that $ϵ \sim N (0, k)$ . We aim to address this issue from two perspectives:

The conclusion of the theorem when $k$ is sufficiently large.
The magnitude of $k$ in practical scenarios.

1. When $k$ is Sufficiently Large

Firstly, when $k$ is excessively large, i.e., $k ≫ μ_{o}^{2}$ , we are unable to rigorously prove that the aforementioned theorem holds. Through approximation, we can assert that the theorem holds with a certain probability; however, the specific probability would require further assumptions about the distribution outlined in A3.

Alternatively, we could introduce a new condition, $μ_{u}^{+} < μ_{u}^{-}$ (a condition derived from our observations). Specifically:

$μ_{u}^{+} = μ_{u} - | ϵ^{+} |$ , $μ_{u}^{-} = μ_{u} + | ϵ^{+} |$

Under this condition:

Regardless of the value of $k$ , we can ensure that the global overlap is greater than or equal to the personal overlap.
Specifically, when $k$ is less than $\sqrt{π} E [μ_{u}^{2}]$ , the equality can be removed.
- Calculating $E [μ_{u}^{2}]$ requires a combination of $μ_{o}$ and $σ^{2}$ , which requires further assumptions about the distribution.

Of course, this assumption may be somewhat strong, so we can consider relaxing this condition:

If a certain percentage (namely $β$ ) of users satisfy $μ_{u}^{+} < μ_{u}^{-}$ , we can still achieve that the global overlap is greater than or equal to the personal overlap.
Specifically, we could introduce a parameter $γ$ when setting $μ_{u}^{+}$ and $μ_{u}^{-}$ , thereby adjusting the mean of the distribution sampled by $ϵ$ by adding or subtracting $γ$ to control $β$ :
- However, the specific expression for this percentage would require further assumptions about the distribution in A3.
- Generally, when $β$ is larger than a value related to $μ_{o}, σ, k$ , it is guaranteed that the above conclusion is valid.
- According to Figure 4, more than 95 percent of users meet $μ_{u}^{+} < μ_{u}^{-}$ . Thus, the above conditions are very easy to meet.

2. Magnitude of $k$ in Practical Scenarios

A larger $k$ implies that the difference between users' normal interaction loss and noisy interaction loss may be greater. Based on our experience and the illustrations provided in the paper:

In Figure 1 (BPR loss), we observe that the loss ranges between 0 and 3.0, with the mean approximately 0.1618.
In Figure 4, the absolute difference (calculation by quartiles as explained in lines 383-386 in the paper) between normal interaction loss and noisy interaction loss lies between 0 and 1.0.
- When we calculate by using the mean values, we get that these differences lies between 0 and 0.64.
- We can assertively conclude that:
  - When $3 \sqrt{2 k} \approx 0.64$ , i.e., $k \approx 0.0228$ , there is a 99.7% probability that the difference lies between 0 and 0.64.
  - (This approximation assumes a Gaussian distribution for the Personal Distribution solely to provide a numerical example to show that k is not too large).
Therefore, the scenario where $k ≫ μ_{o}^{2}$ is highly unlikely to occur, and the theorem mentioned earlier can effectively disregard the impact of such extreme cases.

Once again, thank you for your valuable feedback. If you have any further questions, please do not hesitate to let me know, and I would be happy to engage in a deeper discussion with you. Additionally, we hope you will consider increasing the score of our work, as our methods have been empirically and theoretically validated and we believe they make a meaningful contribution to the research field.

Replying to Official Comment by Authors

Official Comment by Reviewer #3

Official Commentby Reviewer #307 Dec 2024, 17:51Program Chairs, Senior Area Chairs, Area Chairs, Reviewers Submitted, Authors

Comment:

Thanks for the response. These theoretical analyses are interesting and may further strengthen this paper's motivation. I think they may also inspire other scenarios with similar heterogeneous loss. Although other reviewers' opinions may be valid, I have decided to raise the scores (novelty, technical quality) from (4, 4) to (5, 6).

Replying to Official Comment by Reviewer #3

Thank you for your reply

Official Comment07 Dec 2024, 21:24Program Chairs, Senior Area Chairs, Area Chairs, Reviewers Submitted, Authors

Comment:

Thank you for taking the time to review our response and for considering an increased score for our paper. We are grateful for the insightful discussion and your valuable comments on improving our work.

Sincerely,
Authors

Official Review of Submission877 by Reviewer #4

Official Reviewby Reviewer #401 Dec 2024, 15:11 (modified: 09 Dec 2024, 01:35)Program Chairs, Senior Area Chairs, Area Chairs, Reviewers Submitted, Reviewer #4, Authors

Review:

Summary

The paper addresses the challenge of noise in implicit feedback for recommender systems, caused by factors such as human error and behavioral ambiguity, which degrade recommendation accuracy and robustness. Existing methods fail to effectively distinguish between normal and noisy interactions due to significant overlap in the loss distribution, particularly with pairwise loss functions. To overcome this, the authors propose PLD (Personal Loss Distribution), a resampling strategy that prioritizes normal interactions by leveraging a user’s personal loss distribution. Theoretical analysis and experiments on three datasets demonstrate PLD’s effectiveness and robustness across varying noise ratios.

Pros

The paper is well-written, and the problem is clearly and convincingly motivated.
The survey of related work is thorough and sufficient.
The paper provides a fresh perspective by using the observation of a user's personal loss distribution to judge interaction noise.

Cons

Reweighting strategies for interaction denoising in recommendation systems, based on predefined assumptions, have already been extensively studied. However, the authors fail to provide sufficient empirical evidence to support the assumptions proposed in this work.
While the method section includes extensive theoretical analysis and discussion, it heavily relies on intuitive reasoning and unproven assumptions. The simplistic reweighting design and the lack of modeling for complex implicit feedback make it difficult to fully trust the method’s effectiveness.
The paper is based on strong assumptions but lacks a detailed discussion of the proposed method's limitations and its applicability to specific usage scenarios.

Questions:

Using observations of users' personal loss distributions as a criterion for identifying interaction noise is a strong assumption. Can the authors provide empirical evidence across datasets in different application scenarios to validate the reliability of this foundational assumption?

Ethics Review Flag: No

Scope: 4: The work is relevant to the Web and to the track, and is of broad interest to the community

Novelty: 4

Technical Quality: 4

Reviewer Confidence: 3: The reviewer is confident but not certain that the evaluation is correct

Rebuttal by Authors

Rebuttal06 Dec 2024, 17:42Program Chairs, Senior Area Chairs, Area Chairs, Reviewers Submitted, Authors

Rebuttal:

Thank you for your comments. We appreciate your recognition of our work's clear motivation, comprehensive literature survey, novel perspective, and excellent writing. However, we regret the misunderstandings regarding our paper.

Due to space limitations, we first clarify several fundamental misunderstandings regarding our method's assumption to help you better assess our work's contributions. More detailed explanations and supporting materials will follow in the point-by-point replies.

Observation (Assumption): The loss of noisy interactions is higher than that of normal interactions.

This assumption is commonly recognized and used in existing works [1-3] and has been extensively validated empirically.
We have provided clear evidence in our paper, as shown in Figures 1-4 and Tables 1-2.
- Figures 1-2 show that noisy interactions incur higher loss than normal interactions in Global Loss Distribution.
- Figures 3-4 confirm that noisy interactions have higher loss in Personal Loss Distribution.
- Tables 1-2 present the statistical data.

To further address your misunderstanding, we provide additional results on three different scenario datasets. The table below shows the statistics of the loss of noisy interactions minus the loss of normal interactions, with all values greater than 0, indicating that noisy interactions consistently have higher loss in two loss distributions.

	Global	Personal
Gowalla - Check-in dataset	0.040	0.082
Yelp - Review dataset	0.099	0.183
MIND - News dataset	0.035	0.098

Our assumption about the loss phenomenon is reasonable and has been recognized by other reviewers. We will provide detailed explanations and supporting data in subsequent replies. We sincerely thank you for your time and effort in reviewing our paper. We hope our clarifications help you evaluate our work and kindly request you reconsider the overall assessment of this paper. We believe our work offers valuable insights and advances the state-of-the-art in robust recommender systems, contributing meaningfully to the research community.

Denoising Implicit Feedback for Recommendation.
Self-Guided Learning to Denoise for Robust Recommendation.
Double Correction Framework for Denoising Recommendation.

Detailed Explanations for Q1-Q2 (1/2)

Official Comment06 Dec 2024, 17:44Program Chairs, Senior Area Chairs, Area Chairs, Reviewers Submitted, Authors

Comment:

Q1: The authors fail to provide sufficient empirical evidence to support the assumptions proposed in this work.

A1: Our work indeed provides empirical evidence to support these assumptions. We present user cases (Figure 3) and statistical evidence (Figures 1 and 2, Figures 4, Tables 1 and 2).

See Figures 1 and 2, Table 1: Our work identifies that in the global loss distribution:
- The loss of noisy interactions is higher than that of normal interactions.
See Figures 3 and 4, Table 2: We present evidence demonstrating that in the personal loss distribution:
- The loss of noisy interactions is higher than that of normal interactions.
- Theorem 1 provides our method with superior denoising guarantees.

Additionally, we have already provided statistical results on three different scenario datasets as shown in previous rebuttal (Gowalla Check-in dataset, Yelp Review dataset, MIND News dataset), verifying that in both global and personal loss distributions, the loss of noisy interactions is higher than that of normal interactions. We believe this further alleviates your concerns.

Overall, we have provided denoising method PLD assurances from both empirical and theoretical perspectives. As Reviewer #2 acknowledged, our data analysis offers sufficient evidence to highlight the shortcomings of existing methods and support our assumptions.

Q2: Theoretical analysis heavily relies on intuitive reasoning and unproven assumptions.

A2: We would like to clarify that:

Regarding Assumptions, we used two assumptions in this theorem:

Assumption that the loss of noisy interactions is higher than that of normal interactions: We have clarified this repeatedly in previous replies, citing evidence from the paper and providing additional data to support this. We have validated that in both global and personal loss distributions, the loss of noisy interactions is higher than that of normal interactions.
Assumption that the personal loss distribution follows a Gaussian distribution:
1. As illustrated in Figure 3 of the main text, the personal loss distribution approximates a Gaussian distribution. And we conducted the Shapiro-Wilk test on the normal interactions within the personal loss distributions (a higher p-value indicates better conformity to a Gaussian distribution).
  1. At a significance level of 0.05, 65.89% of users' data accepted the normal distribution assumption.
  2. At a more stringent significance level of 0.005, 78.98% of users' data did so.
  3. This proves that the personal loss distribution follows a Gaussian distribution.
2. Additionally, this assumption primarily facilitates the derivation of specific probability expressions and does not affect the overall conclusions.
  1. Specifically, you can refer to the proof in the appendix (lines 1198-1199), where the expressions do not rely on any particular distributional assumptions:
    - $E [Λ_{normal}] = E [S_{x}] \cdot E [\frac{1}{S_{x} + S_{y}}] - \frac{k n}{n + m} \cdot \frac{Var [\exp (- x_{i})]}{(k - 1)^{2} C^{2}},$
    - where $E [\frac{1}{S_{x} + S_{y}}] \approx \frac{1}{E [S_{x}] + E [S_{y}]} (1 + \frac{Var [S_{x}] + Var [S_{y}]}{{(E [S_{x}] + E [S_{y}])}^{2}}) .$
    - Above expressions do not rely on any particular distributional assumptions.
  2. Similarly, in Equation (2), when using different distributions, only the corresponding means and variances need to be substituted into $α$ , $β$ , and $γ$ , without altering the conclusion of Theorem 1.

We would like to emphasize that our proofs have undergone rigorous mathematical derivation (see Appendix A.1). The rigor of our theoretical analysis has been commended by all other reviewers. We believe that certain misunderstandings may have led to this misjudgment.

Detailed Explanations for Q3-Q4 (2/2)

Official Comment06 Dec 2024, 17:45Program Chairs, Senior Area Chairs, Area Chairs, Reviewers Submitted, Authors

Comment:

Q3: The simplistic reweighting design and the lack of modeling for complex implicit feedback make it difficult to fully trust the method’s effectiveness.

A3: First, we would like to clarify that our resampling method can be integrated with any loss function and do model the complex implicit feedback. This simple but effective design enhances the method's generalizability and scalability.

To further illustrate this point, we provide experimental results using NeuMF (NeuMF models feedback through a neural network), which models more complex feedback compared to dot-product models:

Gowalla	Recall@20
NeuMF	0.1203
+ R-CE	0.1295
+ T-Ce	0.0979
+ DeCA	0.1251
+ BOD	0.1267
+ DCF	0.1294
+ PLD	0.1345
Gain	3.94%

Additionally, we provide results for PLD integrated with a self-supervised backbone model, DCCF (SIGIR 2023). Due to the complexity of these backbone models, not all baselines could be incorporated. Therefore, we have prioritized providing results for PLD as shown below:

Gowalla	Recall@20	Recall@50	NDCG@20	NDCG@50
DCCF (SIGIR23)	0.1649	0.2217	0.1118	0.1329
+ PLD	0.1710	0.2392	0.1151	0.1428
Gain	3.70%	7.90%	2.95%	7.45%

Our method, PLD, achieved state-of-the-art performance across these models, demonstrating that PLD do model the complex implicit feedback inherent in backbone models.

Additionally, the theoretical guarantees we provide support the effectiveness of our approach.

Q4: The paper is based on strong assumptions but lacks a detailed discussion of the proposed method's limitations and its applicability to specific usage scenarios.

A4: We would like to clarify agian that the assumptions our work relies on have been well-validated, as evidenced by

User cases (Figure 3)
Statistical evidence (Figures 1-2, 4, Tables 1-2).
Additionally, in our previous replies, we have provided statistical data across different datasets.

These pieces of evidence confirm that in the personal loss distribution, the loss of noisy interactions is higher than that of normal interactions.

Our method merely participates in the model training process, adjusting the sampling probability of positive samples. Compared to other methods, our approach offers greater flexibility and adaptability to different backbone models. In A3, we also provided results on backbone models of varying complexity.

In real-world scenarios, behaviors such as misclicks by some users can introduce significant noise into the data. In such cases, by incorporating PLD and adjusting the sampling probability of positive samples during model training, we can achieve improved performance. This has been validated in Section 5.2 of our paper.

We'll further include these discussion in the revised paper.

Thank you for your detailed comments and useful suggestions, as well as your appreciation of the importance of this work. We believe that the above responses will solve your misunderstandings regarding the assumptions used in our method. We look forward to your reconsideration and reevaluation of the overall assessment of our work.

Replying to Detailed Explanations for Q3-Q4 (2/2)

Follow up

Official Comment08 Dec 2024, 21:34Program Chairs, Senior Area Chairs, Area Chairs, Reviewers Submitted, Authors

Comment:

Thank you for taking the time and effort to review our paper. We have carefully considered your comments and responded to each point with detailed clarifications and supporting evidence. As the deadline for the discussion period between authors and reviewers is approaching, we kindly ask whether our responses have addressed your concerns. We look forward to your feedback.

Thank you once again for your time and effort.

Sincerely,
The Authors

Replying to Follow up

Official Comment by Reviewer #4

Official Commentby Reviewer #409 Dec 2024, 01:36Program Chairs, Senior Area Chairs, Area Chairs, Reviewers Submitted, Authors

Comment:

Thank you for the authors' patient response. I have revised the score.

Replying to Official Comment by Reviewer #4

Thank you for your reply

Official Comment09 Dec 2024, 09:10Program Chairs, Senior Area Chairs, Area Chairs, Reviewers Submitted, Authors

Comment:

Thank you for reviewing our response and for your willingness to increase the score of our paper. We are pleased to hear that our response has addressed your concerns. Please don’t hesitate to reach out if you have any further questions.

Sincerely,
The Authors

Official Review of Submission877 by Reviewer #5

Official Reviewby Reviewer #529 Nov 2024, 22:11 (modified: 10 Dec 2024, 13:01)Program Chairs, Senior Area Chairs, Area Chairs, Reviewers Submitted, Reviewer #5, Authors

Review:

This paper focuses on denoising user-item interaction behavior in recommender systems. Previous work often relied on the assumption that the atomic loss function of noisy interactions is higher than that of normal interactions, thereby identifying noisy interactions from raw interactions. However, this paper finds that the loss function value distribution of noisy interactions significantly overlaps with that of normal interactions. As a result, traditional methods tend to misclassify noisy interactions as normal, and vice versa. By analyzing the loss values of all interactions for each user, this paper proposes a personalized denoising strategy for interactions, called PLD. Through comprehensive experiments, ablation studies, and sound theoretical analysis, the proposed PLD significantly reduces the overlap between the loss function value distributions of noisy and normal interactions, while consistently improving recommendation performance.

strengths:

S1: This paper focuses on denoising user-item interaction behavior in recommender systems and proposes a new personalized strategy, PLD, to address the overlap between noisy and normal interaction loss distributions.
S2: The proposed PLD strategy is based on analyzing the loss values of all interactions for each user, making it more tailored and effective for individual users.
S3: Comprehensive experiments, thorough ablation studies, and solid theoretical analysis demonstrate that PLD significantly reduces the overlap between noisy and normal interaction loss distributions, and consistently improves recommendation performance by effectively handling noisy interactions, leading to more accurate and reliable recommendations.

Questions:

weaknesses：

W1: potentially inappropriate assumption of homoscedastic Gaussian distribution in Theorem 1. First of all, based on Figures 1 and 2, if we assume that the loss values follow a Gaussian distribution, the normal interaction and noisy interaction for overall losses are clearly not homoscedastic. Considering that the overall loss distribution is the expectation of the personalized loss distribution for each user, i.e., $l_{\text{overall}}(\text{interaction})=\mathbb{E}{\text{user}}[l{\text{personalized}}(\text{user},\text{interaction})] $, a s s u m i n g h o m o s c e d a s t i c i t y b e t w e e n$ l_{\text{personalized}}(\text{user},\text{normal interaction}) $a n d$ l_{\text{personalized}}(\text{user},\text{noisy interaction}) $f o r a n y u s e r w o u l d r e s u l t i n e q u a l v a r i a n c e s f o r$ l_{\text{overall}}(\text{normal interaction}) $与$ l_{\text{overall}}(\text{noisy interaction})$, which does not align with the actual situation in Figures 1 and 2.
W2: selection of backbone is limited. This paper only selects MF and LightGCN as backbone models, which are classic CF backbones. A more comprehensive backbone study, especially with more cutting-edge CF backbones, would better demonstrate the effectiveness of PLD.

Question：

**Q1: **Would it be better to swap the legend order in Figure 1, considering that the current placement of the legend is farther away from the corresponding bar chart? I initially misunderstood it.
Q2: The results in Table 4 are under the BPR loss setup. It seems that the performance of many baselines is actually a strongly negative effect. However, in Table 6, under the BCE loss setup, other baselines are effective. This result is quite strange. Why does the difference between BPR loss and BCE loss lead to such a significant discrepancy in the results? It might be useful to explore the results on other datasets under the BCE loss setting.

If my concerns are addressed, I will be happy to raise my score.

Ethics Review Flag: No

Scope: 4: The work is relevant to the Web and to the track, and is of broad interest to the community

Novelty: 6

Technical Quality: 5

Reviewer Confidence: 3: The reviewer is confident but not certain that the evaluation is correct

Reply to the Rebuttal

Official Commentby Reviewer #510 Dec 2024, 12:45Program Chairs, Senior Area Chairs, Area Chairs, Reviewers Submitted, Reviewer #5, Authors

Comment:

Thanks very much to the authors for the thorough explanations and the detailed experimental results. My concerns have been addressed. I also reviewed the authors' rebuttals to other reviewers. I will raise my score to (6,5).

Thank you for your reply

Official Comment10 Dec 2024, 14:16Program Chairs, Senior Area Chairs, Area Chairs, Reviewers Submitted, Reviewer #5, Authors

Comment:

Thank you for taking the time to review our response and for considering an increased score for our paper. We are grateful for your valuable comments to improve our work.

Sincerely,
Authors

Rebuttal by Authors

Rebuttal06 Dec 2024, 17:46Program Chairs, Senior Area Chairs, Area Chairs, Reviewers Submitted, Authors

Rebuttal:

Thank you for your comments. We greatly appreciate your recognition of our work's novel methodology, effectiveness, comprehensive experiments, thorough ablation studies, and solid theoretical analysis. Below, we respond to each of your questions to address your concerns.

Due to space limitations, we first provide a brief summary of our responses to your concerns. More detailed explanations and supporting materials will be provided in the subsequent replies.

Assumption of Homoscedastic Gaussian Distribution:
- The assumption of a homoscedastic Gaussian distribution in Theorem 1 is primarily introduced to enhance the clarity of the expressions.
- This assumption does not affect the conclusion of the theorem.
- We will provide a detailed explanation in the subsequent reply A1, including how Theorem 1 can be adjusted under a non-homoscedastic Gaussian distribution.

Selection of Backbone: We address this by:
- Providing results for other classical backbone models, such as NeuMF.
- Providing results for more complex self-supervised models, such as DCCF (SIGIR23).
- In the subsequent reply A2, results demonstrate that PLD achieves the best performance across these models.

Difference Between BPR Loss and BCE Loss Leading to Significant Discrepancy in Results:
- BPR loss significantly amplifies the overlap between normal interactions and noisy interactions in the global loss distribution.
- This amplification severely degrades the performance of many baselines based on the global loss distribution.
- We will elaborate on why BPR loss amplifies the overlap and provide performance results under the BCE loss setting on other datasets in the subsequent reply A4.

We will provide more detailed explanations and supporting data for the above points in the subsequent replies. We sincerely thank you for your time and effort in reviewing our paper, as well as the valuable suggestions that will help improve our work further. We hope that our clarifications address your concerns. Based on our responses, we kindly request that you reconsider the overall assessment of this paper. We believe that our work offers valuable insights and advances the state-of-the-art in robust recommender systems, making a meaningful contribution to the research community.

Detailed Explanations for Q1-Q3 (1/2)

Official Comment06 Dec 2024, 17:49Program Chairs, Senior Area Chairs, Area Chairs, Reviewers Submitted, Authors

Comment:

Q1: Potentially inappropriate assumption of homoscedastic Gaussian distribution in Theorem 1.

A1: Thank you for your comment!

Firstly, we would like to clarify that the Gaussian distribution assumption we introduced pertains to the personal loss distribution, whereas Figures 1 and 2 illustrate the global loss distribution. We do not make any assumptions about the global loss distribution.

Secondly, regarding the assumption of a homoscedastic Gaussian distribution in Theorem 1, it is primarily introduced to simplify the derivation and enhance the clarity of the expressions.

This assumption does not affect the conclusion of the theorem.
Specifically, you can refer to the proof in the appendix (lines 1198-1199), where the expressions for $E [Λ]$ do not rely on any specific distributional assumptions:
- $E [Λ_{normal}] = E [S_{x}] \cdot E [\frac{1}{S_{x} + S_{y}}] - \frac{k n}{n + m} \cdot \frac{Var [\exp (- x_{i})]}{(k - 1)^{2} C^{2}},$
- where $E [\frac{1}{S_{x} + S_{y}}] \approx \frac{1}{E [S_{x}] + E [S_{y}]} (1 + \frac{Var [S_{x}] + Var [S_{y}]}{{(E [S_{x}] + E [S_{y}])}^{2}}) .$
- Above expressions do not rely on any particular distributional assumptions.
Similarly, if different variances ( $σ^{2}$ ) are to be used in Theorem 1, the corresponding values can be substituted into $α$ and $β$ , while distinguishing between $γ_{1}$ and $γ_{2}$ .
After using different variances ( $σ^{2}$ ), the mathematical form of Equation (2) does not change, and the conclusion remains unaffected.

This substitution does not affect the conclusion of Theorem 1. Due to space limitations, we will discuss this point in more detail in the revised manuscript.

Q2: Selection of backbone is limited.

A2: Thank you for your suggestion! To address your concern, we have additionally provided experimental results using NeuMF as a backbone model.

Gowalla	Recall@20
NeuMF	0.1203
+ R-CE	0.1295
+ T-Ce	0.0979
+ DeCA	0.1251
+ BOD	0.1267
+ DCF	0.1294
+ PLD	0.1345
Gain	3.94%

Furthermore, we have explored experimental results with more advanced backbone models, DCCF. Due to the complexity of these backbone models, not all baselines could be incorporated. Therefore, we have prioritized providing results for PLD as shown below:

Gowalla	Recall@20	Recall@50	NDCG@20	NDCG@50
DCCF (SIGIR23)	0.1649	0.2217	0.1118	0.1329
+ PLD	0.1710	0.2392	0.1151	0.1428
Gain	3.70%	7.90%	2.95%	7.45%

Our method, PLD, achieved improvements across different backbones, demonstrating its effectiveness and generalizability.

Q3: Considering that the current placement of the legend is farther away from the corresponding bar chart?

A3: Thank you for your suggestion! We will adjust the position of the legend in the revised version to ensure it is closer to the corresponding bar charts, thereby enhancing readability.

Detailed Explanations for Q4 (2/2)

Official Comment06 Dec 2024, 17:53Program Chairs, Senior Area Chairs, Area Chairs, Reviewers Submitted, Authors

Comment:

Q4: Why does the difference between BPR loss and BCE loss lead to such a significant discrepancy in the results? It might be useful to explore the results on other datasets under the BCE loss setting.

A4: Thank you for your question! As mentioned in lines 121-127 of our paper, the overlap phenomenon is further exacerbated when using BPR loss, which negatively impacts the effectiveness of methods based on global loss distributions.

This is because:

BCE Loss: BCE loss is a point-wise loss, allowing us to directly compute the loss of each interaction.
BPR Loss: BPR loss is a pair-wise loss, where computing the loss of each interaction depends on negative samples. Introducing negative samples increases the variances of the global loss distribution.
Consequently, the loss of normal interactions for difficult-to-train users (high-loss users) may exceed that of noisy interactions for easy-to-train users (low-loss users), resulting in greater overlap.
This makes global loss distributions less effective in distinguishing normal interactions from noisy interactions, thereby rendering denoising methods that rely on global loss distributions ineffective.

To address your concern, we provide additional results on the Gowalla dataset using the BCE loss setting. We found that PLD still achieves the best performance on the Gowalla dataset.

Gowalla	Recall@20
MF-BCE	0.1109
+ R-CE	0.1306
+ T-Ce	0.1319
+ DeCA	0.1310
+ BOD	0.1361
+ DCF	0.1351
+ PLD	0.1456
Gain	6.98%

Thank you for your detailed comments and useful suggestions, as well as your appreciation of the importance of this work. We hope that our clarifications address your concerns. We look forward to your reconsideration and reevaluation of the overall assessment of our work.

Replying to Detailed Explanations for Q4 (2/2)

Follow up

Official Comment08 Dec 2024, 21:34Program Chairs, Senior Area Chairs, Area Chairs, Reviewers Submitted, Authors

Comment:

Thank you for taking the time and effort to review our paper. We have carefully considered your comments and responded to each point with detailed clarifications and supporting evidence. As the deadline for the discussion period between authors and reviewers is approaching, we kindly ask whether our responses have addressed your concerns. We look forward to your feedback.

Thank you once again for your time and effort.

Sincerely,
The Authors

Replying to Follow up

Kindly Reminder

Official Comment09 Dec 2024, 20:03Program Chairs, Senior Area Chairs, Area Chairs, Reviewers Submitted, Authors

Comment:

We sincerely thank you for your time and effort in reviewing our paper and supporting our work. With the discussion period deadline (Dec. 10th) approaching, we are following up on our earlier response.

We understand your schedule may be busy, and we appreciate the time you've already invested in reviewing our paper. We have carefully considered your comments and provided detailed clarifications and additional support for each concern in our response.

We would be grateful for your feedback on whether our response has addressed your concerns. If your concerns have been addressed, we kindly request that you consider re-evaluating our paper and increasing our scores. Your continued support is immensely valuable, and we truly appreciate your feedback.

Once again, thank you for your insights and ongoing support.

Sincerely,
The Authors