Update 2022-02-01-drop-noise-for-ctr.md
guotong1988 authored Apr 7, 2024
1 parent f436797 commit 06746ed
Showing 1 changed file with 7 additions and 0 deletions.
7 changes: 7 additions & 0 deletions research/_posts/2022-02-01-drop-noise-for-ctr.md
@@ -66,6 +66,8 @@ In this section, we evaluate our method on our real-world dataset. Our dataset-A

### 5. Discussion

#### 5.1 Why does the drop-noise method work?

Why does the drop-noise method work? Because deep learning is statistics-based. Take classification as an example. (In a broad sense, all machine learning tasks can be viewed as classification.)

If there are three very similar data points (data-1/data-2/data-3) in total, whose labels are class-A/class-A/class-B, then the trained model will probably predict class-A for data-3.
@@ -78,6 +80,11 @@ If we do not drop data-3, the model prediction for new data that is most similar to data-3 will be class-B, which is wrong.

If we drop data-3, the model prediction for new data that is most similar to data-3 will be class-A, which is right.
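
This three-point argument can be made concrete with a toy experiment. Below is a minimal sketch using a 1-nearest-neighbor classifier from scikit-learn; the model choice and feature values are our illustration of the memorization effect, not the paper's CTR model:

```python
# Minimal sketch of the three-similar-points argument, using a
# 1-nearest-neighbor classifier. The feature values are invented
# for illustration; this is not the paper's CTR model.
from sklearn.neighbors import KNeighborsClassifier

X = [[1.00], [1.01], [1.02]]           # data-1, data-2, data-3 (very similar)
y = ["class-A", "class-A", "class-B"]  # data-3 carries the noisy label

query = [[1.02]]  # new data most similar to data-3

# Keep data-3: the model memorizes the noisy label and answers class-B.
with_noise = KNeighborsClassifier(n_neighbors=1).fit(X, y)
print(with_noise.predict(query))       # ['class-B'] -- wrong

# Drop data-3: the nearest remaining neighbor is data-2, so class-A.
without_noise = KNeighborsClassifier(n_neighbors=1).fit(X[:2], y[:2])
print(without_noise.predict(query))    # ['class-A'] -- right
```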


#### 5.2 Real CTR Application

This method cannot be applied directly to a real CTR application, because a CTR application such as a recommender system is memory-based: the CTR model memorizes the user-item data first. So this method can only improve the AUC on the un-dropped part of the user-item data.
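
To make the "un-dropped part" claim checkable, here is a minimal sketch (our assumption about the evaluation bookkeeping, not the paper's code) that scores the offline AUC separately on the kept and dropped slices of the user-item log:

```python
# Sketch of the implied bookkeeping (an assumption, not the paper's
# pipeline): report offline AUC separately on the un-dropped (kept)
# and dropped slices of the user-item log. Each slice must contain
# both positive and negative labels for AUC to be defined.
import numpy as np
from sklearn.metrics import roc_auc_score

def auc_by_slice(labels, scores, kept_mask):
    """Return (AUC on kept rows, AUC on dropped rows)."""
    labels = np.asarray(labels)
    scores = np.asarray(scores)
    kept = np.asarray(kept_mask, dtype=bool)
    return (roc_auc_score(labels[kept], scores[kept]),
            roc_auc_score(labels[~kept], scores[~kept]))
```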

### 6. Conclusion

Building on the good performance of previous works \cite{ref1} \cite{ref2}, which has been verified on human-labeled datasets, we further apply the find-noise idea to a dataset generated by user behavior and to the CTR task. The experimental results show that our method improves the offline AUC. Also, the noise data is not the low-frequency user data. We will verify our idea on online performance in the future.
