Update 2022-02-01-drop-noise-for-ctr.md
guotong1988 authored Apr 7, 2024
1 parent f436797 commit 06746ed
Showing 1 changed file with 7 additions and 0 deletions.
7 changes: 7 additions & 0 deletions research/_posts/2022-02-01-drop-noise-for-ctr.md
@@ -66,6 +66,8 @@ In this section, we evaluate our method on our real-world dataset. Our dataset-A

### 5. Discussion

#### 5.1 Why does the drop-noise method work?

Why does the drop-noise method work? Because deep learning is statistics-based. Take classification as an example. (In a broad sense, all machine learning tasks can be viewed as classification.)

If there are three very similar data points (data-1/data-2/data-3) in total, whose labels are class-A/class-A/class-B, then the trained model will probably predict class-A for data-3.
@@ -78,6 +80,11 @@ If we do not drop data-3, the model prediction for new data that is most similar to data-3 will be class-B, which is wrong.

If we drop data-3, the model prediction for new data that is most similar to data-3 will be class-A, which is right.
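
This three-point argument can be made concrete with a toy experiment. Below is a minimal sketch using a 1-nearest-neighbor classifier from scikit-learn; the model choice and feature values are our illustration of the memorization effect, not the paper's CTR model:

```python
# Minimal sketch of the three-similar-points argument, using a
# 1-nearest-neighbor classifier. The feature values are invented
# for illustration; this is not the paper's CTR model.
from sklearn.neighbors import KNeighborsClassifier

X = [[1.00], [1.01], [1.02]]           # data-1, data-2, data-3 (very similar)
y = ["class-A", "class-A", "class-B"]  # data-3 carries the noisy label

query = [[1.02]]  # new data most similar to data-3

# Keep data-3: the model memorizes the noisy label and answers class-B.
with_noise = KNeighborsClassifier(n_neighbors=1).fit(X, y)
print(with_noise.predict(query))       # ['class-B'] -- wrong

# Drop data-3: the nearest remaining neighbor is data-2, so class-A.
without_noise = KNeighborsClassifier(n_neighbors=1).fit(X[:2], y[:2])
print(without_noise.predict(query))    # ['class-A'] -- right
```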


#### 5.2 Real CTR Application

This method cannot be applied directly to a real CTR application, because a CTR application such as a recommender system is memory-based: the CTR model memorizes the user-item data first. So this method can only improve the AUC on the un-dropped part of the user-item data.
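
To make the "un-dropped part" claim checkable, here is a minimal sketch (our assumption about the evaluation bookkeeping, not the paper's code) that scores the offline AUC separately on the kept and dropped slices of the user-item log:

```python
# Sketch of the implied bookkeeping (an assumption, not the paper's
# pipeline): report offline AUC separately on the un-dropped (kept)
# and dropped slices of the user-item log. Each slice must contain
# both positive and negative labels for AUC to be defined.
import numpy as np
from sklearn.metrics import roc_auc_score

def auc_by_slice(labels, scores, kept_mask):
    """Return (AUC on kept rows, AUC on dropped rows)."""
    labels = np.asarray(labels)
    scores = np.asarray(scores)
    kept = np.asarray(kept_mask, dtype=bool)
    return (roc_auc_score(labels[kept], scores[kept]),
            roc_auc_score(labels[~kept], scores[~kept]))
```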

### 6. Conclusion

Building on the good performance of previous works \cite{ref1} \cite{ref2}, which has been verified on human-labeled datasets, we further apply the find-noise idea to a dataset generated by user behavior and to the CTR task. The experimental results show that our method improves the offline AUC. Also, the noise data is not the low-frequency user data. We will verify our idea on online performance in the future.
