Commit

Update 2024-02-01-auto-re-label.md
guotong1988 authored Apr 9, 2024
1 parent fbfbe74 commit 90e6042
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion core_research/_posts/2024-02-01-auto-re-label.md
@@ -57,7 +57,7 @@ Step-1. We assume it is a classification task. We have a human-labeled dataset-v

Step-2. Using model-v1 to predict the classification label for dataset-v0. If the predicted label of dataset-v0 (training dataset and dev dataset) do not equal to the human label of dataset-v0, we think they are the noisy data.

- Step-3. We randomly re-set each of the noisy data label to model predicted label or human label. Each setting possibility becomes a new dataset with training + dev dataset. Then we get $2^N$ datasets ($dataset-v1, dataset-v2 ... dataset-vN$) and $2^N$ models ($model-v1, model-v2 ... model-vN$).
+ Step-3. We randomly re-set each of the noisy data label to model predicted label or human label. Each setting possibility becomes a new dataset with training + dev dataset. Then we get $2^N$ datasets (dataset-v1, dataset-v2 ... dataset-vN) and $2^N$ models (model-v1, model-v2 ... model-vN).

Step-4. We use dev dataset accuracy to select the best dataset from all the $2^N$ datasets. The best dataset becomes the new dataset-v0.
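
Steps 2 to 4 can be sketched roughly as follows. This is a minimal illustration, not the post's actual implementation: the function names and the `dev_accuracy` callback are hypothetical, and the callback is assumed to train a model on a candidate labeling and return its dev accuracy. Enumerating all $2^N$ candidates grows exponentially in the number of noisy examples, so the sketch is only practical for small $N$.

```python
from itertools import product


def find_noisy(human_labels, predicted_labels):
    """Step-2: indices where the model prediction disagrees with the human label."""
    return [
        i
        for i, (h, p) in enumerate(zip(human_labels, predicted_labels))
        if h != p
    ]


def candidate_label_sets(human_labels, predicted_labels):
    """Step-3: yield each of the 2^N labelings of the N noisy examples."""
    noisy = find_noisy(human_labels, predicted_labels)
    # Each noisy example independently keeps the human label (False)
    # or takes the model-predicted label (True).
    for choice in product((False, True), repeat=len(noisy)):
        labels = list(human_labels)
        for idx, use_prediction in zip(noisy, choice):
            if use_prediction:
                labels[idx] = predicted_labels[idx]
        yield labels


def select_best(human_labels, predicted_labels, dev_accuracy):
    """Step-4: keep the candidate labeling with the highest dev accuracy.

    `dev_accuracy` is a hypothetical callback (an assumption of this sketch)
    that trains a model on the candidate labels and returns its dev accuracy.
    """
    return max(
        candidate_label_sets(human_labels, predicted_labels),
        key=dev_accuracy,
    )
```

The labeling returned by `select_best` would then play the role of the new dataset-v0 for the next round.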
