How to select DPO subset? #36

Open
qychen2001 opened this issue Dec 22, 2024 · 1 comment

@qychen2001

To create the dataset, we first selected 100K high-quality Magpie instructions with diverse task categories, then generated responses using Llama 3 8B Instruct 5 times for each instruction, using a temperature of 0.8. We then annotated RM scores using RLHFlow/ArmoRM-Llama3-8B-v0.1, labeling the response with the highest RM score as the chosen response, and the one with the lowest RM score as the rejected response.
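For reference, the scoring-and-labeling step described above can be sketched roughly as follows. This is a sketch, not the authors' script: the reward-model call follows the usage pattern on the RLHFlow/ArmoRM-Llama3-8B-v0.1 model card, and the generation of the five candidate responses is assumed to have already happened.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

RM_NAME = "RLHFlow/ArmoRM-Llama3-8B-v0.1"
rm = AutoModelForSequenceClassification.from_pretrained(
    RM_NAME, trust_remote_code=True, torch_dtype=torch.bfloat16
)
rm_tokenizer = AutoTokenizer.from_pretrained(RM_NAME)

def rm_score(instruction: str, response: str) -> float:
    # Score one (instruction, response) pair with ArmoRM, per its model card.
    messages = [
        {"role": "user", "content": instruction},
        {"role": "assistant", "content": response},
    ]
    input_ids = rm_tokenizer.apply_chat_template(messages, return_tensors="pt")
    with torch.no_grad():
        return rm(input_ids).score.float().item()

def to_dpo_pair(instruction: str, responses: list[str]) -> dict:
    # responses: the 5 candidates sampled from Llama 3 8B Instruct at T=0.8.
    ranked = sorted(responses, key=lambda r: rm_score(instruction, r))
    return {
        "prompt": instruction,
        "chosen": ranked[-1],   # highest RM score
        "rejected": ranked[0],  # lowest RM score
    }
```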

Wonderful work!
After filtering down to 300K examples, I would like to know how to obtain the 100K subset used to synthesize the DPO data.
If you could share the data filtering code for this step, it would be very helpful.

@zhangchen-xu
Member

Hi Qiyuan,

Thank you for your question. This 100K was filtered empirically lol. We noted that the original Magpie dataset had too many information-seeking and advice-seeking entries, so we manually decreased their proportion in the DPO phase and made the task categories more diverse and balanced.

For example, for Magpie-Align/Magpie-Llama-3.1-Pro-DPO-100K-v0.1, we first applied the following filters to the raw dataset (a code sketch follows the list):

  • Difficulty >= medium
  • Input Quality >= good
  • Reward >= -5
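
A rough sketch of this filter with Hugging Face datasets is below. It is not our exact script: the source dataset name and the column names (difficulty, input_quality, instruct_reward) are taken from the public Magpie dataset cards and may differ between releases.

```python
from datasets import load_dataset

DIFFICULTY_KEEP = {"medium", "hard", "very hard"}  # difficulty >= medium
QUALITY_KEEP = {"good", "excellent"}               # input quality >= good

raw = load_dataset("Magpie-Align/Magpie-Llama-3.1-Pro-1M-v0.1", split="train")

filtered = raw.filter(
    lambda x: x["difficulty"] in DIFFICULTY_KEEP
    and x["input_quality"] in QUALITY_KEEP
    and x["instruct_reward"] >= -5  # reward >= -5
)
```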

We then randomly sampled the following amounts (see the sketch after the list):

  • 30K Information Seeking & Advice Seeking
  • 15K Coding & Debugging
  • 25K Math
  • 30K from all other task categories

... and got 100K instructions with more diverse and balanced task categories.
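The per-category sampling could then be sketched like this, continuing from `filtered` in the sketch above. The task-category labels follow the public Magpie taxonomy and the seed is arbitrary; again, a sketch rather than the exact script:

```python
import random

random.seed(0)  # arbitrary seed for the sketch

def take(rows, categories, k):
    # Randomly sample k rows whose task_category is in `categories`.
    pool = [r for r in rows if r["task_category"] in categories]
    return random.sample(pool, k)

rows = list(filtered)
named = {"Information seeking", "Advice seeking", "Coding & Debugging", "Math"}
others = {r["task_category"] for r in rows} - named

subset = (
    take(rows, {"Information seeking", "Advice seeking"}, 30_000)
    + take(rows, {"Coding & Debugging"}, 15_000)
    + take(rows, {"Math"}, 25_000)
    + take(rows, others, 30_000)
)
random.shuffle(subset)  # 100K instructions total
```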

Please let me know if you need more information; I am happy to discuss! We will add these details to the appendix in our next arXiv update!
