To create the dataset, we first selected 100K high-quality Magpie instructions spanning diverse task categories, then generated five responses per instruction with Llama 3 8B Instruct at a temperature of 0.8. We then annotated RM scores using RLHFlow/ArmoRM-Llama3-8B-v0.1, labeling the response with the highest RM score as the chosen response and the one with the lowest RM score as the rejected response.
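For reference, here is a minimal sketch of the pairing step described above, not the authors' actual pipeline. The model names and temperature come from the quoted description; the helper names (`score_response`, `build_dpo_pair`) are hypothetical, and the reward-model call follows the ArmoRM model card, which exposes a scalar `score` output via its custom head.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

RM_NAME = "RLHFlow/ArmoRM-Llama3-8B-v0.1"
DEVICE = "cuda"

rm_tokenizer = AutoTokenizer.from_pretrained(RM_NAME)
rm = AutoModelForSequenceClassification.from_pretrained(
    RM_NAME, torch_dtype=torch.bfloat16, trust_remote_code=True
).to(DEVICE)

def score_response(instruction: str, response: str) -> float:
    """Return the scalar ArmoRM preference score for one (prompt, response) pair."""
    messages = [
        {"role": "user", "content": instruction},
        {"role": "assistant", "content": response},
    ]
    input_ids = rm_tokenizer.apply_chat_template(messages, return_tensors="pt").to(DEVICE)
    with torch.no_grad():
        out = rm(input_ids)
    # `score` is the scalar preference head exposed by ArmoRM's custom model code
    return out.score.float().item()

def build_dpo_pair(instruction: str, candidates: list[str]) -> dict:
    """Label the highest-scoring of the sampled responses as chosen, the lowest as rejected."""
    scores = [score_response(instruction, c) for c in candidates]
    best = max(range(len(scores)), key=scores.__getitem__)
    worst = min(range(len(scores)), key=scores.__getitem__)
    return {
        "prompt": instruction,
        "chosen": candidates[best],
        "rejected": candidates[worst],
    }
```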
Very wonderful work!
After filtering down to the 300K data, how do you obtain the 100K subset used to synthesize the DPO data?
If you could share the data-filtering code for this part, it would be very helpful.
Thank you for your question. This 100K was filtered empirically lol. We noticed that the original Magpie dataset had too many information-seeking and advice-seeking entries, so we manually decreased their proportion for the DPO phase to make the task categories more diverse and balanced.
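Since the comment above says the 100K was chosen empirically, the snippet below is only a sketch of one way to down-weight over-represented categories when subsampling from 300K to roughly 100K. It assumes each record carries a `task_category` field, and the per-category caps in `CATEGORY_CAPS` are hypothetical values you would tune by hand.

```python
import random
from collections import defaultdict

# Hypothetical caps: shrink over-represented categories, tune until ~100K remain.
CATEGORY_CAPS = {
    "Information seeking": 15_000,
    "Advice seeking": 8_000,
}
DEFAULT_CAP = 12_000  # generous cap for all other categories

def rebalance(records: list[dict], seed: int = 42) -> list[dict]:
    """Cap each task category so the resulting subset is more diverse and balanced."""
    rng = random.Random(seed)
    by_cat = defaultdict(list)
    for rec in records:
        by_cat[rec["task_category"]].append(rec)
    kept = []
    for cat, items in by_cat.items():
        cap = CATEGORY_CAPS.get(cat, DEFAULT_CAP)
        rng.shuffle(items)
        kept.extend(items[:cap])
    rng.shuffle(kept)
    return kept
```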