The material you lined up sounds good. For now, I would focus on getting the core material for synthetic data in place and structuring it like the other modules, something like this:
- README
- instruction_datasets.md
  - Magpie
  - SelfInstruct
- preference_datasets.md
  - UltraFeedback
- notebooks/
  - sft dataset project
  - dpo dataset project
I would say this is the minimum needed to stay aligned with the previous modules.
- improving synthetic data (injecting diversity, evolving/deita)
- evaluating synthetic data (quality classifiers, LLMs as judges, filtering/deita)
I would say these are good extras that we can come back to if we have time.
The evaluation module is not complete. It still needs a finalised structure, more information, and exercises.
Structure
Here is a basic proposal for a structure:
project