[SUBMISSION] Automated Policy-based Preference Alignment using Synthetic Data Generation #108
December 2024 Student Submission
See the HTML-rendered submission here for ease of review.
Accompanying Python Notebook is in the PR.
Module Completed
Changes Made
In this case study, we demonstrate how to use DPO (Direct Preference Optimization) to align a language model with a user-provided policy, and we further automate the process via synthetic data generation and LLM-as-judge evaluation.
We walk through a Case Study for Acme Inc., a company dedicated to democratizing access to computer science education for K-12 students. Acme Inc. is creating a chatbot named smolK-12, a small open source LLM designed specifically for K-12 students.
We’ll explore how to align a language model with Acme Inc.’s policy to ensure its LLM-powered applications are safe and appropriate for K-12 students.
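For reviewers who want a quick reminder of the objective involved, below is a minimal sketch of the standard DPO loss for a single preference pair. It is not taken from the notebook: the function name `dpo_loss`, the `beta` default, and the example record are all hypothetical and chosen only for illustration. The `prompt`/`chosen`/`rejected` layout is an assumption about how a preference pair might be stored.

```python
import math

# Hypothetical preference record (illustrative text, not from the submission).
example_pair = {
    "prompt": "A question a student might ask the chatbot",
    "chosen": "A response that follows the policy",
    "rejected": "A response that violates the policy",
}

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) pair.

    Inputs are summed token log-probabilities of each response under the
    policy being trained and under the frozen reference model.
    """
    # How much more (in log space) the policy favors each response
    # compared with the reference model.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(margin)), written in a numerically stable form.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

When the policy and the reference model agree on both responses, the margin is zero and the loss equals log 2. The loss falls as the policy shifts probability toward the chosen response relative to the reference.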
Notebooks Added/Modified
List any notebooks you've added or modified:
2_preference_alignment/notebooks/smolk12
Checklist
december_2024 branch
Questions or Discussion Points
Add any questions you have or points you'd like to discuss:
I am particularly interested in your feedback pertaining to the points I've raised in the Discussion section around:
Additional Notes
Any other information that might be helpful for reviewers:
This Case Study is part of an open source book I am writing, "Taming LLMs".
I would love to highlight the great smolLM work you are doing here, so I would truly appreciate your feedback on this Case Study.
Cheers,
Tharsis.