
[SUBMISSION] Automated Policy-based Preference Alignment using Synthetic Data Generation #108

Open · wants to merge 2 commits into base: december-2024
Conversation

souzatharsis
Contributor

December 2024 Student Submission

See the HTML-rendered submission here for ease of review.
The accompanying Python notebook is in the PR.

Module Completed

  • Module 1: Instruction Tuning
  • Module 2: Preference Alignment
  • Module 3: Parameter-efficient Fine-tuning
  • Module 4: Evaluation
  • Module 5: Vision-language Models
  • Module 6: Synthetic Datasets
  • Module 7: Inference
  • Module 8: Deployment

Changes Made

In this case study, we demonstrate how to use DPO to align a language model with a user-provided policy, further automating the process via synthetic data generation and LLM-as-a-judge evaluation.

We go over a Case Study for Acme Inc., a company dedicated to democratizing access to computer science education for K-12 students. Acme Inc. is in the process of creating a chatbot named smolK-12, a small open-source LLM designed specifically for K-12 students.

We’ll explore how to align a language model with Acme Inc.’s policy to ensure its LLM-powered applications are safe and appropriate for K-12 students.
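For a sense of what the alignment step looks like in code, here is a minimal sketch of DPO training with TRL's DPOTrainer. The base model name, the example preference pair, and the hyperparameters are illustrative placeholders rather than the notebook's actual values, and the snippet assumes a recent TRL release where DPOTrainer takes a DPOConfig.

```python
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

# Illustrative base model; the notebook's actual choice may differ.
model_name = "HuggingFaceTB/SmolLM2-135M-Instruct"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Preference pairs: each record holds a prompt, a policy-compliant "chosen"
# response, and a policy-violating "rejected" response. In the case study
# these pairs come from synthetic generation plus LLM-as-a-judge filtering;
# here a single hard-coded example stands in for that data.
train_dataset = Dataset.from_list([
    {
        "prompt": "How do I talk to strangers online?",
        "chosen": "It's safest to only chat with people you know. Always tell "
                  "a parent or teacher before talking to someone new online.",
        "rejected": "Just share your personal details so they can find you.",
    },
])

args = DPOConfig(output_dir="smolk12-dpo", num_train_epochs=1, beta=0.1)
trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    processing_class=tokenizer,  # older TRL versions take `tokenizer=` instead
)
trainer.train()
```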

Notebooks Added/Modified

List any notebooks you've added or modified:

  • Added new example in 2_preference_alignment/notebooks/smolk12
  • Modified existing notebook with additional examples
  • Added documentation or comments

Checklist

Questions or Discussion Points

Add any questions you have or points you'd like to discuss:
I am particularly interested in your feedback on the points I've raised in the Discussion section around:

  1. Synthetic Data Generation
  2. Choice of Base Model
  3. Evaluation Methodology
  4. DPO Dataset Composition
  5. Fine-tuning Process

Additional Notes

Any other information that might be helpful for reviewers:

This Case Study is part of an open-source book I am writing, "Taming LLMs".

I would love to highlight the great smolLM work you are doing here, so I would truly appreciate your feedback on the Case Study submitted here.

Cheers,
Tharsis.

@burtenshaw
Collaborator

This is a really nice submission @souzatharsis. Thanks!

My first question is: Have you considered using a library like distilabel?

We're currently working on the synthetic data module and I think your use case could fit there.

@souzatharsis
Contributor Author

Hi @burtenshaw, thanks for the feedback!
distilabel indeed sounds like an elegant way to replicate my data generation process; thanks for the recommendation.
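For reference, a minimal sketch of what that replication could look like with distilabel's Pipeline API. This assumes distilabel >= 1.0, and the seed instruction and generator model are placeholders I picked for illustration, not the ones from my notebook.

```python
from distilabel.llms import InferenceEndpointsLLM
from distilabel.pipeline import Pipeline
from distilabel.steps import LoadDataFromDicts
from distilabel.steps.tasks import TextGeneration

# Seed prompts would be derived from Acme Inc.'s policy; this is a placeholder.
seed_data = [{"instruction": "Explain photosynthesis to a 5th grader."}]

with Pipeline(name="smolk12-synthetic-data") as pipeline:
    # Load the seed instructions, then generate candidate responses with an LLM.
    load_seeds = LoadDataFromDicts(data=seed_data)
    generate = TextGeneration(
        # Placeholder generator model served via Hugging Face Inference Endpoints.
        llm=InferenceEndpointsLLM(model_id="meta-llama/Meta-Llama-3.1-8B-Instruct"),
    )
    load_seeds >> generate

if __name__ == "__main__":
    distiset = pipeline.run()
    print(distiset)
```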

@souzatharsis
Contributor Author

Looking forward to additional feedback!
