feat: support new DPO data format and update SFT config to use override API #405
TODOs
- Compatibility test
- Stretch: update the dpo.sh template test script to convert the train data JSON into this new format
d32515c to a112c19
Signed-off-by: Terry Kong <[email protected]>
a112c19 to 0d3b8ee
Closing in favor of #403.
Signed-off-by: arendu <[email protected]>
for more information, see https://pre-commit.ci Signed-off-by: NeMo-Aligner CI <[email protected]>
Signed-off-by: arendu <[email protected]>
…NeMo-Aligner into adithyare/dpo_data_refac
…de API (#405) Signed-off-by: Terry Kong <[email protected]> Signed-off-by: arendu <[email protected]> Signed-off-by: NeMo-Aligner CI <[email protected]> Co-authored-by: Terry Kong <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Terry Kong <[email protected]>
        return output_dict

    def convert(self, messages):
Do you think it can support apply_chat_template (https://huggingface.co/docs/transformers/main/en/chat_templating) for Hugging Face tokenizers, which are used by most open-source LLMs?
What does this PR do?
This PR makes the DPO dataset take its chat format tokens from the model's config YAML instead of hardcoding chat/special tokens in the JSONL data file.
Currently, each datapoint inside a DPO JSONL data file looks like this:
With this PR, each datapoint should look like this (OpenAI list-of-messages format with no chat/formatting tokens):
Additionally, a script is added to convert old data files into the new format.
The new file is written to the same location as the old-format file.
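As a rough illustration of what such a conversion might involve, here is a minimal sketch. It is not the script added in this PR. The field names (`prompt`, `chosen_response`, `rejected_response`), the `<extra_id_0>`/`<extra_id_1>` turn markers, and the shape of the output record are all assumptions made for the example; the real markers depend on the model and the real schema is defined by the PR's code.

```python
import json
import re

# Assumption: old-style prompts embed turn markers such as "<extra_id_0>" and
# "<extra_id_1>", each followed by a role name on the same line.
MARKER = re.compile(r"<extra_id_[01]>")


def prompt_to_messages(prompt):
    """Split a token-formatted prompt into a list of {"role", "content"} dicts."""
    messages = []
    for chunk in MARKER.split(prompt):
        if not chunk.strip():
            continue
        # The first line of each chunk is the role name, the rest is the content.
        role, _, content = chunk.partition("\n")
        content = content.strip()
        if content:  # skip empty turns, e.g. a trailing "Assistant" header
            messages.append({"role": role.strip().lower(), "content": content})
    return messages


def convert_record(old):
    """Convert one old-format datapoint (hypothetical keys) to a messages-based one."""
    return {
        "prompt": prompt_to_messages(old["prompt"]),
        "chosen_response": MARKER.sub("", old["chosen_response"]).strip(),
        "rejected_response": MARKER.sub("", old["rejected_response"]).strip(),
    }


def convert_file(src_path, dst_path):
    """Read a JSONL file line by line and write the converted records to dst_path."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            if line.strip():
                dst.write(json.dumps(convert_record(json.loads(line))) + "\n")
```

Calling `convert_file("train.jsonl", "train.new.jsonl")` (both paths hypothetical) would write the converted file next to the original, matching the behaviour described above.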
Changelog
Usage
# Add a code snippet demonstrating how to use this
Before your PR is "Ready for review"
Pre checks:
Checklist when contributing a new algorithm
max_steps=-1 and validation?
Additional Information