feat: support new DPO data format and update SFT config to use override API #405

Merged
merged 8 commits into from
Dec 4, 2024

@arendu (Collaborator) commented Nov 14, 2024

What does this PR do?

This PR makes the DPO dataset use the chat-format tokens from the model's config YAML instead of hardcoding chat/special tokens in the JSONL data file.

Currently, each datapoint in a DPO JSONL data file looks like this:

{
  "prompt": "<extra_id_0>System\n\n<extra_id_1>User\nbacillus subtilus\n<extra_id_1>Assistant\n",
  "chosen_response": "Bacillus ... and industry alike.\n<extra_id_1>",
  "rejected_response": "The Bacillus ... fields of study.\n<extra_id_1>",
  "rejected_reward": 3,
  "chosen_reward": 4
}

With this PR, each datapoint should instead look like this (the OpenAI list-of-messages format, with no chat/formatting tokens):

{
  "prompt": [
    {
      "role": "system",
      "content": ""
    },
    {
      "role": "user",
      "content": "bacillus subtilus"
    }
  ],
  "chosen_response": {
    "role": "assistant",
    "content": "Bacillus ... and industry alike."
  },
  "rejected_response": {
    "role": "assistant",
    "content": "The Bacillus ... fields of study."
  },
  "chosen_reward": 4,
  "rejected_reward": 3
}
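To show how the new records relate to the old strings, here is a minimal sketch that renders a new-format record back into the old prompt/response text. It assumes the chat tokens are exactly those visible in the old-format example (`<extra_id_0>`, `<extra_id_1>`, newline separators). The token table, its key names, and the functions `render_prompt` and `render_response` are hypothetical stand-ins, not the actual config keys or code in `nemo_aligner/data/nlp/datasets.py`.

```python
from typing import Dict, List

# Hypothetical token table. In the PR these tokens come from the model's config YAML;
# the values are copied from the old-format example, the key names are illustrative.
CHAT_TOKENS: Dict[str, str] = {
    "system_turn_start": "<extra_id_0>",
    "turn_start": "<extra_id_1>",
    "end_of_name": "\n",
    "end_of_turn": "\n",
}

# Display names used in the turn headers of the old format.
ROLE_NAMES = {"system": "System", "user": "User", "assistant": "Assistant"}


def render_prompt(messages: List[Dict[str, str]], tokens: Dict[str, str] = CHAT_TOKENS) -> str:
    """Render a message list as one prompt string that ends by opening an assistant turn."""
    parts = []
    for msg in messages:
        start = tokens["system_turn_start"] if msg["role"] == "system" else tokens["turn_start"]
        parts.append(start + ROLE_NAMES[msg["role"]] + tokens["end_of_name"] + msg["content"] + tokens["end_of_turn"])
    # Open the assistant turn that the model is expected to complete.
    parts.append(tokens["turn_start"] + ROLE_NAMES["assistant"] + tokens["end_of_name"])
    return "".join(parts)


def render_response(message: Dict[str, str], tokens: Dict[str, str] = CHAT_TOKENS) -> str:
    """Render an assistant message, closed by end_of_turn and the next turn's start token."""
    return message["content"] + tokens["end_of_turn"] + tokens["turn_start"]
```

With the sample record above, `render_prompt(record["prompt"])` reproduces the old `"prompt"` string and `render_response(record["chosen_response"])` reproduces the old `"chosen_response"` string. Keeping the tokens in a separate table is what lets the same JSONL file be reused with a different prompt template: only the config changes, not the data.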

Additionally, this PR adds a script that converts old-format data files into the new format:

python nemo_aligner/data/nlp/scripts/undo_special_tokens.py <path_to_old_format_dpo_jsonl_file>

A new file will be written in the same location as the old format file.
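The conversion can be sketched as follows. This is not the contents of `undo_special_tokens.py`; it is a minimal illustration that assumes old-format files use exactly the turn headers shown in the example above (`<extra_id_0>System`, `<extra_id_1>User`, `<extra_id_1>Assistant`, each followed by a newline) and that every response ends with a newline plus `<extra_id_1>`. The helper names and the `.new_format` output suffix are hypothetical. Requires Python 3.9+ for `str.removesuffix`.

```python
import json
import re
import sys
from pathlib import Path
from typing import Any, Dict, List

# Turn headers as they appear in the old-format example (assumed, not read from a config).
TURN_RE = re.compile(r"<extra_id_[01]>(System|User|Assistant)\n")
# Old-format responses end with the end-of-turn newline plus the next turn token.
RESPONSE_SUFFIX = "\n<extra_id_1>"


def prompt_to_messages(prompt: str) -> List[Dict[str, str]]:
    """Turn an old-format prompt string into an OpenAI-style message list."""
    # With one capture group, re.split yields [prefix, role, body, role, body, ...].
    pieces = TURN_RE.split(prompt)
    messages = []
    for role, body in zip(pieces[1::2], pieces[2::2]):
        if role == "Assistant" and body == "":
            continue  # trailing "Assistant" header that opens the turn to be generated
        messages.append({"role": role.lower(), "content": body.removesuffix("\n")})
    return messages


def response_to_message(response: str) -> Dict[str, str]:
    """Strip the trailing turn tokens from an old-format response."""
    return {"role": "assistant", "content": response.removesuffix(RESPONSE_SUFFIX)}


def convert_record(old: Dict[str, Any]) -> Dict[str, Any]:
    new = dict(old)  # keeps chosen_reward / rejected_reward unchanged
    new["prompt"] = prompt_to_messages(old["prompt"])
    new["chosen_response"] = response_to_message(old["chosen_response"])
    new["rejected_response"] = response_to_message(old["rejected_response"])
    return new


def convert_file(path: str) -> Path:
    src = Path(path)
    # Written next to the input file; the ".new_format" suffix is hypothetical.
    dst = src.with_name(src.stem + ".new_format" + src.suffix)
    with src.open() as fin, dst.open("w") as fout:
        for line in fin:
            if line.strip():
                fout.write(json.dumps(convert_record(json.loads(line))) + "\n")
    return dst


if __name__ == "__main__" and len(sys.argv) > 1:
    print(convert_file(sys.argv[1]))
```

Parsing on the turn headers with a single regex keeps the sketch short, but it would mis-split any message whose content itself contains one of those header strings; a real converter may need to be stricter.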

Changelog

  • Please update the CHANGELOG.md under next version with high level changes in this PR.

Usage

  • You can potentially add a usage example below
# Add a code snippet demonstrating how to use this 

Before your PR is "Ready for review"

Pre checks:

Checklist when contributing a new algorithm

  • Does the trainer resume and restore all model states?
  • Does the trainer support all parallelism techniques (PP, TP, DP)?
  • Does the trainer support max_steps=-1 and validation?
  • Does the trainer only call APIs defined in alignable_interface.py?
  • Does the trainer have proper logging?

Additional Information

  • Related to # (issue)

@arendu arendu requested review from gshennvm and terrykong November 15, 2024 06:33
@arendu arendu marked this pull request as ready for review November 15, 2024 06:36
@arendu arendu changed the title from "Adithyare/dpo data refac" to "DPO data format refactor" Nov 15, 2024
examples/nlp/gpt/conf/gpt_dpo.yaml (outdated, resolved)
examples/nlp/gpt/train_gpt_sft.py (resolved)
nemo_aligner/data/nlp/builders.py (outdated, resolved)
nemo_aligner/data/nlp/datasets.py (resolved)
nemo_aligner/data/nlp/scripts/undo_special_tokens.py (outdated, resolved)
nemo_aligner/data/nlp/scripts/undo_special_tokens.py (outdated, resolved)
nemo_aligner/data/nlp/datasets.py (resolved)
nemo_aligner/data/nlp/datasets.py (outdated, resolved)
nemo_aligner/data/nlp/datasets.py (resolved)
examples/nlp/gpt/train_gpt_dpo.py (outdated, resolved)
@arendu arendu requested a review from terrykong November 18, 2024 22:38
@terrykong terrykong changed the title from "DPO data format refactor" to "feat: support new DPO data format" Nov 21, 2024
@terrykong (Collaborator) left a comment

TODOs

  • Compatibility test
  • Stretch: update the dpo.sh template test script to convert the training data JSON into the new format

nemo_aligner/data/nlp/datasets.py (outdated, resolved)
nemo_aligner/data/nlp/scripts/undo_special_tokens.py (outdated, resolved)
nemo_aligner/data/nlp/scripts/undo_special_tokens.py (outdated, resolved)
nemo_aligner/data/nlp/datasets.py (outdated, resolved)
@arendu arendu requested a review from terrykong November 21, 2024 05:29
@arendu arendu added the CI label Nov 21, 2024
@github-actions github-actions bot removed the CI label Nov 21, 2024
tests/test_datasets.py (outdated, resolved)
tests/test_datasets.py (outdated, resolved)
@terrykong terrykong force-pushed the adithyare/dpo_data_refac branch from d32515c to a112c19 November 22, 2024 01:49
@terrykong terrykong force-pushed the adithyare/dpo_data_refac branch from a112c19 to 0d3b8ee November 22, 2024 01:50
@terrykong (Collaborator) commented:

closing in favor of #403

@terrykong terrykong closed this Nov 22, 2024
@terrykong terrykong mentioned this pull request Nov 22, 2024
@terrykong terrykong reopened this Nov 22, 2024
@terrykong terrykong changed the title from "feat: support new DPO data format" to "feat: support new DPO data format and update SFT config to use override API" Dec 3, 2024
arendu and others added 2 commits December 3, 2024 23:20
terrykong previously approved these changes Dec 3, 2024
@terrykong terrykong added the Run CICD Set + un-set to retrigger label Dec 3, 2024
@arendu arendu added Run CICD Set + un-set to retrigger and removed Run CICD Set + un-set to retrigger labels Dec 4, 2024
@arendu arendu added Run CICD Set + un-set to retrigger and removed Run CICD Set + un-set to retrigger labels Dec 4, 2024
@terrykong terrykong enabled auto-merge (squash) December 4, 2024 01:58
@terrykong terrykong merged commit 5d4b2a7 into main Dec 4, 2024
18 checks passed
@terrykong terrykong deleted the adithyare/dpo_data_refac branch December 4, 2024 02:20
terrykong added a commit that referenced this pull request Dec 5, 2024
…de API (#405)

Signed-off-by: Terry Kong <[email protected]>
Signed-off-by: arendu <[email protected]>
Signed-off-by: NeMo-Aligner CI <[email protected]>
Co-authored-by: Terry Kong <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Terry Kong <[email protected]>

return output_dict

def convert(self, messages):

Do you think it could support apply_chat_template (https://huggingface.co/docs/transformers/main/en/chat_templating) for Hugging Face tokenizers, which most open-source LLMs have adopted?
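Because the new format is already a list of `{"role": ..., "content": ...}` dicts, it has the shape that Hugging Face's `apply_chat_template` takes. The sketch below is one hypothetical way a DPO record could be rendered with such a tokenizer; it is not part of this PR. It assumes `tokenizer` is a `transformers` tokenizer with a chat template set, and it recovers each response by stripping the rendered prompt prefix. That prefix property does not hold for every template, so the function raises instead of guessing. The function names are illustrative.

```python
from typing import Any, Dict, List, Tuple

Message = Dict[str, str]


def dpo_record_to_conversations(record: Dict[str, Any]) -> Tuple[List[Message], List[Message], List[Message]]:
    """Split a new-format DPO record into (prompt, chosen, rejected) message lists."""
    prompt = list(record["prompt"])
    return prompt, prompt + [record["chosen_response"]], prompt + [record["rejected_response"]]


def render_with_chat_template(tokenizer: Any, record: Dict[str, Any]) -> Dict[str, str]:
    """Render a DPO record as prompt/response strings via tokenizer.apply_chat_template.

    Assumes the rendered prompt (with the generation prompt appended) is a string prefix
    of each rendered full conversation; raises ValueError when that is not the case.
    """
    prompt, chosen, rejected = dpo_record_to_conversations(record)
    prompt_text = tokenizer.apply_chat_template(prompt, tokenize=False, add_generation_prompt=True)
    out = {"prompt": prompt_text}
    for key, conversation in (("chosen_response", chosen), ("rejected_response", rejected)):
        full_text = tokenizer.apply_chat_template(conversation, tokenize=False)
        if not full_text.startswith(prompt_text):
            raise ValueError(f"chat template does not render the prompt as a prefix of {key}")
        out[key] = full_text[len(prompt_text):]
    return out
```

Splitting by prefix keeps the prompt/response boundary that DPO needs for masking the loss, while leaving all formatting decisions to the model's own template rather than to tokens stored in the data.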

Labels
Run CICD Set + un-set to retrigger
4 participants