feat: support new DPO data format and update SFT config to use override API #405

arendu · 2024-11-14T23:06:51Z

What does this PR do?

This PR makes the DPO dataset take its chat-format tokens from the model's config YAML instead of relying on chat/special tokens hardcoded in the JSONL data file.

Currently, each datapoint in a DPO JSONL data file looks like this:

{
  "prompt": "<extra_id_0>System\n\n<extra_id_1>User\nbacillus subtilus\n<extra_id_1>Assistant\n",
  "chosen_response": "Bacillus ... and industry alike.\n<extra_id_1>",
  "rejected_response": "The Bacillus ... fields of study.\n<extra_id_1>",
  "rejected_reward": 3,
  "chosen_reward": 4
}

With this PR, each datapoint should instead look like this (OpenAI list-of-messages format, with no chat/formatting tokens):

{
  "prompt": [
    {
      "role": "system",
      "content": ""
    },
    {
      "role": "user",
      "content": "bacillus subtilus"
    }
  ],
  "chosen_response": {
    "role": "assistant",
    "content": "Bacillus ... and industry alike."
  },
  "rejected_response": {
    "role": "assistant",
    "content": "The Bacillus ... fields of study."
  },
  "chosen_reward": 4,
  "rejected_reward": 3
}
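
To make the relationship between the two formats concrete, here is a minimal Python sketch of how a new-format datapoint could be rendered back into the token-delimited strings of the old format. The `CHAT_TOKENS` dictionary and the `render_prompt` / `render_response` helpers are hypothetical names used only for illustration; they are not the dataset code in this PR. The token values are copied from the old-format example above, and in practice they would be read from the model's config YAML.

```python
# Hypothetical token settings; the values mirror the old-format example above.
# In the PR these would come from the model's config YAML, not from a constant.
CHAT_TOKENS = {
    "system_turn_start": "<extra_id_0>",
    "turn_start": "<extra_id_1>",
}


def render_prompt(messages, tokens=CHAT_TOKENS):
    """Render a list of {"role", "content"} messages into one prompt string."""
    parts = []
    for message in messages:
        if message["role"] == "system":
            start = tokens["system_turn_start"]
        else:
            start = tokens["turn_start"]
        parts.append(f"{start}{message['role'].capitalize()}\n{message['content']}\n")
    # Open the assistant turn so the response text follows directly.
    parts.append(f"{tokens['turn_start']}Assistant\n")
    return "".join(parts)


def render_response(message, tokens=CHAT_TOKENS):
    """Render a chosen/rejected response, closing it with the turn token."""
    return f"{message['content']}\n{tokens['turn_start']}"


datapoint = {
    "prompt": [
        {"role": "system", "content": ""},
        {"role": "user", "content": "bacillus subtilus"},
    ],
    "chosen_response": {"role": "assistant", "content": "Bacillus ... and industry alike."},
    "rejected_response": {"role": "assistant", "content": "The Bacillus ... fields of study."},
    "chosen_reward": 4,
    "rejected_reward": 3,
}

rendered_prompt = render_prompt(datapoint["prompt"])
rendered_chosen = render_response(datapoint["chosen_response"])
print(repr(rendered_prompt))
print(repr(rendered_chosen))
```

Keeping the tokens out of the data file means the same JSONL file can be reused with a model that uses a different chat template; only the token settings change.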

Additionally, this PR adds a script that converts old data files into the new format:

python nemo_aligner/data/nlp/scripts/undo_special_tokens.py <path_to_old_format_dpo_jsonl_file>

A new file will be written in the same location as the old format file.
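
As a rough illustration of what such a conversion involves, the sketch below parses one old-format record into the new structure. It is not the code in `undo_special_tokens.py`: `old_to_new` and the constants are hypothetical names, and the parsing assumes exactly the `<extra_id_0>` / `<extra_id_1>` layout of the old-format example above.

```python
import json

# Assumed layout, taken from the old-format example; other templates would differ.
SYSTEM_PREFIX = "<extra_id_0>System\n"
TURN_TOKEN = "<extra_id_1>"


def _strip_one_newline(text):
    """Remove a single trailing newline, if present."""
    return text[:-1] if text.endswith("\n") else text


def _to_response(text):
    """Strip the closing turn token and newline from a response string."""
    if text.endswith(TURN_TOKEN):
        text = text[: -len(TURN_TOKEN)]
    return {"role": "assistant", "content": _strip_one_newline(text)}


def old_to_new(record):
    """Convert one old-format DPO record (token-delimited strings) to the new format."""
    head, *turns = record["prompt"].split(TURN_TOKEN)
    if not head.startswith(SYSTEM_PREFIX):
        raise ValueError("prompt does not start with the expected system header")
    if not turns or turns[-1] != "Assistant\n":
        raise ValueError("prompt does not end with an open assistant turn")

    messages = [{"role": "system", "content": _strip_one_newline(head[len(SYSTEM_PREFIX):])}]
    # Every turn except the last (the empty assistant header) is "<Role>\n<content>\n".
    for turn in turns[:-1]:
        role, _, content = turn.partition("\n")
        messages.append({"role": role.lower(), "content": _strip_one_newline(content)})

    return {
        "prompt": messages,
        "chosen_response": _to_response(record["chosen_response"]),
        "rejected_response": _to_response(record["rejected_response"]),
        "chosen_reward": record["chosen_reward"],
        "rejected_reward": record["rejected_reward"],
    }


old_record = {
    "prompt": "<extra_id_0>System\n\n<extra_id_1>User\nbacillus subtilus\n<extra_id_1>Assistant\n",
    "chosen_response": "Bacillus ... and industry alike.\n<extra_id_1>",
    "rejected_response": "The Bacillus ... fields of study.\n<extra_id_1>",
    "rejected_reward": 3,
    "chosen_reward": 4,
}

new_record = old_to_new(old_record)
print(json.dumps(new_record, indent=2))
```

The sketch raises an error on prompts that do not match the assumed layout rather than guessing, which makes silent data corruption less likely when a file mixes templates.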

Changelog

Please update the CHANGELOG.md under next version with high level changes in this PR.

Usage

You can potentially add a usage example below

# Add a code snippet demonstrating how to use this

Before your PR is "Ready for review"

Pre checks:

Make sure you read and followed Contributor guidelines
Did you write any new necessary tests?
Did you add or update any necessary documentation? Make sure to also update the NeMo Framework User Guide which contains the tutorials

Checklist when contributing a new algorithm

Does the trainer resume and restore all model states?
Does the trainer support all parallelism techniques (PP, TP, DP)?
Does the trainer support max_steps=-1 and validation?
Does the trainer only call APIs defined in alignable_interface.py?
Does the trainer have proper logging?

Additional Information

Related to # (issue)

examples/nlp/gpt/conf/gpt_dpo.yaml

examples/nlp/gpt/train_gpt_sft.py

nemo_aligner/data/nlp/builders.py

nemo_aligner/data/nlp/datasets.py

nemo_aligner/data/nlp/scripts/undo_special_tokens.py

nemo_aligner/data/nlp/datasets.py

examples/nlp/gpt/train_gpt_dpo.py

terrykong

TODOs

compatibility test
Stretch (update the dpo.sh template test script to convert the train data json into this new format)

nemo_aligner/data/nlp/datasets.py

nemo_aligner/data/nlp/scripts/undo_special_tokens.py

nemo_aligner/data/nlp/datasets.py

tests/test_datasets.py

Signed-off-by: Terry Kong <[email protected]>

terrykong · 2024-11-22T01:55:06Z

closing in favor of #403

Signed-off-by: arendu <[email protected]>

for more information, see https://pre-commit.ci Signed-off-by: NeMo-Aligner CI <[email protected]>

Signed-off-by: arendu <[email protected]>

…NeMo-Aligner into adithyare/dpo_data_refac

…de API (#405) Signed-off-by: Terry Kong <[email protected]> Signed-off-by: arendu <[email protected]> Signed-off-by: NeMo-Aligner CI <[email protected]> Co-authored-by: Terry Kong <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Terry Kong <[email protected]>

ccclyu · 2024-12-05T22:45:09Z

nemo_aligner/data/nlp/datasets.py

+
+        return output_dict
+
+    def convert(self, messages):


Do you think it could support apply_chat_template (https://huggingface.co/docs/transformers/main/en/chat_templating) for Hugging Face tokenizers, which most open-source LLMs have adopted?

github-actions bot added the Algorithms label Nov 14, 2024

arendu requested review from gshennvm and terrykong November 15, 2024 06:33

arendu marked this pull request as ready for review November 15, 2024 06:36

arendu changed the title ~~Adithyare/dpo data refac~~ DPO data format refactor Nov 15, 2024

github-actions bot removed the Algorithms label Nov 15, 2024

terrykong requested changes Nov 15, 2024

arendu requested a review from terrykong November 18, 2024 22:38

terrykong changed the title ~~DPO data format refactor~~ feat: support new DPO data format Nov 21, 2024

terrykong requested changes Nov 21, 2024

arendu requested a review from terrykong November 21, 2024 05:29

arendu added the CI label Nov 21, 2024

github-actions bot removed the CI label Nov 21, 2024

terrykong reviewed Nov 21, 2024

tests/test_datasets.py (outdated, resolved)

terrykong reviewed Nov 21, 2024

tests/test_datasets.py (outdated, resolved)

terrykong force-pushed the adithyare/dpo_data_refac branch from d32515c to a112c19 Compare November 22, 2024 01:49

feat: dpo dataset new openai chat completion format

0d3b8ee

Signed-off-by: Terry Kong <[email protected]>

terrykong force-pushed the adithyare/dpo_data_refac branch from a112c19 to 0d3b8ee Compare November 22, 2024 01:50

terrykong closed this Nov 22, 2024

terrykong mentioned this pull request Nov 22, 2024

Nemotron5 features #403

Draft

8 tasks

terrykong reopened this Nov 22, 2024

Update test_datasets.py

db3eb40

terrykong changed the title ~~feat: support new DPO data format~~ feat: support new DPO data format and update SFT config to use override API Dec 3, 2024

arendu and others added 2 commits December 3, 2024 23:20

updated to use importskip

adb8130

Signed-off-by: arendu <[email protected]>

[pre-commit.ci] auto fixes from pre-commit.com hooks

1d732ad

for more information, see https://pre-commit.ci Signed-off-by: NeMo-Aligner CI <[email protected]>

terrykong previously approved these changes Dec 3, 2024

Merge branch 'main' into adithyare/dpo_data_refac

613a63a

terrykong added the Run CICD Set + un-set to retrigger label Dec 3, 2024

terrykong mentioned this pull request Dec 4, 2024

feat: add context parallel support for SFT #420

Closed

8 tasks

arendu added 2 commits December 4, 2024 01:48

fix for batch size misconfiguration

a76c29a

Signed-off-by: arendu <[email protected]>

Merge branch 'adithyare/dpo_data_refac' of https://github.com/NVIDIA/…

e3d1192

…NeMo-Aligner into adithyare/dpo_data_refac

arendu dismissed terrykong’s stale review via e3d1192 December 4, 2024 01:49

arendu added Run CICD Set + un-set to retrigger and removed Run CICD Set + un-set to retrigger labels Dec 4, 2024

Update gpt_sft.yaml removed comment

db1d5f1

terrykong approved these changes Dec 4, 2024

arendu added Run CICD Set + un-set to retrigger and removed Run CICD Set + un-set to retrigger labels Dec 4, 2024

terrykong enabled auto-merge (squash) December 4, 2024 01:58

terrykong merged commit 5d4b2a7 into main Dec 4, 2024
18 checks passed

terrykong deleted the adithyare/dpo_data_refac branch December 4, 2024 02:20

ccclyu reviewed Dec 5, 2024
