docs: add more details on CP + SFT support (#447)
Signed-off-by: ashors1 <[email protected]>
Signed-off-by: Terry Kong <[email protected]>
ashors1 authored and terrykong committed Dec 18, 2024
1 parent 5fffa58 commit 6721e72
Showing 1 changed file with 13 additions and 1 deletion.
CHANGELOG.md
@@ -18,7 +18,19 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
## [Next Version]

### New Features and Optimizations
- Added context parallel support for SFT. CP can be enabled by setting `model.context_parallel_size` in your config.
- Added context parallel (CP) support for SFT. CP requires you to prepare your dataset using NeMo's [prepare_packed_ft_dataset.py](https://github.com/NVIDIA/NeMo/blob/main/scripts/nlp_language_modeling/prepare_packed_ft_dataset.py) script prior to training. Be sure to pass the context parallel size to this script, for example:

```
python scripts/nlp_language_modeling/prepare_packed_ft_dataset.py \
model.data.train_ds.file_names=[/path/to/training.jsonl] \
model.data.train_ds.max_seq_length=2048 \
+tokenizer_path=/path/to/tokenizer \
+output_dir=/path/to/output_folder \
+pack_sizes=[2048,4096,8192] \
model.context_parallel_size=2
```
CP can then be enabled in your training run by setting `model.context_parallel_size` in your config. Refer to the [SFT documentation](https://github.com/NVIDIA/NeMo-Aligner/blob/main/docs/user-guide/sft.rst#step-1-format-the-data)
for more details on running `prepare_packed_ft_dataset.py` and on running SFT with a packed dataset.
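
  As an illustration only, a CP-enabled SFT launch might then look like the sketch below. The entrypoint `examples/nlp/gpt/train_gpt_sft.py` and the checkpoint path are assumptions for this sketch, not part of this change; keep whatever overrides your existing SFT command already uses and add the context-parallel override.

  ```
  # Illustrative sketch only: enable context parallelism by adding the
  # model.context_parallel_size override to your usual SFT training command.
  # The entrypoint and the .nemo path below are placeholders.
  python examples/nlp/gpt/train_gpt_sft.py \
     model.restore_from_path=/path/to/model.nemo \
     model.context_parallel_size=2
  ```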
- Sequence packing is now supported when running DPO.
- Added support for Knowledge Distillation with SFT. See the [tutorial](docs/user-guide/knowledge-distillation.rst) for details.
- Added support for Megatron Core’s distributed optimizer, which can be configured using `++model.optim.name=mcore_distributed_optim`.
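
  As a sketch of the optimizer option above (the entrypoint and checkpoint path are assumptions; only the `++model.optim.name` override comes from this entry):

  ```
  # Illustrative sketch only: select Megatron Core's distributed optimizer
  # via the documented config override; other arguments are placeholders.
  python examples/nlp/gpt/train_gpt_sft.py \
     model.restore_from_path=/path/to/model.nemo \
     ++model.optim.name=mcore_distributed_optim
  ```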
