From 6721e724154053d0ff2f46499caa57bc3cfc6fac Mon Sep 17 00:00:00 2001
From: Anna Shors
Date: Thu, 12 Dec 2024 11:52:01 -0800
Subject: [PATCH] docs: add more details on CP + SFT support (#447)

Signed-off-by: ashors1
Signed-off-by: Terry Kong
---
 CHANGELOG.md | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 93701d298..8a2a6eb54 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -18,7 +18,19 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
 ## [Next Version]
 
 ### New Features and Optimizations
-- Added context parallel support for SFT. CP can be enabled by setting `model.context_parallel_size` in your config.
+- Added context parallel (CP) support for SFT. CP requires you to prepare your dataset using NeMo's [prepare_packed_ft_dataset.py](https://github.com/NVIDIA/NeMo/blob/main/scripts/nlp_language_modeling/prepare_packed_ft_dataset.py) script prior to training. Be sure to pass the context parallel size to this script, for example:
+
+  ```
+  python scripts/nlp_language_modeling/prepare_packed_ft_dataset.py \
+     model.data.train_ds.file_names=[/path/to/training.jsonl] \
+     model.data.train_ds.max_seq_length=2048 \
+     +tokenizer_path=/path/to/tokenizer \
+     +output_dir=/path/to/output_folder \
+     +pack_sizes=[2048,4096,8192] \
+     model.context_parallel_size=2
+  ```
+  CP can then be enabled in your training run by setting `model.context_parallel_size` in your config. Refer to the [SFT documentation](https://github.com/NVIDIA/NeMo-Aligner/blob/main/docs/user-guide/sft.rst#step-1-format-the-data)
+for more details on running `prepare_packed_ft_dataset.py` and on running SFT with a packed dataset.
 - Sequence packing is now supported when running DPO.
 - Added support for Knowledge Distillation with SFT. See the [tutorial](docs/user-guide/knowledge-distillation.rst) for details.
 - Added support for Megatron Core’s distributed optimizer, which can be configured using `++model.optim.name=mcore_distributed_optim`.
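
For reference, below is a minimal sketch of what the follow-on training launch could look like once the packed dataset above has been produced. Only `model.context_parallel_size` and `++model.optim.name=mcore_distributed_optim` come from the changelog entry itself; the script path (`examples/nlp/gpt/train_gpt_sft.py`), `model.restore_from_path`, and the `model.data.train_ds.*` keys are assumptions about the NeMo-Aligner SFT config and may differ in your checkout, so treat the linked SFT documentation as authoritative.

```
# Hypothetical SFT launch with context parallelism enabled.
# Assumed: the script path, restore_from_path, and the train_ds keys below;
# verify them against the SFT documentation linked in the changelog entry.
python examples/nlp/gpt/train_gpt_sft.py \
   trainer.num_nodes=1 \
   trainer.devices=8 \
   model.restore_from_path=/path/to/base_model.nemo \
   model.data.train_ds.file_path=/path/to/output_folder/<packed_dataset>.npy \
   model.data.train_ds.packed_sequence=True \
   model.context_parallel_size=2 \
   ++model.optim.name=mcore_distributed_optim
```

The packing script needs the context parallel size up front, so keep `model.context_parallel_size` consistent between the `prepare_packed_ft_dataset.py` step and the training run.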