Releases · NVIDIA/NeMo-Aligner
NVIDIA NeMo-Aligner 0.6.0rc1.dev0
Prerelease (2024-12-20)
v0.6.0rc0: fix: fix DPO sequence packing + pipeline parallel (#437)
NVIDIA NeMo-Aligner 0.5.0
New Features and Optimizations
- Implement Kahneman-Tversky Optimization (KTO); see the sketch below for a typical launch.
- Sequence packing is now supported when running SFT with `SFTChatDataset`.
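As a rough sketch, a KTO run would be launched like the other alignment stages via Hydra-style overrides. The script path and every override key below are assumptions for illustration; none of them are confirmed by these notes:

# Hypothetical KTO launch; script path and all keys are assumptions
python examples/nlp/gpt/train_gpt_kto.py \
   model.micro_batch_size=1 \
   model.global_batch_size=64 \
   pretrained_checkpoint.restore_from_path=/path/to/base_model.nemo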
Breaking Changes
Bug Fixes
- Changed `log_prob_forward_micro_batch_size` in DPO to mean the same as `micro_batch_size`: the number of samples (chosen and rejected included) that we process at once.
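For example, a DPO launch might set it explicitly next to the regular batch-size knobs; the script path and the `model.dpo.*` key prefix are assumptions for illustration:

# Hypothetical DPO launch; the model.dpo.* key path is an assumption
python examples/nlp/gpt/train_gpt_dpo.py \
   model.micro_batch_size=1 \
   model.dpo.log_prob_forward_micro_batch_size=2 \
   pretrained_checkpoint.restore_from_path=/path/to/base_model.nemo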
NVIDIA NeMo-Aligner 0.4.0
New Features and Optimizations
- Implement reward-aware preference optimization.
- Added TRT-LLM support in PPO. This can be enabled with `trainer.ppo.trt_llm.enable=True`. There is also a reshard option to reshard out pipeline parallelism during inference for further speedup via `trainer.ppo.trt_llm.reshard=True` (see the sketch after this list).
- PPO will now detect whether a sampled sequence has ended, and zero out the gradients of samples that did not stop properly.
- Added critic warmup to PPO via the flag `trainer.ppo.critic_warmup_steps`.
- Refactored the critic and reward model server. The reward model now has a flag called `model.forward_micro_batch_size` which determines the micro batch size used for inference. This can be higher than the training micro batch size, since there is less memory pressure during inference.
- In the critic and reward model server, it is now possible to specify `inference_micro_batch_size` as a list. This allows us to give PyTriton more information about the preferred batch sizes for inference.
- It is no longer a requirement to specify `num_rollout_samples` as a multiple of `inference_micro_batch_size * dp size` in PPO.
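A minimal sketch of a PPO actor launch exercising the new flags; the `trainer.ppo.*` keys come from the notes above, while the script path and checkpoint override are assumptions for illustration:

# Hypothetical PPO actor launch with TRT-LLM generation, resharding, and critic warmup
python examples/nlp/gpt/train_gpt_ppo_actor.py \
   trainer.ppo.trt_llm.enable=True \
   trainer.ppo.trt_llm.reshard=True \
   trainer.ppo.critic_warmup_steps=10 \
   pretrained_checkpoint.restore_from_path=/path/to/sft_model.nemo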
Breaking Changes
- `inference.micro_batch_size` is now renamed to `inference.inference_micro_batch_size` when running reward model inference in `inference_rm.yaml`. This is to stay consistent with the naming scheme of the PPO critic.
- It is no longer possible to specify `add_EOS` when running reward model or critic inference.
- NeMo-Aligner now requires Megatron-LM >= 0.8.0 for the APIs to calculate the micro batch sizes.
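For instance, reward model inference under the new naming might be launched as follows; only the renamed `inference.inference_micro_batch_size` key comes from the notes, while the script path and `rm_model_file` key are assumptions for illustration:

# Hypothetical reward model server launch using the renamed key
python examples/nlp/gpt/serve_reward_model.py \
   rm_model_file=/path/to/reward_model.nemo \
   inference.inference_micro_batch_size=4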
Bug Fixes
- Made `num_workers` for dataloaders 0 by default. This prevents issues when using MPI (with TRT-LLM) or more sophisticated launchers.
NVIDIA NeMo-Aligner v0.3.1
New features and optimizations
- SPIN: added a `rollout_micro_batch_size` parameter which allows users to set the batch size for generation during SPIN training. Previously, the generation batch size was automatically set to the data parallel (DP) size of the model (see the sketch after this list).
- Added MoE support for our reward models.
- SFT/SteerLM: LoRA can now be enabled on all model layers.
- DPO: enabled LoRA on all model layers. In this case the actor is the reference model plus LoRA weights, and we can switch between the actor and the reference model by enabling/disabling LoRA.
- PPO: enabled LoRA on all model layers. In this case the actor is the init policy plus LoRA weights, and we can switch between the actor and the init policy by enabling/disabling LoRA.
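As a sketch, a SPIN run could then cap the generation batch size independently of the DP size; the script path and the exact key prefix are assumptions for illustration:

# Hypothetical SPIN launch; the model.spin.* key prefix is an assumption
python examples/nlp/gpt/train_gpt_spin.py \
   model.spin.rollout_micro_batch_size=4 \
   pretrained_checkpoint.restore_from_path=/path/to/sft_model.nemo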
Breaking changes
Bug Fixes
- Fixed an issue where the random sampler kept state when resetting for validation, leading to a different validation batch each validation step. Fixed by using a deterministic sampler.
- Fixed crash with float val check interval in DPOTrainer
- Fixed crash with float val check interval when checking progress in DPOTrainer
- Fixed a potential crash in SPIN when prompts are longer than `encoder_seq_len - generation.max_length`.
- Fixed a crash when calling the `generate()` method of an SFT model with pipeline parallelism greater than two.
- Fixed a crash when calling the `generate()` method of an SFT model with `compute_logprob=True` and string inputs.
- Fixed a crash when `model.micro_batch_size` > 1 in DPO.
- Fixed an issue when `model.encoder_seq_length` is mismatched with `model.data.train_ds.max_seq_length` in SFT and SPIN.
- Deleted `MegatronPretrainingRandomSampler` from Aligner since it has been upstreamed into NeMo.
Container
docker pull nvcr.io/nvidia/nemo:24.05
To get access:
- Sign up to get free and immediate access to the NVIDIA NeMo Framework container. If you don't have an NVIDIA NGC account, you will be prompted to sign up for an account before proceeding.
- If you don't have an NVIDIA NGC API key, sign in to NVIDIA NGC, select the organization/team `ea-bignlp/ga-participants`, and click Generate API Key. Save this key for the next step. Otherwise, skip this step.
- On your machine, log in to nvcr.io using:
docker login nvcr.io
Username: $oauthtoken
Password: <Your Saved NGC API Key>
PyPI
NVIDIA NeMo-Aligner v0.2.0
New features and optimizations
- Added public-facing official Dockerfile for NeMo-Aligner.
- PPO: memory optimization to help avoid OOM in the actor when sending training data to the critic.
- PPO: it is now possible to use a custom end string in `sampling_params.end_strings` that is different from `<extra_id_1>` (see the sketch after this list).
- SFT: added support for custom validation metrics based on model generations.
- Added the ability to do multi-epoch (`cfg.max_epochs > 1`) training for reward models, DPO, PPO, and SFT.
- SFT/SteerLM: added LoRA tuning as an option besides full fine-tuning; only the `attention_qkv` layer is supported.
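As an illustration, a custom end string could be passed as a sampling-parameter override; only `sampling_params.end_strings` appears in the notes, so the full key path and the script are assumptions:

# Hypothetical PPO actor launch with a custom end string instead of <extra_id_1>
python examples/nlp/gpt/train_gpt_ppo_actor.py \
   'model.ppo.sampling_params.end_strings=["<END>"]' \
   pretrained_checkpoint.restore_from_path=/path/to/sft_model.nemo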
Breaking changes
- We have changed the shuffle logic in the data sampler to support multi-epoch training, so training runs using identical parameters will no longer give the same results (specifically, the seed value is modified slightly per epoch). If you run CI/regression-type tests, be warned that they may break due to this shuffle change.
Bug Fixes
- Fixed a potential issue when the base model's `model.data.data_prefix` config is a list and is about to be overridden with a dictionary from the training configuration.
- `exp_manager.max_time_per_run` is now respected; the trainers will save and run validation before exiting if the time limit has been reached (see the sketch after this list).
- Fixed a crash in PPO when using a separate reward model server (i.e., with `combine_rm_and_critic_server=False`).
- Fixed a crash when the LR scheduler is not specified.
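For example, a wall-clock budget can be set so the trainer saves and validates before a scheduled job ends; the `DD:HH:MM:SS` format and the surrounding overrides are assumptions based on the usual exp_manager convention:

# Hypothetical SFT launch that saves and validates before exiting at 3h45m
python examples/nlp/gpt/train_gpt_sft.py \
   exp_manager.max_time_per_run=00:03:45:00 \
   model.restore_from_path=/path/to/base_model.nemo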
Container
docker pull nvcr.io/nvidia/nemo:24.01.framework
To get access:
- Sign up to get free and immediate access to the NVIDIA NeMo Framework container. If you don't have an NVIDIA NGC account, you will be prompted to sign up for an account before proceeding.
- If you don't have an NVIDIA NGC API key, sign in to NVIDIA NGC, select the organization/team `ea-bignlp/ga-participants`, and click Generate API Key. Save this key for the next step. Otherwise, skip this step.
- On your machine, log in to nvcr.io using:
docker login nvcr.io
Username: $oauthtoken
Password: <Your Saved NGC API Key>
PyPI
NVIDIA NeMo-Aligner v0.1.0
Highlights
First open-source release of NeMo-Aligner, featuring:
- Support for the full Reinforcement Learning from Human Feedback (RLHF) pipeline, including SFT, reward model training, and reinforcement learning
- Support for the SteerLM technique
- Support for Direct Preference Optimization
- Support for all Megatron Core GPT models such as LLAMA2 70B
Container
docker pull nvcr.io/ea-bignlp/ga-participants/nemofw-training:23.11
To get access:
- Sign up to get free and immediate access to the NVIDIA NeMo Framework container. If you don't have an NVIDIA NGC account, you will be prompted to sign up for an account before proceeding.
- If you don't have an NVIDIA NGC API key, sign in to NVIDIA NGC, select the organization/team `ea-bignlp/ga-participants`, and click Generate API Key. Save this key for the next step. Otherwise, skip this step.
- On your machine, log in to nvcr.io using:
docker login nvcr.io
Username: $oauthtoken
Password: <Your Saved NGC API Key>