Does NeMo-Aligner support gradient accumulation (accumulate_grad_batches)? #451
Replies: 2 comments
-
Yes, it's supported. It is used implicitly whenever global_batch_size > micro_batch_size * data_parallel_size.
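In case it helps, here is a minimal sketch of how the accumulation factor falls out of those settings. The function name and signature are illustrative only, not NeMo-Aligner's actual API:

```python
# Illustrative sketch: these names mirror the config options but are not
# NeMo-Aligner internals. Accumulation kicks in whenever the result is > 1.
def accumulate_grad_batches(global_batch_size: int,
                            micro_batch_size: int,
                            data_parallel_size: int) -> int:
    per_optimizer_step = micro_batch_size * data_parallel_size
    assert global_batch_size % per_optimizer_step == 0, "global batch must divide evenly"
    return global_batch_size // per_optimizer_step

print(accumulate_grad_batches(global_batch_size=256, micro_batch_size=2, data_parallel_size=8))  # 16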
-
Thanks for the quick response! In my case, global_batch_size = 128 and micro_batch_size = 1. I didn't explicitly set data_parallel_size, but it should be 1 (according to world_size // (tensor_model_parallel_size * pipeline_model_parallel_size)). So will NeMo Aligner automatically use accumulate_grad_batches = 128 in my case? A related question is about the dataloader: we use this build_dataloader (https://github.com/NVIDIA/NeMo-Aligner/blob/main/nemo_aligner/data/nlp/builders.py#L462C1-L462C22) with
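For the batch-size question above, the arithmetic would work out as follows. This is only a worked example assuming world_size = 1 and TP = PP = 1, so that data_parallel_size = 1 as described:

```python
# Assumed values for illustration; only global_batch_size and micro_batch_size
# are taken from the question above.
world_size = 1
tensor_model_parallel_size = 1
pipeline_model_parallel_size = 1
data_parallel_size = world_size // (tensor_model_parallel_size * pipeline_model_parallel_size)  # 1

global_batch_size = 128
micro_batch_size = 1
grad_accumulation_steps = global_batch_size // (micro_batch_size * data_parallel_size)
print(grad_accumulation_steps)  # 128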
-
My understanding of gradient accumulation is that we can logically use a larger global batch size while only loading a micro batch of training data at each step.
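To make that concrete, here is a generic gradient-accumulation loop in plain PyTorch. It is not NeMo-Aligner's actual training loop, just the idea: each iteration loads only a micro batch, gradients are summed across micro batches, and the optimizer steps once per global batch.

```python
# Generic illustration of gradient accumulation (not NeMo-Aligner's loop).
import torch

model = torch.nn.Linear(16, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
loss_fn = torch.nn.MSELoss()

global_batch_size, micro_batch_size = 128, 1
accumulation_steps = global_batch_size // micro_batch_size  # 128, assuming data_parallel_size = 1

optimizer.zero_grad()
for step in range(accumulation_steps):
    x = torch.randn(micro_batch_size, 16)  # stand-in for one micro batch from the dataloader
    y = torch.randn(micro_batch_size, 1)
    loss = loss_fn(model(x), y) / accumulation_steps  # scale so the sum averages over the global batch
    loss.backward()                                   # gradients accumulate in .grad
optimizer.step()                                      # one weight update per global batch
optimizer.zero_grad()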
From NeMo's docs, it does support that: https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/nlp/nemo_megatron/batching.html
Does NeMo Aligner also support it? If so, how do I enable it?
Thanks!