
Error due to missing is_zero arg when saving LR scheduler #252

Open · Lauler opened this issue Nov 27, 2024 · 2 comments

Lauler commented Nov 27, 2024

The recent commit 51bd072 and pull request #230, which change how LR schedulers are saved, forgot to add the is_zero arg (set to config.optimizer.zero_stage) to this call in src/nanotron/serialize/main.py:

save_lr_scheduler(
    lr_scheduler=lr_scheduler,
    parallel_context=parallel_context,
    root_folder=root_folder,
)

This arg is expected to be passed:

def save_lr_scheduler(
    lr_scheduler,
    is_zero,
    parallel_context: ParallelContext,
    root_folder: Path,
):

This causes training to crash whenever the LR scheduler is saved.

@TJ-Solergibert @NouamaneTazi


sankexin commented Nov 28, 2024

nanotron/src/nanotron/serialize/main.py

save_lr_scheduler(
    lr_scheduler=lr_scheduler,
    is_zero=True,
    parallel_context=parallel_context,
    root_folder=root_folder,
)


Lauler commented Nov 28, 2024

I changed mine to:

save_lr_scheduler(
    lr_scheduler=lr_scheduler,
    is_zero=config.optimizer.zero_stage,
    parallel_context=parallel_context,
    root_folder=root_folder,
)
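For context, a minimal sketch of how the value can flow from the training config into the call site. Only config.optimizer.zero_stage and the save_lr_scheduler signature are taken from this thread; the surrounding checkpoint function and config objects are assumed for illustration:

```python
# Hypothetical sketch: wiring the ZeRO stage from the config into save_lr_scheduler.
# `config.optimizer.zero_stage` is from the original report; the rest is illustrative.
def save_checkpoint(config, lr_scheduler, parallel_context, root_folder):
    save_lr_scheduler(
        lr_scheduler=lr_scheduler,
        # Presumably 0 when ZeRO is disabled, so the stage also works as a truthy flag,
        # unlike hardcoding is_zero=True for every run.
        is_zero=config.optimizer.zero_stage,
        parallel_context=parallel_context,
        root_folder=root_folder,
    )
```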
