Fix DDP Example for CIFAR10 by Using Epochs Only #147

Open
ImahnShekhzadeh opened this issue Nov 17, 2024 · 2 comments

Comments

@ImahnShekhzadeh
Contributor

ImahnShekhzadeh commented Nov 17, 2024

Hi!

As recently discussed in #145 and #144 with @Xiaoming-Zhao (and as I had already mentioned in #116 (comment)), we believe it would be best to use epochs instead of steps in distributed mode, i.e. in the training loop of train_cifar10_ddp.py. Is there a reason not to (other than consistency with the other scripts)? 🙃
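To make the steps-versus-epochs point concrete, here is a minimal sketch of how the number of optimizer steps per epoch changes with the number of processes. It is not code from train_cifar10_ddp.py. The function names and the per-process batch size of 128 are hypothetical, and the sharding arithmetic assumes the data is split evenly across processes with the last shard padded, as PyTorch's `DistributedSampler` does by default. The only fixed fact is that the CIFAR10 training set has 50,000 images.

```python
import math


def samples_per_rank(dataset_size: int, world_size: int) -> int:
    """Samples each process sees per epoch when the dataset is sharded
    evenly across processes (last shard padded). Assumed behavior."""
    return math.ceil(dataset_size / world_size)


def steps_per_epoch(dataset_size: int, batch_size_per_rank: int, world_size: int) -> int:
    """Optimizer steps needed for one full pass over the data,
    keeping the final partial batch."""
    per_rank = samples_per_rank(dataset_size, world_size)
    return math.ceil(per_rank / batch_size_per_rank)


def epochs_covered(total_steps: int, dataset_size: int,
                   batch_size_per_rank: int, world_size: int) -> float:
    """How many passes over the data a fixed step budget corresponds to."""
    return total_steps / steps_per_epoch(dataset_size, batch_size_per_rank, world_size)


CIFAR10_TRAIN_SIZE = 50_000  # size of the CIFAR10 training split
BATCH_SIZE_PER_RANK = 128    # hypothetical value, for illustration only

for world_size in (1, 2, 4):
    steps = steps_per_epoch(CIFAR10_TRAIN_SIZE, BATCH_SIZE_PER_RANK, world_size)
    print(f"world_size={world_size}: {steps} steps per epoch")
```

Under these assumptions, the same step budget covers a different number of passes over the data depending on how many GPUs are used, whereas a budget expressed in epochs does not change with the number of processes.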

I would be happy to make the changes, test them extensively and then open a PR.

Cheers,
Imahn

@atong01
Owner

atong01 commented Nov 17, 2024

Yes, I agree with this change in principle. Happy to take a PR in this direction. Thanks for all your work on this example!

@kilianFatras
Collaborator

Of course. I might do it if I have time, but that is rather uncertain before the end of the year.
