NestedDeviceMesh RFC #1
RFC-0040-nested-device-mesh.md
Outdated
To address the second limitation, we propose rewriting the process group creation API to support a dynamic world size: it would periodically poll the c10d store for the current world size and recreate the NCCL communicators whenever that size changes.
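The polling idea in the paragraph above can be sketched roughly as follows. This is a minimal illustration, not the RFC's API: `InMemoryStore` is a hypothetical stand-in for a c10d store (assumed to return values as bytes), the `world_size` key is an assumed convention, and `rebuild` is a placeholder callback where communicator recreation would happen.

```python
from typing import Callable, Dict, Optional


class InMemoryStore:
    """Hypothetical stand-in for a c10d key-value store (values held as bytes)."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def set(self, key: str, value: str) -> None:
        self._data[key] = value.encode()

    def get(self, key: str) -> bytes:
        return self._data[key]


class WorldSizeWatcher:
    """Polls the store and calls `rebuild` when the observed world size changes."""

    def __init__(
        self,
        store: InMemoryStore,
        rebuild: Callable[[int], None],
        key: str = "world_size",  # assumed key name, not taken from the RFC
    ) -> None:
        self.store = store
        self.rebuild = rebuild
        self.key = key
        self.current: Optional[int] = None  # nothing observed yet

    def poll_once(self) -> bool:
        """Read the world size once; return True if a rebuild was triggered."""
        observed = int(self.store.get(self.key).decode())
        if observed == self.current:
            return False
        self.current = observed
        # Placeholder: a real implementation would tear down and recreate
        # the NCCL communicators for the new world size here.
        self.rebuild(observed)
        return True


store = InMemoryStore()
store.set("world_size", "4")
rebuilt_sizes = []
watcher = WorldSizeWatcher(store, rebuilt_sizes.append)
watcher.poll_once()            # first observation triggers an initial build
store.set("world_size", "6")   # a node joins
watcher.poll_once()            # change detected, rebuild is called again
```

In practice `poll_once` would be driven by a timer or called between training steps; the sketch leaves the polling interval and how ranks agree on the new size open.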
## **Motivation**
This proposal is motivated by the need to support dynamic mixed hardware in OpenDiLoCo.
We should frame it less around OpenDiLoCo and more around decentralized training in heterogeneous settings, since this would also be useful for swarm and others.
Of course, still cite OpenDiLoCo.
Yeah, agreed. Throughout the doc, OpenDiLoCo is used to mean hierarchical SGD in a heterogeneous setting, for lack of better terminology or wording.
Yes. I haven't read as many other PyTorch RFCs as you guys, but I would assume we need a somewhat more general motivation. Replace OpenDiLoCo with "distributed training in a heterogeneous setting" and add a bunch of citations of work in that direction (maybe even include the multi-datacenter training post from SemiAnalysis).