This is the repository for DisTrO (Distributed Training Over-The-Internet), a family of low-latency distributed optimizers that reduce inter-GPU communication requirements by three to four orders of magnitude.
- Aug. 26th, 2024: DisTrO (Preliminary Report)
- Dec. 2nd, 2024: DeMo Optimization (Paper) (Code), original seed research/idea for DisTrO
- Dec. 2nd, 2024: Nous trains a 15B model using DisTrO
- Coming Soon: DisTrO Paper and Code
- In The Near Future: 👀
Join us on Discord if you're interested in helping to research and build the future of distributed training.