
# DeMo

This package contains the supplementary material for DeMo: Decoupled Momentum Optimization ([arXiv](https://arxiv.org/abs/2411.19870)).

A standalone PyTorch optimizer is provided in `demo.py`.

To reproduce the experiments in the paper, apply `0001-DeMo.patch` to https://github.com/allenai/OLMo/commit/46f06cbc3b42ed94a2400dec4aa479197d1ba0b6. To launch the training jobs, run `torchrun --nnodes=8 --nproc-per-node=8 scripts/train.py CONFIG_FILE`, where `CONFIG_FILE` is any of the `.yaml` files provided in this package.

For integration into other PyTorch training pipelines, the standalone DeMo optimizer can be used as-is; the only additional modification needed is to disable the native Distributed Data Parallel gradient synchronization/all-reduce, as in the sketch below.
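For example, here is a minimal sketch of such an integration. It assumes the `DeMo` class in `demo.py` accepts a parameter iterable and `lr` like a standard `torch.optim.Optimizer`; any DeMo-specific hyperparameters, as well as the model, data, and learning rate below, are placeholders rather than recommendations.

```python
import os

import torch
import torch.distributed as dist

from demo import DeMo  # the standalone optimizer shipped in this package

# torchrun sets RANK/LOCAL_RANK/WORLD_SIZE; DeMo communicates over the
# default process group, so initialize it as usual.
dist.init_process_group(backend="nccl")
torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))

model = torch.nn.Linear(512, 512).cuda()  # stand-in for a real network

# Ensure all ranks start from identical weights. DDP normally performs this
# broadcast for you; without the DDP wrapper it must be done explicitly.
for p in model.parameters():
    dist.broadcast(p.data, src=0)

# The model is deliberately NOT wrapped in DistributedDataParallel, so no
# gradient all-reduce ever fires; DeMo performs its own communication.
# DeMo-specific constructor arguments are omitted here; see demo.py.
optimizer = DeMo(model.parameters(), lr=1e-3)

for step in range(100):
    x = torch.randn(32, 512, device="cuda")  # placeholder batch
    loss = model(x).pow(2).mean()            # placeholder loss
    optimizer.zero_grad()
    loss.backward()   # gradients stay local to this rank
    optimizer.step()  # DeMo exchanges its compressed updates across ranks
```

If a pipeline must keep the DDP wrapper for other reasons, running every backward pass inside the `model.no_sync()` context manager is an alternative way to suppress the bucketed gradient all-reduce.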

Future updates will be on the DisTrO repo.