
DeMo

This package contains the supplementary material for the paper DeMo: Decoupled Momentum Optimization (arXiv).

A standalone PyTorch optimizer is provided in demo.py.

To reproduce the experiments in the paper, apply 0001-DeMo.patch to https://github.com/allenai/OLMo/commit/46f06cbc3b42ed94a2400dec4aa479197d1ba0b6. To launch the training jobs, run torchrun --nnodes=8 --nproc-per-node=8 scripts/train.py CONFIG_FILE, where CONFIG_FILE is any of the .yaml files provided in this package.

To integrate DeMo into other PyTorch training pipelines, the standalone optimizer can be used as-is; the only additional modification needed is to disable the native Distributed Data Parallel (DDP) gradient synchronization/all-reduce.
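For reference, here is a minimal sketch of such a training loop. It assumes demo.py is on the Python path and exposes a DeMo class with a torch.optim-style interface; the constructor arguments, model, and data below are illustrative placeholders, not the paper's configuration.

```python
# Minimal sketch of a training loop using the standalone DeMo optimizer.
# Launch with torchrun so the process-group environment variables are set.
# Assumptions: demo.py exposes a `DeMo` class with a torch.optim-style
# interface; the lr value and the model below are illustrative placeholders.
import torch
import torch.distributed as dist
import torch.nn as nn

from demo import DeMo  # standalone optimizer provided in this repository

dist.init_process_group(backend="nccl")
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

# Note: the model is deliberately NOT wrapped in DistributedDataParallel.
# DeMo performs its own decoupled communication, so the native DDP
# gradient all-reduce must stay disabled.
model = nn.Linear(1024, 1024).cuda()
optimizer = DeMo(model.parameters(), lr=1e-3)  # hyperparameters illustrative

for step in range(100):
    x = torch.randn(32, 1024, device="cuda")
    loss = model(x).square().mean()
    optimizer.zero_grad()
    loss.backward()   # gradients stay local; no all-reduce happens here
    optimizer.step()  # DeMo handles cross-rank communication internally

dist.destroy_process_group()
```

If removing the DDP wrapper from an existing pipeline is impractical, one option is to run every backward pass inside DDP's no_sync() context, which likewise suppresses the built-in gradient all-reduce.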

Future updates will be on the DisTrO repo.
