This package contains the supplementary material for DeMo: Decoupled Momentum Optimization (arXiv).
A standalone PyTorch optimizer is provided in demo.py.
To reproduce the experiments in the paper, apply 0001-DeMo.patch to https://github.com/allenai/OLMo/commit/46f06cbc3b42ed94a2400dec4aa479197d1ba0b6.
To launch the training jobs, run torchrun --nnodes=8 --nproc-per-node=8 scripts/train.py CONFIG_FILE, where CONFIG_FILE is any of the .yaml files provided in this package.
For use in other PyTorch training pipelines, the standalone DeMo optimizer can be used as-is; the only additional modification needed is to disable the native Distributed Data Parallel (DDP) gradient synchronization/all-reduce, as sketched below.
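
As a minimal sketch of that integration (the class name DeMo and its constructor arguments are assumed here for illustration, not taken from demo.py; check the file for the actual API), one way to suppress DDP's gradient all-reduce is to run every backward pass under the module's no_sync() context:

    import os
    import torch
    import torch.distributed as dist
    from torch.nn.parallel import DistributedDataParallel as DDP

    from demo import DeMo  # class name assumed; see demo.py for the actual API

    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(1024, 1024).cuda()  # toy model standing in for a real network
    ddp_model = DDP(model, device_ids=[local_rank])

    # Hyperparameters are placeholders; DeMo's actual constructor may differ.
    optimizer = DeMo(ddp_model.parameters(), lr=1e-3)

    for step in range(10):
        x = torch.randn(32, 1024, device="cuda")
        optimizer.zero_grad()
        # no_sync() disables DDP's dense gradient all-reduce for this backward pass;
        # per the note above, DeMo handles any needed synchronization itself.
        with ddp_model.no_sync():
            loss = ddp_model(x).square().mean()
            loss.backward()
        optimizer.step()

Keeping the DDP wrapper but running backward under no_sync() preserves the rest of an existing pipeline; alternatively, the wrapper can be dropped entirely if nothing else depends on it.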
Future updates will be posted to the DisTrO repo.