
Commit

fix dependecies
Jackmin801 authored and samsja committed Jul 11, 2024
1 parent b459953 commit 5e23acd
Showing 2 changed files with 20 additions and 3 deletions.
22 changes: 20 additions & 2 deletions README.md
@@ -30,10 +30,28 @@ source .venv/bin/activate

Install python dependencies:
```bash
# Hivemind
cd hivemind_source
pip install .
cp build/lib/hivemind/proto/* hivemind/proto/.
pip install -e ".[all]"
cd ..
# Requirements
pip install -r requirements.txt
# Others
pip install --pre torchdata --index-url https://download.pytorch.org/whl/nightly/cpu
pip install -e ./pydantic_config
# OpenDiLoCo
pip install .
```
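
As a quick sanity check (an optional step, not part of the original instructions), you can verify that the freshly installed packages import correctly:
```bash
# Optional: confirm hivemind and torch are importable after installation
python -c "import hivemind, torch; print(torch.__version__)"
```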

Optionally, you can install flash-attn to use Flash Attention 2.
This requires your system to have the CUDA compiler set up.
```bash
# (Optional) flash-attn
pip install flash-attn==2.5.8
```
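
If you do install it, a quick import check (again an optional addition) confirms the extension built successfully:
```bash
# Optional: fails if the flash-attn CUDA extension did not build correctly
python -c "import flash_attn"
```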

## Docker container

If you prefer to run your experiments in a reproducible container, you can use our pre-built docker image containing the repository and pre-installed dependencies.
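
As a minimal sketch of how such a container might be started (the image name and tag below are placeholders, not taken from this repository):
```bash
# Placeholder image name; substitute the actual pre-built image
docker pull <org>/<image>:<tag>
docker run --gpus all -it --rm <org>/<image>:<tag> bash
```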
@@ -86,13 +104,13 @@ The [multiaddress](https://github.com/multiformats/multiaddr) strings listed aft

## Stopping hivemind runs

-The current implementation of hivemind doesnt handle Ctrl+C keyboard interrupt well. You can stop the runs using `pkill`:
+The current implementation of hivemind doesn't handle Ctrl+C keyboard interrupt well. You can stop the runs using `pkill`:
```bash
pkill -f torchrun
```
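
To confirm that everything has stopped (an optional follow-up), you can list any remaining matching processes; the command below should print nothing once the runs have exited:
```bash
# Lists any remaining torchrun processes with their full command lines
pgrep -af torchrun
```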

## Resuming from checkpoint
-To resume from checkpoint, you can pass the `--resume-from-checkpoint` argment to the training script. e.g.
+To resume from checkpoint, you can pass the `--resume-from-checkpoint` argument to the training script. e.g.
```bash
torchrun --nproc_per_node=8 \
    train_fsdp.py \
    ... \
    --resume-from-checkpoint <path-to-checkpoint>
```
1 change: 0 additions & 1 deletion requirements.txt
@@ -4,4 +4,3 @@ wandb==0.16.4
cyclopts>=2.6.1
fsspec[gcs]>=2024.3.1
torch==2.3.1
-flash-attn
