Commit

remove github submodule
samsja committed Jul 17, 2024
1 parent d38ab8b commit 422769a
Showing 6 changed files with 9 additions and 35 deletions.
5 changes: 1 addition & 4 deletions .github/workflows/push-docker-image.yml
@@ -20,10 +20,7 @@ jobs:
# Link to discussion: https://github.com/orgs/community/discussions/25678

- name: Checkout
uses: actions/checkout@v3
with:
submodules: true

uses: actions/checkout@v3
- name: Docker meta
id: meta
uses: crazy-max/ghaction-docker-meta@v2
2 changes: 0 additions & 2 deletions Dockerfile
@@ -35,8 +35,6 @@ RUN echo "export PATH=\"/opt/conda/bin:/root/.cargo/bin:\$PATH\"" >> /root/.bash
# Install Python dependencies (The gradual copies help with caching)
WORKDIR open_diloco
RUN pip install --pre torchdata --index-url https://download.pytorch.org/whl/nightly/cpu
COPY hivemind_source hivemind_source
RUN pip install --no-cache-dir ./hivemind_source
COPY requirements.txt requirements.txt
RUN pip install --no-cache-dir -r requirements.txt
COPY requirements-dev.txt requirements-dev.txt
28 changes: 3 additions & 25 deletions README.md
@@ -30,26 +30,14 @@ source .venv/bin/activate

Install python dependencies:
```bash
# Hivemind
cd hivemind_source
pip install .
cp build/lib/hivemind/proto/* hivemind/proto/.
pip install -e ".[all]"
cd ..
# Requirements
pip install -r requirements.txt
# Others
pip install --pre torchdata --index-url https://download.pytorch.org/whl/nightly/cpu
pip install -e ./pydantic_config
# OpenDiLoCo
pip install .
```

Optionally, you can install flash-attn to use Flash Attention 2.
This requires a CUDA compiler to be set up on your system.
```
# (Optional) flash-attn
pip install flash-attn==2.5.8
pip install "flash-attn>=2.5.8"
```

## Docker container
Expand Down Expand Up @@ -305,20 +293,10 @@ We recommend using `bf16` to avoid scaling and desynchronization issues with hiv


# Debugging Issues
1. `hivemind` or `pydantic_config`
If you are having issues with `hivemind` or `pydantic_config`, the issue could be related to submodules.
You can clean and reinitialize the submodules from the root of the repository with the following commands:

```
git submodule deinit -f .
git clean -xdf
git submodule update --init --recursive
```
2. `RuntimeError: CUDA error: invalid device ordinal`
1. `RuntimeError: CUDA error: invalid device ordinal`
A possible culprit is that your `--nproc-per-node` argument for the torchrun launcher is set incorrectly.
Please set it to an integer less than or equal to the number of GPUs on your machine.

3. `torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate...`
2. `torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate...`
A possible culprit is that your `--per-device-train-batch-size` is too high.
Try a smaller value.
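
The `invalid device ordinal` item above comes down to a simple bound: the value passed to `--nproc-per-node` should not exceed the number of GPUs visible to the process. A minimal sketch of that check follows. The helper names `visible_gpu_count` and `check_nproc` are hypothetical and not part of the repository; the only library call used is `torch.cuda.device_count()`.

```python
def visible_gpu_count() -> int:
    """Return the number of CUDA devices PyTorch can see (0 if torch is missing)."""
    try:
        import torch  # imported lazily so the bound check works without torch
    except ImportError:
        return 0
    return torch.cuda.device_count()


def check_nproc(nproc_per_node: int, gpus: int) -> None:
    """Raise ValueError if the requested process count cannot map onto the GPUs.

    Hypothetical helper: mirrors the README advice that --nproc-per-node
    must be an integer less than or equal to the number of GPUs.
    """
    if nproc_per_node < 1:
        raise ValueError("nproc_per_node must be at least 1")
    if nproc_per_node > gpus:
        raise ValueError(
            f"nproc_per_node={nproc_per_node} exceeds the {gpus} visible GPU(s)"
        )


if __name__ == "__main__":
    print(f"visible GPUs: {visible_gpu_count()}")
```

Calling `check_nproc(requested, visible_gpu_count())` before launching would surface the mismatch as a readable error rather than a CUDA runtime failure.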
1 change: 0 additions & 1 deletion hivemind_source
Submodule hivemind_source deleted from ad080e
1 change: 0 additions & 1 deletion pydantic_config
Submodule pydantic_config deleted from 8e19e0
7 changes: 5 additions & 2 deletions requirements.txt
@@ -1,6 +1,9 @@
transformers~=4.40
datasets>=2.19.1
wandb==0.16.4
wandb>=0.16.4
cyclopts>=2.6.1
fsspec[gcs]>=2024.3.1
torch==2.3.1
torch>=2.3.1
hivemind @ git+https://github.com/learning-at-home/hivemind.git@213bff9
pydantic_config @ git+https://github.com/samsja/pydantic_config.git@8e19e05
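
With the submodules gone, `hivemind` and `pydantic_config` arrive through `pip install -r requirements.txt` as pinned git dependencies. One way to confirm they were actually installed is to query the package metadata, as in the sketch below. It assumes the installed distribution names match the names used in `requirements.txt`; the helper `installed_versions` is hypothetical and not part of the repository.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    # Distribution names assumed to match the entries in requirements.txt.
    for name, ver in installed_versions(["hivemind", "pydantic_config"]).items():
        print(f"{name}: {ver or 'not installed'}")
```

A `None` entry suggests the requirements step did not complete, which replaces the old submodule re-initialization as the first thing to check.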
