added compile

huseinzol05 committed Nov 25, 2024
1 parent 65ce76f commit 26dfb98
Showing 5 changed files with 672 additions and 85 deletions.
99 changes: 16 additions & 83 deletions session/translation/end-to-end/README.md
@@ -1,90 +1,23 @@
# HuggingFace T5

## T5
## how to run on primeintellect

1. Run prepare dataset, [prepare-data.ipynb](prepare-data.ipynb).

2. Run training script,

Original script, https://github.com/huggingface/transformers/blob/v4.21.2/examples/pytorch/translation/run_translation.py

```bash
cd /workspace
apt update
apt install vim screen -y
pip3 install huggingface-hub wandb mosaicml-streaming evaluate
huggingface-cli login
wandb login
pip3 install notebook==6.5.6
screen -dmS jupyter_session bash -c "jupyter notebook --NotebookApp.token='' --no-browser --allow-root --notebook-dir='/workspace'"
python -c "
from huggingface_hub import snapshot_download
snapshot_download(repo_id='mesolitica/malaysian-translation-v2-multipack-2048-post', repo_type='dataset', local_dir = './malaysian-translation-v2-multipack-2048-post')
"
pip3 install git+https://github.com/mesolitica/t5-sdpa-multipack
bash nanot5-small-multipack-post.sh
```

BASE model,

```bash
CUDA_VISIBLE_DEVICES='0' \
WANDB_DISABLED=true \
python3 run_t5.py \
--model_name_or_path mesolitica/t5-base-standard-bahasa-cased \
--num_train_epochs 5 \
--logging_steps 100 \
--eval_steps 10000 \
--save_steps 10000 \
--evaluation_strategy steps \
--save_total_limit 3 \
--do_train \
--do_eval \
--source_lang src \
--target_lang tgt \
--train_file shuffled-train.json \
--validation_file test.json \
--output_dir finetune-t5-base-standard-bahasa-cased \
--per_device_train_batch_size=8 \
--per_device_eval_batch_size=4 \
--predict_with_generate \
--max_source_length 1536 \
--max_target_length 1536 \
--learning_rate 2e-4 \
--gradient_checkpointing true
```
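With `--source_lang src` and `--target_lang tgt`, the `run_translation.py` example expects each line of `shuffled-train.json` to be a JSON object whose `translation` field is keyed by those two codes. A minimal sketch of one record (the sentence pair below is made up for illustration):

```python
# Sketch of the JSON-lines layout run_translation.py reads for custom files:
# each line is a {"translation": {<source_lang>: ..., <target_lang>: ...}} object.
import json

record = {"translation": {"src": "Hello world", "tgt": "Hai dunia"}}

with open("shuffled-train.json", "w") as f:
    f.write(json.dumps(record, ensure_ascii=False) + "\n")

with open("shuffled-train.json") as f:
    loaded = json.loads(f.readline())

print(loaded["translation"]["tgt"])  # → Hai dunia
```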

SMALL model,
```bash
WANDB_PROJECT="translation-t5-small" \
torchrun \
--nproc_per_node 2 \
-m run_t5 \
--model_name_or_path mesolitica/t5-small-standard-bahasa-cased \
--num_train_epochs 3 \
--eval_steps 1000000000 \
--logging_steps 10 \
--save_steps 2000 \
--save_total_limit 3 \
--do_train \
--train_file mosaic \
--output_dir finetune-t5-small-standard-bahasa-cased \
--per_device_train_batch_size=16 \
--per_device_eval_batch_size=4 \
--max_source_length 1536 \
--max_target_length 1536 \
--learning_rate 2e-4 \
--gradient_checkpointing true \
--bf16
```
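`torchrun --nproc_per_node 2` launches two DDP workers, so the global batch per optimizer step is the per-device batch times the world size (no `--gradient_accumulation_steps` flag here, so it defaults to 1). A quick check:

```python
# Effective global batch size under DDP: per-device batch multiplied by the
# number of processes and by gradient accumulation steps (default 1).
def global_batch(per_device, world_size, grad_accum=1):
    return per_device * world_size * grad_accum

print(global_batch(per_device=16, world_size=2))  # → 32
```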

TINY model,
```bash
WANDB_PROJECT="translation-t5-small" \
torchrun \
--nproc_per_node 2 \
-m run_t5 \
--model_name_or_path mesolitica/t5-tiny-standard-bahasa-cased \
--num_train_epochs 3 \
--eval_steps 1000000000 \
--logging_steps 10 \
--save_steps 2000 \
--save_total_limit 3 \
--do_train \
--train_file mosaic \
--output_dir finetune-t5-tiny-standard-bahasa-cased \
--per_device_train_batch_size=16 \
--per_device_eval_batch_size=4 \
--max_source_length 1536 \
--max_target_length 1536 \
--learning_rate 5e-5 \
--gradient_checkpointing true \
--bf16
```

## nanoT5

1. Run prepare dataset, [prepare-data-v2.ipynb](prepare-data-v2.ipynb).

@@ -9,7 +9,7 @@ torchrun \
--save_steps 200 \
--save_total_limit 3 \
--do_train \
-  --train_file malaysian-translation-v2-multipack-2248-post \
+  --train_file malaysian-translation-v2-multipack-2048-post \
--output_dir nanot5-small-malaysian-cased-translation-v4-packing-post-v3 \
--dataloader_num_workers=5 \
--per_device_train_batch_size=1 \
@@ -23,4 +23,4 @@ torchrun \
--weight_decay 0.001 \
--bf16 \
--ddp_find_unused_parameters true \
-  --model_revision d3d615448b7153d849116cf028a1290f86d7f744
+  --model_revision 57cebd22c04e2c3ffe4697a1e9dbfa5839e8750c
28 changes: 28 additions & 0 deletions session/translation/end-to-end/nanot5-small-multipack-compile.sh
@@ -0,0 +1,28 @@
WANDB_PROJECT="nanot5-small-malaysian-cased-translation-v6-multipack-post" \
torchrun \
--nproc_per_node 2 \
-m run_t5_multipack_compile \
--model_name_or_path mesolitica/nanot5-small-malaysian-translation-v2 \
--num_train_epochs 2 \
--eval_steps 1000000000 \
--logging_steps 2 \
--save_steps 200 \
--save_total_limit 3 \
--do_train \
--train_file /home/husein/ssd3/t5-sdpa-multipack/packing-post \
--output_dir nanot5-small-malaysian-cased-translation-v5-multipack-post \
--dataloader_num_workers 5 \
--dataloader_prefetch_factor 4 \
--per_device_train_batch_size=2 \
--per_device_eval_batch_size=3 \
--gradient_accumulation_steps=8 \
--max_source_length 2048 \
--max_target_length 2048 \
--learning_rate 2e-5 \
--max_grad_norm 1.0 \
--gradient_checkpointing false \
--weight_decay 0.001 \
--bf16 \
--ddp_find_unused_parameters true \
--model_revision d3d615448b7153d849116cf028a1290f86d7f744 \
--torch_compile
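The only substantive difference from the non-compile script is the `--torch_compile` flag (plus a smaller per-device batch with more accumulation steps). With that flag, the HF Trainer wraps the model in `torch.compile()`, which traces the forward pass and emits optimized kernels while remaining a drop-in replacement for the eager module. A minimal sketch of the underlying call, assuming PyTorch >= 2.0 (using the dependency-light `backend="eager"` for illustration; the Trainer uses the default inductor backend):

```python
# What --torch_compile toggles under the hood: torch.compile returns a wrapped
# module whose outputs should match the eager model numerically.
import torch

model = torch.nn.Sequential(
    torch.nn.Linear(8, 16),
    torch.nn.ReLU(),
    torch.nn.Linear(16, 4),
)
# backend="eager" skips codegen so this sketch runs anywhere; the real run
# relies on the default "inductor" backend for fused kernels.
compiled = torch.compile(model, backend="eager")

x = torch.randn(2, 8)
with torch.no_grad():
    eager_out = model(x)
    compiled_out = compiled(x)

# Drop-in replacement: same shapes, same values.
assert torch.allclose(eager_out, compiled_out, atol=1e-6)
```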
27 changes: 27 additions & 0 deletions session/translation/end-to-end/nanot5-small-multipack-post.sh
@@ -0,0 +1,27 @@
WANDB_PROJECT="nanot5-small-malaysian-cased-translation-v6-multipack-post" \
torchrun \
--nproc_per_node 2 \
-m run_t5_multipack \
--model_name_or_path mesolitica/nanot5-small-malaysian-translation-v2 \
--num_train_epochs 2 \
--eval_steps 1000000000 \
--logging_steps 2 \
--save_steps 200 \
--save_total_limit 3 \
--do_train \
--train_file /home/husein/ssd3/t5-sdpa-multipack/packing-post \
--output_dir nanot5-small-malaysian-cased-translation-v5-multipack-post \
--dataloader_num_workers 5 \
--dataloader_prefetch_factor 4 \
--per_device_train_batch_size=4 \
--per_device_eval_batch_size=3 \
--gradient_accumulation_steps=6 \
--max_source_length 2048 \
--max_target_length 2048 \
--learning_rate 2e-5 \
--max_grad_norm 1.0 \
--gradient_checkpointing false \
--weight_decay 0.001 \
--bf16 \
--ddp_find_unused_parameters true \
--model_revision d3d615448b7153d849116cf028a1290f86d7f744
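Both scripts train on a multipacked dataset: short translation pairs are packed together into sequences of at most `--max_source_length` tokens so the batch wastes little padding. A hedged sketch of the greedy packing idea (illustrative only, not the actual code from `t5-sdpa-multipack`):

```python
# Greedy first-fit packing: walk the sequences in order and start a new bin
# whenever adding the next one would exceed max_length tokens.
def pack_greedy(lengths, max_length=2048):
    bins, current, current_len = [], [], 0
    for i, n in enumerate(lengths):
        if current and current_len + n > max_length:
            bins.append(current)
            current, current_len = [], 0
        current.append(i)
        current_len += n
    if current:
        bins.append(current)
    return bins

# Five example sequences (token counts are made up) packed into 2048-token bins.
print(pack_greedy([1500, 400, 900, 100, 2000]))  # → [[0, 1], [2, 3], [4]]
```

In the real training run an attention mask (here, SDPA-based, per the `t5-sdpa-multipack` dependency) keeps the packed sequences from attending to each other.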