Issues encountered when using Uni-Core in Uni-Mol #34

Open
jerermyyoung opened this issue Aug 21, 2023 · 2 comments

Comments

jerermyyoung commented Aug 21, 2023

I tried to run the fine-tuning script provided in Uni-Mol (pasted here for easy reference).

data_path="./molecular_property_prediction"  # replace to your data path
save_dir="./save_finetune"  # replace to your save path
n_gpu=4
MASTER_PORT=10086
dict_name="dict.txt"
weight_path="./weights/checkpoint.pt"  # replace to your ckpt path
task_name="qm9dft"  # molecular property prediction task name 
task_num=3
loss_func="finetune_smooth_mae"
lr=1e-4
batch_size=32
epoch=40
dropout=0
warmup=0.06
local_batch_size=32
only_polar=0
conf_size=11
seed=0

if [ "$task_name" == "qm7dft" ] || [ "$task_name" == "qm8dft" ] || [ "$task_name" == "qm9dft" ]; then
	metric="valid_agg_mae"
elif [ "$task_name" == "esol" ] || [ "$task_name" == "freesolv" ] || [ "$task_name" == "lipo" ]; then
    metric="valid_agg_rmse"
else 
    metric="valid_agg_auc"
fi

export NCCL_ASYNC_ERROR_HANDLING=1
export OMP_NUM_THREADS=1
update_freq=`expr $batch_size / $local_batch_size`
python -m torch.distributed.launch --nproc_per_node=$n_gpu --master_port=$MASTER_PORT $(which unicore-train) $data_path --task-name $task_name --user-dir ./unimol --train-subset train --valid-subset valid \
       --conf-size $conf_size \
       --num-workers 8 --ddp-backend=c10d \
       --dict-name $dict_name \
       --task mol_finetune --loss $loss_func --arch unimol_base  \
       --classification-head-name $task_name --num-classes $task_num \
       --optimizer adam --adam-betas "(0.9, 0.99)" --adam-eps 1e-6 --clip-norm 1.0 \
       --lr-scheduler polynomial_decay --lr $lr --warmup-ratio $warmup --max-epoch $epoch --batch-size $local_batch_size --pooler-dropout $dropout \
       --update-freq $update_freq --seed $seed \
       --fp16 --fp16-init-scale 4 --fp16-scale-window 256 \
       --log-interval 100 --log-format simple \
       --validate-interval 1 \
       --finetune-from-model $weight_path \
       --best-checkpoint-metric $metric --patience 20 \
       --save-dir $save_dir --only-polar $only_polar \
       --reg

# --reg, for regression task
# --maximize-best-checkpoint-metric, for classification task

However, I encountered the following error:

unicore-train: error: unrecognized arguments: --local-rank=0

The argument --local-rank does not appear anywhere in Uni-Core. I am using PyTorch 2.0, and the log also warns:

If your script expects `--local-rank` argument to be set, please change it to read from `os.environ['LOCAL_RANK']` instead. See https://pytorch.org/docs/stable/distributed.html#launch-utility for further instructions 

This leaves me unsure whether Uni-Core does not support PyTorch 2.0 (which seems unlikely), or whether there is some other problem.
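
For reference, my reading of the warning is that relying on the launcher-injected flag is deprecated and the script should read the rank from os.environ['LOCAL_RANK']. A rough sketch of what a launcher-agnostic entry point could look like (illustrative only, not Uni-Core's actual code; all names below are made up):

import argparse
import os

def main():
    parser = argparse.ArgumentParser()
    # Older launchers passed the device index as a CLI flag; keep it optional
    # so parsing still succeeds when the flag is absent.
    parser.add_argument("--local-rank", "--local_rank", dest="local_rank",
                        type=int, default=None)
    args, _ = parser.parse_known_args()

    # Newer launchers export LOCAL_RANK in the environment instead,
    # so fall back to that when no flag was passed.
    local_rank = args.local_rank
    if local_rank is None:
        local_rank = int(os.environ.get("LOCAL_RANK", 0))

    print(f"using local rank {local_rank}")

if __name__ == "__main__":
    main()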


guolinke commented Sep 4, 2023

If you use PyTorch 2.0, please make sure your Uni-Core version is not earlier than this commit: https://github.com/dptech-corp/Uni-Core/tree/91ebaa0a73ac7ef52b57e9e8f6ddf22e32eb3c2e


wayyzt commented Nov 28, 2024

Open the file options.py (probably at ~/miniconda3/envs/unicore/lib/python3.10/site-packages/unicore/options.py) and replace '--local_rank' with '--local-rank':

group.add_argument('--device-id', '--local-rank', default=0, type=int,
                   help='which GPU to use (usually configured automatically)')
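
If you also want to keep compatibility with older launchers that still pass '--local_rank', registering both spellings should work as well. This is only a sketch against the same argparse group; I have not checked it against every Uni-Core version:

group.add_argument('--device-id', '--local_rank', '--local-rank',
                   dest='device_id', default=0, type=int,
                   help='which GPU to use (usually configured automatically)')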
