
Is multi-GPU inference supported? Single-GPU memory is limited #587

Open
shenshaowei opened this issue Sep 24, 2024 · 11 comments
Labels: question (Further information is requested)

Comments


shenshaowei commented Sep 24, 2024

My GPU memory is insufficient; I only have a 4090D and set bits to 16. If I want to run inference with the fine-tuned OneKE model on multiple GPUs, is there single-node multi-GPU inference code available? What command should I use?

CUDA_VISIBLE_DEVICES=0,1 python src/inference.py \
    --stage sft \
    --model_name_or_path '/data/shensw/model/OneKE' \
    --checkpoint_dir '/data/shensw/DeepKE/example/llm/InstructKGC/lora/oneke_ner_cmeee_add/checkpoint-2810' \
    --model_name 'llama' \
    --template 'llama2_zh' \
    --do_predict \
    --input_file 'data/NER/ner_cmeee_add.json' \
    --output_file 'results/ner_cmeee_add.json' \
    --finetuning_type lora \
    --output_dir 'result' \
    --predict_with_generate \
    --cutoff_len 512 \
    --bf16 \
    --max_new_tokens 300 \
    --bits 16

单卡推理显存不够,报OOM了:torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 50.00 MiB (GPU 0; 23.65 GiB total capacity; 23.21 GiB already allocated; 33.88 MiB free; 23.21 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Looking forward to your reply.

shenshaowei added the question label on Sep 24, 2024

shenshaowei commented Sep 24, 2024

Hello, could someone please take a look at this?

guihonghao (Contributor) commented:

Multi-GPU inference is not currently supported.
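For reference, here is a minimal sketch of how the same base model and LoRA checkpoint could be sharded across the two visible GPUs using plain transformers/accelerate and peft, bypassing src/inference.py entirely. The prompt handling and generation arguments are illustrative assumptions, not the DeepKE script's behavior:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model = '/data/shensw/model/OneKE'
lora_ckpt = '/data/shensw/DeepKE/example/llm/InstructKGC/lora/oneke_ner_cmeee_add/checkpoint-2810'

tokenizer = AutoTokenizer.from_pretrained(base_model)
# device_map="auto" lets accelerate split the layers across all visible GPUs
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model = PeftModel.from_pretrained(model, lora_ckpt)  # attach the LoRA adapter
model.eval()

prompt = "..."  # build the llama2_zh-style instruction prompt yourself
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=300)
print(tokenizer.decode(output[0], skip_special_tokens=True))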

shenshaowei (Author) commented:

OK, thanks. In that case, is there code available for quantized inference after training?

guihonghao (Contributor) commented:

shenshaowei (Author) commented:

ok

shenshaowei (Author) commented:

output_dir='lora/oneke-bio-8-add'
mkdir -p ${output_dir}
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 torchrun --nproc_per_node=8 --master_port=1287 src/finetune.py \
    --do_train --do_eval \
    --overwrite_output_dir \
    --model_name_or_path '/data/shensw/model/OneKE' \
    --stage 'sft' \
    --model_name 'llama' \
    --template 'llama2_zh' \
    --train_file '/data/shensw/DeepKE/example/llm/InstructKGC/data/NER/bio8-data/bio_train.json' \
    --valid_file '/data/shensw/DeepKE/example/llm/InstructKGC/data/NER/bio8-data/bio_dev.json' \
    --output_dir=${output_dir} \
    --per_device_train_batch_size 16 \
    --per_device_eval_batch_size 8 \
    --gradient_accumulation_steps 4 \
    --preprocessing_num_workers 8 \
    --num_train_epochs 10 \
    --learning_rate 5e-5 \
    --max_grad_norm 0.5 \
    --optim "adamw_torch" \
    --max_source_length 300 \
    --cutoff_len 500 \
    --max_target_length 300 \
    --evaluation_strategy "epoch" \
    --save_strategy "epoch" \
    --save_total_limit 10 \
    --lora_r 8 \
    --lora_alpha 16 \
    --lora_dropout 0.05 \
    --bf16 \
    --bits 8
Inference after training:
    CUDA_VISIBLE_DEVICES=0 python src/inference.py \
    --stage sft \
    --model_name_or_path '/data/shensw/model/OneKE' \
    --checkpoint_dir '/data/shensw/DeepKE/example/llm/InstructKGC/lora/oneke-bio-8-add/checkpoint-890' \
    --model_name 'llama' \
    --template 'llama2_zh' \
    --do_predict \
    --input_file '/data/shensw/DeepKE/example/llm/InstructKGC/data/NER/bio8-data/bio_test.json' \
    --output_file 'results/bio_test.json' \
    --finetuning_type lora \
    --output_dir 'result' \
    --predict_with_generate \
    --cutoff_len 512 \
    --bf16 \
    --max_new_tokens 300 \
    --bits 8

Error: RuntimeError: expected scalar type Float but found BFloat16
What could be causing this?
Changing --bf16 to --fp16 still gives the same error.
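One possible cause, offered as an assumption rather than something confirmed from the DeepKE code: the bitsandbytes 8-bit (LLM.int8) kernels expect fp16/fp32 activations, so bf16 tensors reaching them can raise exactly this "expected scalar type Float but found BFloat16" error. A minimal sketch of an 8-bit load with an explicit fp16 dtype for the non-quantized modules, using transformers/peft directly rather than src/inference.py:

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel

base_model = '/data/shensw/model/OneKE'
lora_ckpt = '/data/shensw/DeepKE/example/llm/InstructKGC/lora/oneke-bio-8-add/checkpoint-890'

# Load the base weights in 8-bit; keep the non-quantized modules in fp16 so
# the int8 kernels never see bf16 activations.
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    torch_dtype=torch.float16,
    device_map="auto",
)
model = PeftModel.from_pretrained(model, lora_ckpt)  # attach the LoRA adapter
model.eval()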


shenshaowei commented Sep 24, 2024

After changing --bits 8 to --bits 4 it runs. Looking at the code, for bits values below 16, 4 and 8 select different quantization methods, both via bitsandbytes. Could this be a version issue?
Also, when running with --bits 4, why is it more than twice as slow as --bits 16? I'm puzzled and would appreciate an explanation.
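For what it's worth, in the transformers/bitsandbytes API the 4-bit and 8-bit paths are configured through different options; below is a hedged sketch of an explicit 4-bit (NF4) load, which may or may not match what --bits 4 selects inside the DeepKE code:

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# QLoRA-style NF4 quantization with fp16 compute and double quantization.
bnb_4bit = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)
model = AutoModelForCausalLM.from_pretrained(
    '/data/shensw/model/OneKE',
    quantization_config=bnb_4bit,
    device_map="auto",
)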

guihonghao (Contributor) commented:

--bits 8 sometimes produces the error above; we recommend using --bits 4.


shenshaowei commented Sep 24, 2024

When running with --bits 4, why is it more than twice as slow as --bits 16? With bits 16 each sample takes about 2 s, but with bits 4 each sample takes about 5 s. Shouldn't quantization normally make things faster? I'm puzzled and would appreciate an explanation.
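One possible explanation, hedged and not verified against this setup: bitsandbytes 4-bit weights are dequantized block by block on every forward pass, so for single-sample generation the main benefit is memory, and it can easily be slower than a bf16 model. A minimal sketch for timing one generate() call, assuming model and tokenizer are already loaded as in the sketches above:

import time
import torch

def time_generate(model, tokenizer, prompt, max_new_tokens=300):
    # Wall-clock time of a single generate() call on one prompt.
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    torch.cuda.synchronize()
    start = time.perf_counter()
    with torch.no_grad():
        model.generate(**inputs, max_new_tokens=max_new_tokens)
    torch.cuda.synchronize()
    return time.perf_counter() - start

print(f"one sample took {time_generate(model, tokenizer, '...'):.2f} s")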

guihonghao (Contributor) commented:

I'm not sure what's going wrong here.

shenshaowei (Author) commented:

I see. When you run the quantized model, doesn't it slow down for you as well?
