Is multi-GPU inference supported? Single-GPU memory is limited #587
Labels
question
Further information is requested
Comments
Hello, could you answer this?
Multi-GPU inference is not currently supported.
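Until official support lands, a common workaround (a generic sketch with the Hugging Face `transformers`/`accelerate` stack, not this project's own code; the model path and memory figures are placeholders) is to shard the checkpoint across cards with `device_map="auto"`:

```python
# Hedged sketch: multi-GPU sharded inference via transformers + accelerate.
# Assumptions: both libraries are installed and at least two GPUs are visible.

def build_max_memory(num_gpus: int, per_gpu_gib: int) -> dict:
    """Per-GPU memory caps so device_map="auto" spreads layers across cards."""
    return {i: f"{per_gpu_gib}GiB" for i in range(num_gpus)}

def load_sharded(model_path: str, num_gpus: int = 2, per_gpu_gib: int = 20):
    # Imported lazily so the helper above stays usable without a GPU stack.
    from transformers import AutoModelForCausalLM
    return AutoModelForCausalLM.from_pretrained(
        model_path,
        device_map="auto",  # accelerate assigns layers to devices automatically
        max_memory=build_max_memory(num_gpus, per_gpu_gib),
        torch_dtype="auto",
    )
```

Capping `max_memory` slightly below each card's physical size leaves headroom for activations during generation.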
OK, thanks. Do you provide post-training quantized inference code?
ok
Error: RuntimeError: expected scalar type Float but found BFloat16
--bits 8
--bits 8 sometimes triggers the error above; we recommend using --bits 4.
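For context on why lower `--bits` helps with memory: weight bytes scale linearly with bit width. The estimate below is generic arithmetic, and the loader is one common way to get 4-bit weights (assuming `transformers` and `bitsandbytes` are installed), not necessarily this repo's exact path:

```python
# Hedged sketch: memory impact of quantization bit width.

def quantized_weight_gib(n_params: float, bits: int) -> float:
    """Rough weight footprint: n_params * bits / 8 bytes, converted to GiB.
    Ignores activations, KV cache, and quantization metadata overhead."""
    return n_params * bits / 8 / 1024**3

def load_4bit(model_path: str):
    # Assumption: transformers with bitsandbytes support is available.
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig
    cfg = BitsAndBytesConfig(load_in_4bit=True)
    return AutoModelForCausalLM.from_pretrained(
        model_path, quantization_config=cfg, device_map="auto"
    )
```

For a 7B-parameter model, weights alone drop from roughly 13 GiB at 16-bit to roughly 3.3 GiB at 4-bit.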
Why is --bits 4 more than twice as slow as --bits 16 at runtime? With bits 16 one sample takes 2 s, but with bits 4 one sample takes 5 s. Shouldn't quantization normally make things faster? I'm puzzled; could someone explain?
Not sure what went wrong where.
OK. Doesn't inference slow down for you when you use the quantized model?
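One plausible explanation for the slowdown (a general observation, not confirmed for this repo): 4-bit kernels dequantize weights on the fly, so on some GPUs and batch sizes they run slower than fp16 despite the memory savings. A minimal throughput probe to compare the two settings fairly:

```python
import time

def tokens_per_second(generate_fn, n_tokens: int) -> float:
    """Time one generation call and return a tokens/s figure for comparison.
    `generate_fn` is any zero-argument callable that produces `n_tokens` tokens."""
    start = time.perf_counter()
    generate_fn()
    elapsed = time.perf_counter() - start
    return n_tokens / elapsed
```

Running this once with the fp16 model and once with the 4-bit model, on the same prompt and generation length, makes the comparison apples-to-apples.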
My GPU memory is insufficient; I only have a 4090D, with bits set to 16. If I want to run multi-GPU inference with the fine-tuned OneKe, do you provide single-node multi-GPU inference code? What command should I use to run it?
Single-GPU inference runs out of memory and reports OOM: torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 50.00 MiB (GPU 0; 23.65 GiB total capacity; 23.21 GiB already allocated; 33.88 MiB free; 23.21 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Looking forward to your reply.
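One thing worth trying before moving to multiple GPUs: the allocator hint from the error message itself. This is a general PyTorch knob, not project-specific; `128` below is an example value, and the variable must be set before the first `import torch` in the process:

```python
# Must run before the first `import torch` in the process to take effect.
import os
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"  # example value
```

This limits the size of cached allocator blocks, which can reduce the fragmentation the OOM message points at ("reserved memory >> allocated memory").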