Exception thrown when running cli_demo, even though PyTorch 2.3 is installed — what is the cause? #1259
hotcolaava started this conversation in General
```
CUDA extension not installed.
CUDA extension not installed.
Try importing flash-attention for faster inference...
Warning: import flash_attn rotary fail, please install FlashAttention rotary to get higher efficiency https://github.com/Dao-AILab/flash-attention/tree/main/csrc/rotary
Warning: import flash_attn rms_norm fail, please install FlashAttention layer_norm to get higher efficiency https://github.com/Dao-AILab/flash-attention/tree/main/csrc/layer_norm
Warning: import flash_attn fail, please install FlashAttention to get higher efficiency https://github.com/Dao-AILab/flash-attention
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 3.18it/s]
Traceback (most recent call last):
  File "D:\project\Qwen\cli_demo.py", line 210, in <module>
    main()
  File "D:\project\Qwen\cli_demo.py", line 116, in main
    model, tokenizer, config = _load_model_tokenizer(args)
                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\project\Qwen\cli_demo.py", line 54, in _load_model_tokenizer
    model = AutoModelForCausalLM.from_pretrained(
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\ProgramData\miniconda3\envs\qwen\Lib\site-packages\transformers\models\auto\auto_factory.py", line 561, in from_pretrained
    return model_class.from_pretrained(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\ProgramData\miniconda3\envs\qwen\Lib\site-packages\transformers\modeling_utils.py", line 3928, in from_pretrained
    model = quantizer.post_init_model(model)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\ProgramData\miniconda3\envs\qwen\Lib\site-packages\optimum\gptq\quantizer.py", line 588, in post_init_model
    raise ValueError(
ValueError: Found modules on cpu/disk. Using Exllama or Exllamav2 backend requires all the modules to be on GPU. You can deactivate exllama backend by setting `disable_exllama=True` in the quantization config object
```
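The `CUDA extension not installed.` lines at the top suggest the quantized model's weights ended up on CPU, which the Exllama backend does not support. The `ValueError` itself names the workaround: set `disable_exllama=True` in the quantization config. A minimal sketch of that fix, assuming the checkpoint's `config.json` contains a typical GPTQ `quantization_config` block (the keys shown are illustrative, not copied from this exact model):

```python
import json

# Stand-in for the quantization_config block found in the checkpoint's
# config.json; real checkpoints carry more fields (group_size, damp_percent, ...).
cfg = {
    "quantization_config": {
        "bits": 4,
        "quant_method": "gptq",
    }
}

# Deactivate the exllama backend, as the ValueError suggests.
cfg["quantization_config"]["disable_exllama"] = True

print(json.dumps(cfg["quantization_config"], indent=2))
```

Note that disabling exllama only silences this error; it does not fix the likely root cause, which is that the CPU-only PyTorch build (or a missing CUDA-enabled auto-gptq wheel) left the modules off the GPU in the first place. Verifying that `torch.cuda.is_available()` returns `True` in the `qwen` environment is worth checking first.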