Inference problem #9
Maybe somebody ends up here with the same problem: for inference_client.py I was missing libportaudio2 to launch it on Ubuntu Linux.
Same problem!!
@wpq3142 did the libportaudio2 fix work for you? I've added it to the README.
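For anyone landing here: on Debian/Ubuntu the fix above amounts to installing the PortAudio shared library, which the common Python audio bindings (e.g. sounddevice or pyaudio — whichever inference_client.py happens to use) load at runtime:

```shell
# Install the PortAudio runtime library on Debian/Ubuntu.
# Python audio packages like sounddevice dlopen this at import time,
# so a missing libportaudio2 typically surfaces as an ImportError/OSError.
sudo apt-get update
sudo apt-get install -y libportaudio2
```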
Sorry, I mixed two things into one issue.
How much VRAM does it require?
With our current bfloat16 implementation, 24 GB.
Will there be a quantized or otherwise optimized build in the future?
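As a rough back-of-envelope for what quantization would buy: weight memory is just parameter count times bytes per parameter. The 12B parameter count below is a hypothetical figure chosen only so that bfloat16 (2 bytes/param) lands at the 24 GB mentioned above; activations, KV cache, and CUDA overhead come on top of this.

```python
# Bytes per parameter for common weight formats.
BYTES_PER_PARAM = {"bfloat16": 2, "int8": 1, "int4": 0.5}

def weight_gb(n_params: float, dtype: str) -> float:
    """Weight memory alone, in decimal GB (1 GB = 1e9 bytes)."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9

n = 12e9  # hypothetical parameter count, not from the project
for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: {weight_gb(n, dtype):.1f} GB")
# bfloat16: 24.0 GB, int8: 12.0 GB, int4: 6.0 GB
```

So an int8 build would roughly halve the weight footprint and int4 would quarter it, which is why quantized builds are the usual path to fitting 24 GB-class models on smaller cards.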
Hi! Good job on the model.
But I'm having trouble testing it.
Setup: RTX 4090 + 64 GB RAM (while loading the models I'm touching 63.9 GB :) ).
Tested on Windows: can't launch with the default code because FLASH_ATTENTION isn't supported there.
Switched to EFFICIENT_ATTENTION, and I hear the initial prompt with "bob how's it going bob" and then silence.
Unfortunately it's the same on Linux (setup completed with no errors). Only the initial prompt and nothing more. Silence :(
Torch installed with this command:
pip3 install torch torchaudio --index-url https://download.pytorch.org/whl/cu118
Any tips on how to debug this further?