DeepSeek-Coder-V2-Lite stuck #3212
Hi - can you share the output of nvidia-smi?
Sure! Inside the container the process list looks empty:

# docker exec -it tabby-tabby-1 nvidia-smi
Sat Sep 28 09:35:34 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.183.01 Driver Version: 535.183.01 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA GeForce RTX 3090 On | 00000000:07:00.0 Off | N/A |
| 56% 58C P2 129W / 280W | 14269MiB / 24576MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
+---------------------------------------------------------------------------------------+

Outside the container there is other unrelated stuff on the 2060 and the tabby processes on the 3090:

Sat Sep 28 11:36:14 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.183.01 Driver Version: 535.183.01 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA GeForce RTX 2060 On | 00000000:06:00.0 Off | N/A |
| 34% 46C P2 37W / 128W | 1819MiB / 6144MiB | 5% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
| 1 NVIDIA GeForce RTX 3090 On | 00000000:07:00.0 Off | N/A |
| 56% 58C P2 129W / 280W | 14556MiB / 24576MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| 0 N/A N/A 144128 C frigate.detector.tensorrt 246MiB |
| 0 N/A N/A 144560 C ffmpeg 182MiB |
| 0 N/A N/A 1832724 C ffmpeg 182MiB |
| 0 N/A N/A 1900156 C ffmpeg 156MiB |
| 0 N/A N/A 1900946 C ffmpeg 156MiB |
| 0 N/A N/A 2774390 C ffmpeg 182MiB |
| 0 N/A N/A 2807270 C ffmpeg 182MiB |
| 0 N/A N/A 3208761 C ffmpeg 182MiB |
| 0 N/A N/A 3873962 C ffmpeg 172MiB |
| 0 N/A N/A 4073311 C ffmpeg 172MiB |
| 1 N/A N/A 154641 C /opt/tabby/bin/llama-server 284MiB |
| 1 N/A N/A 2544258 C ...unners/cuda_v12/ollama_llama_server 13442MiB |
| 1 N/A N/A 4082688 C /opt/tabby/bin/llama-server 818MiB |
+---------------------------------------------------------------------------------------+

In the container logs with debugging enabled:
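How debug logging was enabled is not stated in the thread; since Tabby is a Rust application, one common approach (an assumption, not something confirmed here) is to set RUST_LOG for the container and then follow its output:

# Assumed way of turning on verbose logging: RUST_LOG is the standard Rust
# logging variable, and the compose file would need to forward it to the container.
RUST_LOG=debug docker compose up -d --force-recreate
# Then follow the container output:
docker logs -f tabby-tabby-1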
Describe the bug
DeepSeek-Coder-V2-Lite is stuck at starting.
The web server does not even start after 10,000 seconds of "Starting". Other models work fine. If, within the container, I kill the llama-server process and start it manually without --disable-log, it works: the web server starts and provides completions.
If I then kill the manually started llama-server process, the default one is able to spawn and load into VRAM, but it just doesn't reply to requests.
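For reference, the workaround described above amounts to roughly the following shell sketch; only the container name, the /opt/tabby/bin/llama-server path, and the --disable-log flag come from this thread, and the remaining arguments are placeholders for whatever the default process was started with:

# Enter the running Tabby container (name taken from the nvidia-smi output above).
docker exec -it tabby-tabby-1 bash

# Inside the container: note the full command line of the stuck llama-server,
# then kill it (assumes ps is available in the image).
ps -ef | grep '[l]lama-server'
kill <pid>

# Re-run the same command line by hand, omitting --disable-log.
/opt/tabby/bin/llama-server <same arguments as before, minus --disable-log>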
Information about your version
Happens with 0.16.1, 0.17.0 and 0.18.0-rc4.
Information about your GPU
NVIDIA GeForce RTX 3090
Additional context
Running a fresh instance with docker compose:
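The compose file itself is not included above; a minimal sketch of the kind of setup being described, written out as a shell heredoc (the image tag, model name, and flags follow Tabby's published CUDA example and are assumptions rather than the reporter's actual file):

# Hypothetical docker-compose.yml, modelled on Tabby's documented GPU setup.
cat > docker-compose.yml <<'EOF'
services:
  tabby:
    image: tabbyml/tabby
    command: serve --model DeepSeek-Coder-V2-Lite --device cuda
    volumes:
      - "$HOME/.tabby:/data"
    ports:
      - "8080:8080"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
EOF
docker compose up -d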
There's nothing relevant in the logs, only request logs and no errors.