I am trying to launch several webui instances, each with Replacer installed, to work around the lack of multi-GPU support. I plan to set up a reverse proxy that automatically forwards each request to a free instance. I have 8 GPUs - RTX 4090s rented from vast.ai.
Everything works fine with a single instance, but when I run several, every instance except the first fails with this error:
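One way to sketch the per-instance setup is to pin each webui process to a single GPU via `CUDA_VISIBLE_DEVICES`, so every process sees only "its" card as `cuda:0`. The snippet below just builds the launch commands; the `./webui.sh` path and `--port`/`--listen` flags mirror the stock AUTOMATIC1111 launcher and may need adjusting for your setup.

```python
# Hypothetical launch plan: one webui process per GPU, each pinned with
# CUDA_VISIBLE_DEVICES so torch inside that process only sees one device.
NUM_GPUS = 8
BASE_PORT = 7860

def launch_commands(num_gpus=NUM_GPUS, base_port=BASE_PORT):
    cmds = []
    for gpu in range(num_gpus):
        cmds.append(
            f"CUDA_VISIBLE_DEVICES={gpu} ./webui.sh --port {base_port + gpu} --listen"
        )
    return cmds

for cmd in launch_commands():
    print(cmd)
```

A reverse proxy can then map incoming requests onto ports 7860-7867, one per GPU.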
torch._C._cuda_emptyCache()
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
Each GPU has 24 GB of VRAM and usage never even passes the 10 GB mark, so how could this be an out-of-memory problem?
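Note that `CUDA_LAUNCH_BLOCKING` (suggested in the traceback) only takes effect if it is set before the process initializes CUDA, so it belongs in the launcher's environment, not inside the already-running webui. A minimal sketch, assuming the same hypothetical `./webui.sh` launcher:

```python
# Build a child environment with synchronous CUDA launches enabled and the
# process pinned to GPU 1, so the failing kernel is reported at the real call site.
import os

env = dict(os.environ, CUDA_LAUNCH_BLOCKING="1", CUDA_VISIBLE_DEVICES="1")
# subprocess.run(["./webui.sh", "--port", "7861", "--listen"], env=env)  # hypothetical path
print(env["CUDA_LAUNCH_BLOCKING"], env["CUDA_VISIBLE_DEVICES"])
```

With blocking launches the stack trace should point at the kernel that actually faulted instead of a later call like `empty_cache`.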
I think it's connected with the Segment Anything extension. It uses three different models which are not part of sd-webui; maybe they get moved to the wrong device on multi-GPU systems. Ask about it there, but in your case I think you will need to explore the code yourself.
Also try the different SAM models - they have different code paths, so maybe one of them will work.
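A common multi-GPU failure mode of this kind is an extension hard-coding `"cuda"` (i.e. device 0) for one model while its input tensors live on another card, which can surface as an illegal memory access. A defensive pattern, sketched here with a stand-in model rather than the extension's actual SAM wrapper, is to resolve the device once and move both model and inputs through it:

```python
# Resolve the target device once and route both the model and its inputs
# through it, instead of relying on a hard-coded "cuda" default.
import torch

def run_on_device(model, inputs, device_index=0):
    device = torch.device(
        f"cuda:{device_index}" if torch.cuda.is_available() else "cpu"
    )
    model = model.to(device)
    inputs = inputs.to(device)
    with torch.no_grad():
        return model(inputs)

# Minimal check with a stand-in model; the CPU fallback keeps this runnable
# on machines without a GPU.
model = torch.nn.Linear(4, 2)
out = run_on_device(model, torch.randn(3, 4))
print(tuple(out.shape))
```

If the extension accepts a device argument, passing the instance's pinned device through a helper like this avoids mismatched placements.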
Attaching nvidia-smi output and log:
out.log