I am trying to launch several webui instances, each with Replacer installed, to work around the lack of multi-GPU support. I plan to set up a reverse proxy that automatically forwards each request to a free instance. I have 8 GPUs - RTX 4090s rented from vast.ai.
Everything works fine with a single instance, but when I run several, every instance except the first fails with this error:
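One way to sketch the per-instance setup is to pin each webui process to a single GPU via `CUDA_VISIBLE_DEVICES`, so every process sees only "its" card as `cuda:0`. The snippet below just builds the launch commands; the `./webui.sh` path and `--port`/`--listen` flags mirror the stock AUTOMATIC1111 launcher and may need adjusting for your setup.

```python
# Hypothetical launch plan: one webui process per GPU, each pinned with
# CUDA_VISIBLE_DEVICES so torch inside that process only sees one device.
NUM_GPUS = 8
BASE_PORT = 7860

def launch_commands(num_gpus=NUM_GPUS, base_port=BASE_PORT):
    cmds = []
    for gpu in range(num_gpus):
        cmds.append(
            f"CUDA_VISIBLE_DEVICES={gpu} ./webui.sh --port {base_port + gpu} --listen"
        )
    return cmds

for cmd in launch_commands():
    print(cmd)
```

A reverse proxy can then map incoming requests onto ports 7860-7867, one per GPU.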
torch._C._cuda_emptyCache()
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
Each GPU has 24 GB of VRAM and usage never even passes the 10 GB mark, so how could this be an out-of-memory problem?
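Note that `CUDA_LAUNCH_BLOCKING` (suggested in the traceback) only takes effect if it is set before the process initializes CUDA, so it belongs in the launcher's environment, not inside the already-running webui. A minimal sketch, assuming the same hypothetical `./webui.sh` launcher:

```python
# Build a child environment with synchronous CUDA launches enabled and the
# process pinned to GPU 1, so the failing kernel is reported at the real call site.
import os

env = dict(os.environ, CUDA_LAUNCH_BLOCKING="1", CUDA_VISIBLE_DEVICES="1")
# subprocess.run(["./webui.sh", "--port", "7861", "--listen"], env=env)  # hypothetical path
print(env["CUDA_LAUNCH_BLOCKING"], env["CUDA_VISIBLE_DEVICES"])
```

With blocking launches the stack trace should point at the kernel that actually faulted instead of a later call like `empty_cache`.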
I think it's connected with the Segment Anything extension. It uses three different models which are not part of sd-webui; maybe they get moved to the wrong device on multi-GPU systems. Ask about it there, but in your case I think you will need to explore the code yourself.
Also try the different SAM models - they have different code paths, so maybe one of them will work.
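A common multi-GPU failure mode of this kind is an extension hard-coding `"cuda"` (i.e. device 0) for one model while its input tensors live on another card, which can surface as an illegal memory access. A defensive pattern, sketched here with a stand-in model rather than the extension's actual SAM wrapper, is to resolve the device once and move both model and inputs through it:

```python
# Resolve the target device once and route both the model and its inputs
# through it, instead of relying on a hard-coded "cuda" default.
import torch

def run_on_device(model, inputs, device_index=0):
    device = torch.device(
        f"cuda:{device_index}" if torch.cuda.is_available() else "cpu"
    )
    model = model.to(device)
    inputs = inputs.to(device)
    with torch.no_grad():
        return model(inputs)

# Minimal check with a stand-in model; the CPU fallback keeps this runnable
# on machines without a GPU.
model = torch.nn.Linear(4, 2)
out = run_on_device(model, torch.randn(3, 4))
print(tuple(out.shape))
```

If the extension accepts a device argument, passing the instance's pinned device through a helper like this avoids mismatched placements.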
Attaching nvidia-smi output and log:
out.log