[Bug]: RuntimeError: CUDA error: an illegal memory access was encountered #201

Open
5 tasks done
mike2505 opened this issue Mar 28, 2024 · 3 comments

Comments

@mike2505

Is there an existing issue for this?

  • I have searched the existing issues and checked the recent builds/commits of both this extension and the webui

Have you updated WebUI and this extension to the latest version?

  • I have updated WebUI and this extension to the latest version

Do you understand that you should read the 1st item of https://github.com/continue-revolution/sd-webui-segment-anything#faq if you cannot install GroundingDINO?

  • My problem is not about installing GroundingDINO

Do you understand that you should use the latest ControlNet extension and enable external control if you want SAM extension to control ControlNet?

  • I have updated ControlNet extension and enabled "Allow other script to control this extension"

Do you understand that you should read the 2nd item of https://github.com/continue-revolution/sd-webui-segment-anything#faq if you observe problems like AttributeError bool object has no attribute enabled and TypeError bool object is not subscriptable?

  • My problem is not about such issue, otherwise I have tried changing the extension directory name from sd-webui-segment-anything to a1111-sd-webui-segment-anything

What happened?

I am trying to launch several webui instances, each with replacer installed, to work around the lack of multi-GPU support. I plan to put a reverse proxy in front of them that automatically forwards each request to a free instance. I have 8 RTX 4090 GPUs, rented from vast.ai.

Everything works fine with one instance, but when I run several instances, every instance except the first one fails with this error:

torch._C._cuda_emptyCache()
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

Each GPU has 24 GB of VRAM and usage never even passes the 10 GB mark, so how could this be an OOM?

I just tested with only one instance running with --device-id=1 and still hit the same issue. The same happens with any device id except 0.

Uploading nvidia-smi output and log

image
out.log

Steps to reproduce the problem

  1. Install SD webui segment anything
  2. Run SD webui on different device id

What should have happened?

Ideally, there should not be an issue.

Commit where the problem happens

webui: AUTOMATIC1111/stable-diffusion-webui@bef51ae
extension: 982138c

What browsers do you use to access the UI ?

No response

Command Line Arguments

--port 8081 --server-name 127.0.0.1 --device-id=1

Console logs

torch._C._cuda_emptyCache()
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

Additional information

No response

@mike2505
Author

The same happens with all models.

@light-and-ray
Contributor

Try the CUDA_VISIBLE_DEVICES environment variable instead of --device-id=1:

export CUDA_VISIBLE_DEVICES=1
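
For the multi-instance setup described in the issue, the same idea could be scripted. This is only a rough sketch, not something taken from the webui or the extension: the `build_instance` and `launch_all` helpers, the `launch.py` entry point and the port numbering are all assumptions for illustration. Each process gets its own CUDA_VISIBLE_DEVICES value, so inside that process the chosen GPU is the only visible one and is addressed as device 0, and --device-id is not passed at all.

```python
import os
import subprocess
import sys


def build_instance(gpu_index, base_port=8080, script="launch.py", debug=False):
    """Build (command, env) for one webui instance pinned to one GPU.

    Hypothetical helper: `script` and the port scheme are assumptions.
    """
    env = dict(os.environ)
    # Expose only this physical GPU to the child process.
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_index)
    if debug:
        # Suggested by the error message: report CUDA errors synchronously.
        env["CUDA_LAUNCH_BLOCKING"] = "1"
    cmd = [
        sys.executable, script,
        "--port", str(base_port + gpu_index),
        "--server-name", "127.0.0.1",
    ]
    return cmd, env


def launch_all(gpu_count):
    """Start one instance per GPU and return the process handles."""
    procs = []
    for gpu_index in range(gpu_count):
        cmd, env = build_instance(gpu_index)
        procs.append(subprocess.Popen(cmd, env=env))
    return procs
```

With these assumed defaults, `launch_all(8)` would start eight instances on ports 8080 to 8087, one per GPU, which a reverse proxy could then forward requests to.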

@mike2505
Author

That's strange, because it works with that. I assume segment anything has some issue with --device-id.
