FIX: Ensure Device Compatibility for BOFT Forward/Merging #2242
Conversation
See: huggingface/diffusers#9510 (comment) Right now, the low_cpu_mem_usage=True option does not consolidate the devices. For example, when the model is on GPU and the state_dict is on CPU, the adapter weight will be on CPU after loading, when it should be on GPU. This fix ensures that the devices are consolidated.
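For illustration, a minimal sketch of the kind of device consolidation described here, with a hypothetical `load_adapter_weight` helper standing in for PEFT's actual low_cpu_mem_usage loading path: the incoming CPU tensor is moved to the device (and dtype) of the parameter it replaces, so the adapter ends up wherever the model already lives.

```python
import torch
import torch.nn as nn


def load_adapter_weight(module: nn.Module, name: str, tensor: torch.Tensor) -> None:
    """Copy `tensor` (possibly on CPU) into `module.<name>`, keeping the module's device.

    Hypothetical helper for illustration only; PEFT's real code path lives in its
    low_cpu_mem_usage loading logic.
    """
    old_param = getattr(module, name)
    # Without this consolidation, the adapter weight would stay on the CPU even
    # though the rest of the model is on the GPU.
    new_param = nn.Parameter(tensor.to(old_param.device, dtype=old_param.dtype))
    setattr(module, name, new_param)


if __name__ == "__main__":
    device = "cuda" if torch.cuda.is_available() else "cpu"
    linear = nn.Linear(4, 4).to(device)   # model weight on GPU (if available)
    cpu_state = torch.randn(4, 4)         # state_dict tensor on CPU
    load_adapter_weight(linear, "weight", cpu_state)
    assert linear.weight.device.type == device  # weight follows the model's device
```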
Solves the following bug: huggingface/diffusers#9622 (comment) The cause of the bug is as follows: when we have, say, a module called "bar.0.query" that we want to target and another module called "foo_bar.0.query" that we don't want to target, there was potential for an error. This is not caused by _find_minimal_target_modules directly; rather, the bug was inside BaseTuner.inject_adapter and how the names_no_target were chosen. Those used to be chosen based on suffix. In our example, however, "bar.0.query" is a suffix of "foo_bar.0.query", therefore "foo_bar.0.query" was *not* added to names_no_target when it should have been. As a consequence, during the optimization it looks like "query" is safe to use as target_modules, because we don't see that it wrongly matches "foo_bar.0.query".
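A minimal sketch of the boundary problem, using a hypothetical `matches_target` helper rather than the actual BaseTuner code: a bare `endswith()` check treats "foo_bar.0.query" as matching the target "bar.0.query", while requiring an exact match or a "."-separated suffix keeps the two apart.

```python
def matches_target(module_name: str, target: str) -> bool:
    """Return True if `target` matches `module_name` on a submodule boundary.

    Illustrative only: a bare endswith() check would also accept names that merely
    end with the same characters, which is the root cause of the bug described above.
    """
    return module_name == target or module_name.endswith("." + target)


# The plain suffix check lumps the unrelated module in with the target:
assert "foo_bar.0.query".endswith("bar.0.query")            # wrongly kept out of names_no_target
# The boundary-aware check keeps the two modules apart:
assert matches_target("model.bar.0.query", "bar.0.query")   # intended target
assert not matches_target("foo_bar.0.query", "bar.0.query")
```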
Thanks a lot for opening this PR to fix the issue with the BOFT tests. It appears that your fork was outdated when opening the PR, as it contains unrelated changes and merge conflicts. Could you please try merging with/rebasing on the current main branch? If this creates new conflicts, opening a new PR from an up-to-date fork might be the easier solution.
Just merged.
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
@BenjaminBossan Can you please run the failed test again? It failed because there were too many requests.
@d-kleine Not sure what's going on with the CI right now; I'll rerun it next week, hopefully it resolves itself.
Alright, thanks!
I have also just fixed the … As the GitHub Actions workflows seem to run only on CPU, please make sure when reviewing to run the tests on a CUDA-enabled device and also to empty the cache to see if the … passes.
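If "the cache" here refers to PyTorch's JIT extension cache (where BOFT's fbd_cuda kernel is built), clearing it so the modified kernel gets recompiled on the next test run would look roughly like this; the path is PyTorch's default and may differ per setup (e.g. if TORCH_EXTENSIONS_DIR is set):

```python
import shutil
from pathlib import Path

# Default location where torch.utils.cpp_extension caches JIT-compiled extensions.
# Removing it forces a fresh rebuild of the fbd_cuda kernel on the next test run.
ext_cache = Path.home() / ".cache" / "torch_extensions"
if ext_cache.exists():
    shutil.rmtree(ext_cache)
```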
Great, thanks! 👍🏻 In case you need further changes, please feel free to ping me 🙂
Okay, since we haven't heard back in a week, I'll just go ahead and merge this now. Again, thanks for this fix @d-kleine.
Thanks for merging! 🙂
Fixes #2219
Description
This pull request resolves the above issue regarding BOFT forward/merging on CUDA by ensuring that all relevant tensors and models are moved to the correct device. This change is necessary to prevent issues such as zero matrices and test failures when using CUDA.
Changes
- Added `torch_device = infer_device()` to infer and set the appropriate device (CPU or CUDA) at the beginning of the `MultipleActiveAdaptersTester` class (a simplified sketch of this pattern follows the Testing section below).
- Updated `prepare_inputs_for_testing()` to move input tensors to the inferred device using `.to(self.torch_device)`.
- Updated `test_merge_layers_multi` to move models to the inferred device with `.to(self.torch_device)`.
- In `test_multiple_active_adapters_merge_and_unmerge`, the absolute tolerance had to be relaxed from `atol=1e-5` to `atol=1e-4`.
- Used `Tensor.data_ptr<scalar_t>()` instead of the deprecated `Tensor.data<scalar_t>()` for improved compatibility with current PyTorch standards.

Testing
- All tests in `tests/test_custom_models.py` pass successfully with these changes, on both Linux (Ubuntu) and Windows.
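For illustration, a rough sketch of the device-handling pattern the changes above describe. The class name mirrors the test class mentioned in the list, but the method bodies and the local `infer_device()` stand-in are simplified placeholders, not the actual code in tests/test_custom_models.py:

```python
import torch


def infer_device() -> str:
    # Simplified stand-in for the device inference helper used in the tests.
    return "cuda" if torch.cuda.is_available() else "cpu"


class MultipleActiveAdaptersTester:
    torch_device = infer_device()  # resolved once for the whole test class

    def prepare_inputs_for_testing(self):
        # Inputs are created on CPU and explicitly moved to the inferred device.
        X = torch.arange(90).view(9, 10).float().to(self.torch_device)
        return {"X": X}

    def run_merge_check(self, model: torch.nn.Module):
        # Placeholder for test_merge_layers_multi: the model is moved to the same
        # device as the inputs, so BOFT forward/merge never mixes CPU and CUDA tensors.
        model = model.to(self.torch_device)
        inputs = self.prepare_inputs_for_testing()
        with torch.no_grad():
            return model(inputs["X"])
```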