FIX: Ensure Device Compatibility for BOFT Forward/Merging #2242
Conversation
See: huggingface/diffusers#9510 (comment) Right now, the low_cpu_mem_usage=True option does not consolidate the devices. For example, when the model is on GPU and the state_dict is on CPU, the adapter weight will be on CPU after loading, when it should be on GPU. This fix ensures that the devices are consolidated.
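For illustration, a minimal sketch of the kind of device consolidation described here, with a hypothetical `load_adapter_weight` helper standing in for PEFT's actual low_cpu_mem_usage loading path: the incoming CPU tensor is moved to the device (and dtype) of the parameter it replaces, so the adapter ends up wherever the model already lives.

```python
import torch
import torch.nn as nn


def load_adapter_weight(module: nn.Module, name: str, tensor: torch.Tensor) -> None:
    """Copy `tensor` (possibly on CPU) into `module.<name>`, keeping the module's device.

    Hypothetical helper for illustration only; PEFT's real code path lives in its
    low_cpu_mem_usage loading logic.
    """
    old_param = getattr(module, name)
    # Without this consolidation, the adapter weight would stay on the CPU even
    # though the rest of the model is on the GPU.
    new_param = nn.Parameter(tensor.to(old_param.device, dtype=old_param.dtype))
    setattr(module, name, new_param)


if __name__ == "__main__":
    device = "cuda" if torch.cuda.is_available() else "cpu"
    linear = nn.Linear(4, 4).to(device)   # model weight on GPU (if available)
    cpu_state = torch.randn(4, 4)         # state_dict tensor on CPU
    load_adapter_weight(linear, "weight", cpu_state)
    assert linear.weight.device.type == device  # weight follows the model's device
```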
Solves the following bug: huggingface/diffusers#9622 (comment) The cause of the bug is as follows: when we have, say, a module called "bar.0.query" that we want to target and another module called "foo_bar.0.query" that we don't want to target, there was potential for an error. This is not caused by _find_minimal_target_modules directly; rather, the bug was inside BaseTuner.inject_adapter and how the names_no_target were chosen. Those used to be chosen based on suffix. In our example, however, "bar.0.query" is a suffix of "foo_bar.0.query", therefore "foo_bar.0.query" was *not* added to names_no_target when it should have been. As a consequence, during the optimization it looks like "query" is safe to use as target_modules, because we don't see that it wrongly matches "foo_bar.0.query".
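A minimal sketch of the boundary problem, using a hypothetical `matches_target` helper rather than the actual BaseTuner code: a bare `endswith()` check treats "foo_bar.0.query" as matching the target "bar.0.query", while requiring an exact match or a "."-separated suffix keeps the two apart.

```python
def matches_target(module_name: str, target: str) -> bool:
    """Return True if `target` matches `module_name` on a submodule boundary.

    Illustrative only: a bare endswith() check would also accept names that merely
    end with the same characters, which is the root cause of the bug described above.
    """
    return module_name == target or module_name.endswith("." + target)


# The plain suffix check lumps the unrelated module in with the target:
assert "foo_bar.0.query".endswith("bar.0.query")            # wrongly kept out of names_no_target
# The boundary-aware check keeps the two modules apart:
assert matches_target("model.bar.0.query", "bar.0.query")   # intended target
assert not matches_target("foo_bar.0.query", "bar.0.query")
```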
Thanks a lot for opening this PR to fix the issue with the BOFT tests. It appears that your fork was outdated when opening the PR, as it contains unrelated changes and merge conflicts. Could you please try merging with/rebasing on the current main branch? If this creates new conflicts, opening a new PR from an up-to-date fork might be the easier solution.
Just merged.
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
@BenjaminBossan Can you please run the failed test again? It failed because there were too many requests.
@d-kleine Not sure what's going on with the CI right now; I'll rerun it next week, hopefully it resolves itself.
Alright, thanks!
I have also just fixed the … As the GitHub Actions workflows seem to run only on CPU, please make sure when reviewing to run the tests on a CUDA-enabled device and also to empty the cache to see if the … passes.
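If "the cache" here refers to PyTorch's JIT extension cache (where BOFT's fbd_cuda kernel is built), clearing it so the modified kernel gets recompiled on the next test run would look roughly like this; the path is PyTorch's default and may differ per setup (e.g. if TORCH_EXTENSIONS_DIR is set):

```python
import shutil
from pathlib import Path

# Default location where torch.utils.cpp_extension caches JIT-compiled extensions.
# Removing it forces a fresh rebuild of the fbd_cuda kernel on the next test run.
ext_cache = Path.home() / ".cache" / "torch_extensions"
if ext_cache.exists():
    shutil.rmtree(ext_cache)
```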
Great, thanks! 👍🏻 In case you need further changes, please feel free to ping me 🙂
Okay, since we haven't heard back in a week, I'll just go ahead and merge this now. Again, thanks for this fix @d-kleine.
Thanks for merging! 🙂
Fixes #2219
Description
This pull request resolves the above issue regarding BOFT forward/merging on CUDA by ensuring that all relevant tensors and models are moved to the correct device. This change is necessary to prevent issues such as zero matrices and test failures when using CUDA.
Changes
- Added `torch_device = infer_device()` to infer and set the appropriate device (CPU or CUDA) at the beginning of the `MultipleActiveAdaptersTester` class (a simplified sketch of this pattern follows the Testing section below).
- Updated `prepare_inputs_for_testing()` to move input tensors to the inferred device using `.to(self.torch_device)`.
- Updated `test_merge_layers_multi` to move models to the inferred device with `.to(self.torch_device)`.
- In `test_multiple_active_adapters_merge_and_unmerge`, the absolute tolerance had to be relaxed from `atol=1e-5` to `atol=1e-4`.
- Used `Tensor.data_ptr<scalar_t>()` instead of the deprecated `Tensor.data<scalar_t>()` for improved compatibility with current PyTorch standards.

Testing
- All tests in `tests/test_custom_models.py` pass successfully with these changes, on both Linux (Ubuntu) and Windows.
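For illustration, a rough sketch of the device-handling pattern the changes above describe. The class name mirrors the test class mentioned in the list, but the method bodies and the local `infer_device()` stand-in are simplified placeholders, not the actual code in tests/test_custom_models.py:

```python
import torch


def infer_device() -> str:
    # Simplified stand-in for the device inference helper used in the tests.
    return "cuda" if torch.cuda.is_available() else "cpu"


class MultipleActiveAdaptersTester:
    torch_device = infer_device()  # resolved once for the whole test class

    def prepare_inputs_for_testing(self):
        # Inputs are created on CPU and explicitly moved to the inferred device.
        X = torch.arange(90).view(9, 10).float().to(self.torch_device)
        return {"X": X}

    def run_merge_check(self, model: torch.nn.Module):
        # Placeholder for test_merge_layers_multi: the model is moved to the same
        # device as the inputs, so BOFT forward/merge never mixes CPU and CUDA tensors.
        model = model.to(self.torch_device)
        inputs = self.prepare_inputs_for_testing()
        with torch.no_grad():
            return model(inputs["X"])
```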