Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tests fail on macbook with incomplete MPS support #154

Open
talonchandler opened this issue Feb 27, 2024 · 6 comments
Open

Tests fail on macbook with incomplete MPS support #154

talonchandler opened this issue Feb 27, 2024 · 6 comments
Labels
GPU Accelerated compute devices

Comments

@talonchandler
Copy link
Collaborator

The first set of tests test_correction.py fail when I run them locally with:

E       NotImplementedError: The operator 'aten::linalg_lstsq.out' is not currently implemented for the MPS device. If you want this op to be added in priority during the prototype phase of this feature, please comment on https://github.com/pytorch/pytorch/issues/77764. As a temporary fix, you can set the environment variable `PYTORCH_ENABLE_MPS_FALLBACK=1` to use the CPU as a fallback for this op. WARNING: this will be slower than running natively on MPS.

Seems like imperfect support for macbook gpus in torch. Related #150.

@talonchandler talonchandler added this to the waveOrder v2 milestones milestone Feb 27, 2024
@talonchandler
Copy link
Collaborator Author

I just confirmed that export PYTORCH_ENABLE_MPS_FALLBACK=1 fixes the correction tests, but one of the mps tests in stokes now fails:

FAILED tests/test_stokes.py::test_copying[mps] - torch._C._LinAlgError: linalg.inv: (Batch element 0): The diagonal element 4 is zero, the inversion could not be completed becaus...

@ziw-liu what do you think is best here? If we're still supporting macbooks then we should add macOS to the tests.

@ziw-liu
Copy link
Contributor

ziw-liu commented Feb 27, 2024

I think we should only officially support x86_64 CPUs and NVIDIA GPUs. MPS backend in torch is experimental and is expected to break (thus the test is disabled in CI). That being said the partial GPU support is not documented and maybe shouldn't be until we reach feature parity for the deconvolution models.

@ziw-liu
Copy link
Contributor

ziw-liu commented Feb 27, 2024

GH do have M1 CI runners now. So when MPS becomes mature we can enable the tests again.

@ziw-liu ziw-liu added the GPU Accelerated compute devices label Feb 27, 2024
@ziw-liu
Copy link
Contributor

ziw-liu commented Feb 27, 2024

But with PYTORCH_ENABLE_MPS_FALLBACK=1 I can get all tests to pass (on main). @talonchandler can you make a new environment and test again.

@talonchandler
Copy link
Collaborator Author

Tried with a new environment and the tests pass. Thanks for testing @ziw-liu!

I've just opened #156 as an interim documentation fix.

I think we should only officially support x86_64 CPUs and NVIDIA GPUs. MPS backend in torch is experimental and is expected to break (thus the test is disabled in CI).

Sounds good to me. Should we leave this issue open until we get full MPS support and complete the GPU support?

@talonchandler talonchandler removed this from the waveOrder v2 milestones milestone Mar 4, 2024
@ziw-liu
Copy link
Contributor

ziw-liu commented Jun 20, 2024

GH do have M1 CI runners now. So when MPS becomes mature we can enable the tests again.

Actually MPS is not available in VMs due to Apple being Apple so CI won't be feasible...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
GPU Accelerated compute devices
Projects
None yet
Development

No branches or pull requests

2 participants