Failure when using more than 1 GPU in STRUMPACK MPI #126
Comments
The OMP deprecation message is probably coming from the SLATE library. I believe the invalid resource handle message appears because multiple MPI processes are using the same GPU, so it ends up using more CUDA streams than are allowed per GPU.
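A common way to avoid several ranks piling onto one device is to bind each MPI rank to its own GPU before any CUDA work starts. Below is a minimal, self-contained sketch of that pattern; this is generic MPI+CUDA illustration code, not STRUMPACK's internal logic.

```cpp
// Sketch: bind each MPI rank on a node to a distinct GPU (round-robin).
// Generic illustration, not code from STRUMPACK itself.
#include <mpi.h>
#include <cuda_runtime.h>
#include <cstdio>

int main(int argc, char* argv[]) {
  MPI_Init(&argc, &argv);

  // Get the rank local to this node via a shared-memory sub-communicator.
  MPI_Comm node_comm;
  MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                      MPI_INFO_NULL, &node_comm);
  int local_rank;
  MPI_Comm_rank(node_comm, &local_rank);

  int n_devices = 0;
  cudaGetDeviceCount(&n_devices);
  // Round-robin ranks over the visible devices. With more ranks than
  // GPUs, several ranks still share a device, and hence its streams
  // and handles, which is the situation described above.
  cudaSetDevice(local_rank % n_devices);

  int dev = -1;
  cudaGetDevice(&dev);
  std::printf("node-local rank %d -> GPU %d (of %d)\n",
              local_rank, dev, n_devices);

  MPI_Comm_free(&node_comm);
  MPI_Finalize();
  return 0;
}
```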
Hmm, I'm not sure. See STRUMPACK/src/dense/CUDAWrapper.cpp, line 330 (commit 115b152): this is called from the SparseSolver constructor, so perhaps that overrides what you specify. But it should not use all GPUs. Maybe SLATE is doing that? You could try to set the CUDA_VISIBLE_DEVICES environment variable.
Hi, Dr. Ghysels,
Thank you for your reply! The previous failure when using multi-GPU STRUMPACK with SLATE has been avoided by setting OMP_NUM_THREADS=1 as an environment variable when calling mpirun (a sketch of the full invocation follows this comment). I also used CUDA_VISIBLE_DEVICES to limit the GPU resources, so STRUMPACK now only has MPI tasks assigned to those devices; e.g., with mpirun -n 2, the two MPI tasks both use the two GPUs, 0 and 1. Following are a couple of questions regarding performance:
Thank you very much! Looking forward to your reply.
Best,
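A minimal sketch of the invocation described above, assuming devices 0 and 1 and the test driver used elsewhere in this thread (whether inline environment variables reach all ranks this way depends on the MPI launcher; Open MPI, for instance, can export them explicitly with mpirun -x):

OMP_NUM_THREADS=1 CUDA_VISIBLE_DEVICES=0,1 mpirun -n 2 test_structure_reuse_mpi pde900.mtx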
Are you using MPI_Init_thread with MPI_THREAD_MULTIPLE? This is required for SLATE. MAGMA is still used for the factorization in the multi-GPU setting, but only for the local subtrees, while other parts of the code use SLATE. Indeed, scaling with multiple GPUs is not very good; the problem really needs to be big enough. You can try running with --sp_enable_METIS_NodeNDP (see also #127), which can lead to better performance and better scaling. There is not much to be done about the data movement for now.
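Since the MPI_THREAD_MULTIPLE requirement is easy to miss, here is a minimal sketch of the initialization a SLATE-enabled run would need; the solver calls themselves are omitted, only the MPI setup pattern is shown.

```cpp
// Sketch: initialize MPI with full thread support, as SLATE requires.
#include <mpi.h>
#include <cstdio>

int main(int argc, char* argv[]) {
  int provided = MPI_THREAD_SINGLE;
  MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
  if (provided < MPI_THREAD_MULTIPLE) {
    // The MPI library was not built or configured with full thread
    // support; SLATE cannot run correctly in this configuration.
    std::fprintf(stderr,
                 "MPI_THREAD_MULTIPLE not available (provided=%d)\n",
                 provided);
    MPI_Abort(MPI_COMM_WORLD, 1);
  }
  // ... set up and run the STRUMPACK MPI solver here ...
  MPI_Finalize();
  return 0;
}
```

Since --sp_enable_METIS_NodeNDP is a command-line option, with a driver that forwards argc/argv to the solver it can presumably just be appended to the run command.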
Hi, Dr. Ghysels,
I have seen some issues when using the multi-GPU feature of STRUMPACK to solve a sparse matrix. I built STRUMPACK successfully with support for SLATE and MAGMA.
However, it passes when I run with one GPU:
OMP_NUM_THREADS=1 mpirun -n 1 test_structure_reuse_mpi pde900.mtx
Then I try using 2 GPUs:
a) sometimes it passes
(Why does it report GPU = 1 here? Does it mean it only uses one GPU, or that two processes are run on each of the GPUs I request?)
b) sometimes it fails with the "invalid resource handle" error message.
Do you know what could be causing these issues, and how should I resolve them?
Best,
-Jing