batch bicgstab with batch csr structure problem on gpu #1630
Hello, I am trying to solve linear systems with batch::bicgstab and batch::csr for storage.
Replacing the batch::bicgstab solver with batch::Cg fixes the execution problem.
It seems the issue might indeed be due to a limitation of the V100. It seems to work fine on an A100:

This is Ginkgo 1.9.0 (develop)
running with core module 1.9.0 (develop)
the reference module is 1.9.0 (develop)
the OpenMP module is not compiled
the CUDA module is 1.9.0 (develop)
the HIP module is not compiled
the DPCPP module is not compiled
Residual norm sqrt(r^T r):
Exec: cuda
System no. 0: residual norm = 4.17165e-11, implicit residual norm = 4.17166e-11, iterations = 122
Solver type: batch::bicgstab
Matrix size: (4099, 4099)
Num batch entries: 1
Entire solve took: 0.0568908 seconds.

Additionally, for these relatively large matrices, I think batched methods might not give any advantage and will leave resources unused (as we run on only one thread block, only 1024 threads will be used). It might be more beneficial to use the non-batched Bicgstab for these matrices. But in case batched CG works, that should be the better performing method, even if it might take more iterations. You can also do a performance comparison for this case and see if …
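To put a rough number on the "resources unused" point, here is a small back-of-the-envelope sketch. It assumes one thread block of 1024 threads per batch entry, as described above, and uses the published SM counts and the 2048 resident-threads-per-SM limit for the V100 and A100. The names `GpuSpec`, `occupied_fraction` and `entries_to_fill` are made up for illustration and are not part of Ginkgo. Register and shared-memory limits are ignored, so this is an upper bound on what a single block can occupy, not a measurement.

```cpp
// Hypothetical helper types and functions, not part of Ginkgo.
struct GpuSpec {
    const char* name;
    int num_sms;             // number of streaming multiprocessors
    int max_threads_per_sm;  // resident-thread limit per SM
};

// Published figures: V100 has 80 SMs, A100 has 108 SMs,
// both allow up to 2048 resident threads per SM.
constexpr GpuSpec v100{"V100", 80, 2048};
constexpr GpuSpec a100{"A100", 108, 2048};

// Total number of threads that can be resident on the device at once.
constexpr long resident_thread_capacity(GpuSpec gpu)
{
    return static_cast<long>(gpu.num_sms) * gpu.max_threads_per_sm;
}

// Threads launched by a batched solve, assuming one block per batch entry.
constexpr long threads_used(long num_batch_entries, int threads_per_block)
{
    return num_batch_entries * threads_per_block;
}

// Fraction of the resident-thread capacity the batched solve can occupy,
// capped at 1.0 once the device is full.
constexpr double occupied_fraction(GpuSpec gpu, long num_batch_entries,
                                   int threads_per_block)
{
    return threads_used(num_batch_entries, threads_per_block) >=
                   resident_thread_capacity(gpu)
               ? 1.0
               : static_cast<double>(
                     threads_used(num_batch_entries, threads_per_block)) /
                     static_cast<double>(resident_thread_capacity(gpu));
}

// Smallest number of batch entries whose blocks fill the device (rounded up).
constexpr long entries_to_fill(GpuSpec gpu, int threads_per_block)
{
    return (resident_thread_capacity(gpu) + threads_per_block - 1) /
           threads_per_block;
}
```

With `Num batch entries: 1` as in the log above, `occupied_fraction(a100, 1, 1024)` comes out below half a percent of the thread capacity, and `entries_to_fill(a100, 1024)` suggests the batch would need on the order of a couple of hundred systems before the device is kept busy under these assumptions. That is why the non-batched solver, which can spread a single 4099 x 4099 system over the whole GPU, may be worth comparing against.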