Parallelization of MUMPS solver #50
Unanswered
GreivinAlfaro asked this question in Q&A
@GreivinAlfaro As far as I can tell, your script isn't really using MUMPS's distributed-memory capabilities: for those, both A and b must be distributed across different nodes. If your problem is not set up that way, you can still benefit from MUMPS's multicore features (one node, multiple cores). Those rely on a multithreaded BLAS library (such as MKL or OpenBLAS) and on OpenMP for non-BLAS operations (see Section 3.13 of the documentation). Did you set the
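For the single-node case the reply describes, the multithreaded path is usually enabled through environment variables in the job script. A minimal Slurm sketch (file names and the thread count of 8 are hypothetical, not from the thread):

```shell
#!/bin/bash
# Sketch of a single-node Slurm script: one MPI task with several CPUs,
# so MUMPS can use a threaded BLAS plus OpenMP for non-BLAS work.
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8

# Outside of a Slurm allocation SLURM_CPUS_PER_TASK is unset; fall back to 8.
export OMP_NUM_THREADS="${SLURM_CPUS_PER_TASK:-8}"   # OpenMP regions in MUMPS
export MKL_NUM_THREADS="${SLURM_CPUS_PER_TASK:-8}"   # threads for MKL BLAS

echo "OMP_NUM_THREADS=$OMP_NUM_THREADS"

# srun julia --project solve_mumps.jl   # hypothetical driver; uncomment on the cluster
```

The variables must be exported before the solver process starts, since the BLAS and OpenMP runtimes read them at initialization.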
-
I'm writing a program that solves the linear system Ax = b, where A is a large, sparse, symmetric matrix and b is a vector of zeros except for a single entry set to one. In other words, I want to recover the columns of the inverse A^{-1} by solving linear systems instead of forming the inverse explicitly.
I want to do this for several vectors b (i.e. find several columns of A^{-1}). Because the quantities of interest in our project are very sensitive to numerical error, we avoid iterative solvers.
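The pattern above (factor A once, then reuse the factorization for every unit vector e_i) can be sketched with SciPy's sparse LU in Python; this is not MUMPS and not the asker's Julia code, just an illustration of the factor-once / solve-many structure, with a small stand-in matrix:

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

n = 200
# Small sparse symmetric positive-definite test matrix, standing in for
# the 184,756 x 184,756 problem from the question.
A = sp.diags([np.full(n - 1, -1.0), np.full(n, 4.0), np.full(n - 1, -1.0)],
             offsets=[-1, 0, 1], format="csc")

lu = spla.splu(A)            # one expensive factorization...

cols = [5, 17, 42]           # ...reused for every right-hand side
inv_cols = {}
for i in cols:
    e = np.zeros(n)
    e[i] = 1.0               # b_i = e_i, so x = A^{-1} e_i = column i of A^{-1}
    inv_cols[i] = lu.solve(e)

# Sanity check: compare one column against a dense inverse.
err = np.abs(inv_cols[5] - np.linalg.inv(A.toarray())[:, 5]).max()
print(f"max error in column 5: {err:.2e}")
```

MUMPS exposes the same separation (analysis/factorization once, then repeated solves), and it also accepts multiple right-hand sides in a single solve call, so the b_i can often be passed as the columns of one matrix.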
I tried to implement this with the MUMPS solver. I was able to solve the linear system for a 184,756 x 184,756 matrix (with a density of around 7e-6) in ~220 s on a single 2.9 GHz core. I then wanted to parallelize it, but didn't manage to. Below is the function I created for this purpose.
I want to make this faster than ~220 s by solving across several cores of the cluster. I'm simply submitting the slurm script
This doesn't seem to work, maybe due to a naive mistake, as I'm new to the MPI.jl package.
Since my actual goal is to solve the system for several vectors {b_i}, I also tried keeping the MUMPS solve sequential but distributing the different linear problems Ax_i = b_i across the cores of the cluster. I did this with the Distributed.jl package, parallelizing the loop over the b_i. However, this incurs a huge overhead.
Always keeping `ncores` equal to the number of different vectors b_i, I get the expected time (around ~220 s, sometimes less) for `ncores <= 3`. However, if I set `ncores > 3`, solving these `ncores` linear systems in parallel takes around 25 min! I have no idea how to fix either problem, neither the parallelization of the linear system itself nor the distribution of the vectors {b_i}. If someone can help me with either of these two approaches I'll be extremely grateful!