Unweighted Graphs Error with RBCs on ARCHER2 #815
Comments
This seems to be as if This may be an MPI implementation issue or us not using the creation interface correctly. It needs further investigation.
A bit more digging shows that this (otherwise very sensible) check for weighted/unweighted only appeared when we moved from graph to distributed graph: 9689943. I don't see what we could be doing wrong in the distributed graph creation, so my suggestion would be to refine the logic of the check: when the graph wrongly reports itself as weighted (implementation issue?), assert instead that all weights are equal. Any thoughts, @rupertnash?
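To make the suggestion above concrete, here is a rough sketch of what such a relaxed check could look like. It is not HemeLB's actual code: `effectively_unweighted` and `all_equal` are hypothetical names, `reported_weighted` stands in for the flag that `MPI_Dist_graph_neighbors_count` returns, and the two vectors stand in for the weights returned by `MPI_Dist_graph_neighbors`. It also assumes the library hands back meaningful weights even when the flag is wrong, which may not hold if this really is an MPI bug.

```cpp
#include <algorithm>
#include <functional>
#include <vector>

// Hypothetical helper: true if every entry of w has the same value.
// Empty and single-element vectors count as "all equal".
inline bool all_equal(const std::vector<int>& w) {
    return std::adjacent_find(w.begin(), w.end(),
                              std::not_equal_to<int>()) == w.end();
}

// Hypothetical relaxed check (an assumption, not the existing HemeLB logic).
// reported_weighted: the "weighted" flag as reported by the MPI library.
// in_w / out_w: incoming and outgoing edge weights as reported by the library.
// The graph is accepted as unweighted if the library says so, or if it
// claims to be weighted but all reported weights are identical.
inline bool effectively_unweighted(bool reported_weighted,
                                   const std::vector<int>& in_w,
                                   const std::vector<int>& out_w) {
    if (!reported_weighted) {
        return true;
    }
    std::vector<int> all(in_w);
    all.insert(all.end(), out_w.begin(), out_w.end());
    return all_equal(all);
}
```

With this, a graph that reports `weighted == true` but only carries weights of 1 would pass, while one with genuinely different weights would still be rejected.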
Can you put the case in the shared folder ( I'd like to run it under a debugger to investigate, as this may be a bug in the MPI library (the standard is clear on what should happen: "false if
I have copied the
Hi @rupertnash, I was trying to help @c-denham make a bit of progress on this issue by investigating whether we can use any alternative MPI implementation potentially available on ARCHER2 (e.g. Open MPI). Looking through I also tried swapping the default programming environment from gnu to
So I have investigated and think this is likely to be a bug in the MPI library. I've reported it to the Helpdesk, who have passed it on to HPE's MPICH team. They have reproduced the behaviour in HemeLB and are trying to understand the problem. However, I did not trigger the bug when running on a larger number of processors, so maybe try that? Disabling the check is maybe OK, although if the communicators have been corrupted somehow (whether internally or by HemeLB) then things may go wrong later...
Thanks for investigating further, @rupertnash. We are gonna try running with a larger core count. How many did you go for?
Hello,
I have run a test case with RBCs on ARCHER2 and have copied the slurm.out below.
@rupertnash, I do not have a copy of the output from when we compiled it last week, but I recall the unweighted graphs error appearing during compilation but not during the fluid-only test case.
Many thanks in advance for your advice.