Default tests

I have run the tests on 4 MPI tasks (as hardcoded in the test scripts), and they all pass, except for the DQMC/multislater_ghf_gi one, for which I get:
```
...running DQMC/multislater_ghf_gi
DQMC: ./eigen/Eigen/src/Core/Block.h:146: Eigen::Block<XprType, BlockRows, BlockCols, InnerPanel>::Block(XprType&, Eigen::Index, Eigen::Index, Eigen::Index, Eigen::Index) [with XprType = Eigen::Map<Eigen::Matrix<double, -1, -1>, 0, Eigen::Stride<0, 0>>; int BlockRows = -1; int BlockCols = -1; bool InnerPanel = false; Eigen::Index = long int]: Assertion `startRow >= 0 && blockRows >= 0 && startRow <= xpr.rows() - blockRows && startCol >= 0 && blockCols >= 0 && startCol <= xpr.cols() - blockCols' failed.
[std-hb2-pg0-9:432066] *** Process received signal ***
[std-hb2-pg0-9:432066] Signal: Aborted (6)
[std-hb2-pg0-9:432066] Signal code: (-6)
[std-hb2-pg0-9:432066] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x42520)[0x7ff8d06d6520]
[std-hb2-pg0-9:432066] [ 1] /lib/x86_64-linux-gnu/libc.so.6(pthread_kill+0x12c)[0x7ff8d072a9fc]
[std-hb2-pg0-9:432066] [ 2] /lib/x86_64-linux-gnu/libc.so.6(raise+0x16)[0x7ff8d06d6476]
[std-hb2-pg0-9:432066] [ 3] /lib/x86_64-linux-gnu/libc.so.6(abort+0xd3)[0x7ff8d06bc7f3]
[std-hb2-pg0-9:432066] [ 4] /lib/x86_64-linux-gnu/libc.so.6(+0x2871b)[0x7ff8d06bc71b]
[std-hb2-pg0-9:432066] [ 5] /lib/x86_64-linux-gnu/libc.so.6(+0x39e96)[0x7ff8d06cde96]
[std-hb2-pg0-9:432066] [ 6] DQMC(+0x15170e)[0x55dc226a270e]
[std-hb2-pg0-9:432066] [ 7] DQMC(+0x1e5e17)[0x55dc22736e17]
[std-hb2-pg0-9:432066] [ 8] DQMC(+0x2e548)[0x55dc2257f548]
[std-hb2-pg0-9:432066] [ 9] /lib/x86_64-linux-gnu/libc.so.6(+0x29d90)[0x7ff8d06bdd90]
[std-hb2-pg0-9:432066] [10] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80)[0x7ff8d06bde40]
[std-hb2-pg0-9:432066] [11] DQMC(+0x2eb85)[0x55dc2257fb85]
[std-hb2-pg0-9:432066] *** End of error message ***
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 0 on node DESKTOP-HSCRDM6 exited on signal 6 (Aborted).
--------------------------------------------------------------------------
```
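For reference, that assertion fires whenever an Eigen block is requested outside the bounds of the underlying (here, mapped) matrix. A minimal standalone reproduction of the same assertion, with made-up sizes and unrelated to the Dice code itself:

```cpp
// Toy reproduction of the Eigen bounds assertion above (not Dice code).
// With assertions enabled (i.e. built without -DNDEBUG), the second block()
// call aborts with the same "startRow >= 0 && blockRows >= 0 && ..." message.
#include <Eigen/Dense>

int main() {
  double buf[6] = {0, 1, 2, 3, 4, 5};
  Eigen::Map<Eigen::MatrixXd> m(buf, 2, 3); // 2x3 matrix mapped onto raw storage

  auto ok  = m.block(0, 0, 2, 2); // in bounds: rows [0,2), cols [0,2)
  auto bad = m.block(1, 0, 2, 2); // out of bounds: 2 rows starting at row 1 of a 2-row matrix

  return static_cast<int>(ok(0, 0) + bad(0, 0));
}
```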
Different number of MPI tasks
When I tried to do the same with a different number of MPI tasks, for example 2, 6, 8, or 16, most (or all) of the tests failed. The energy difference with respect to the reference value is often of the order of 0.1 or 0.01, and other times of the order of 0.001, i.e. well above the set tolerances (1e-6 or 1e-7).
I have tried building Dice with GCC 13.2, GCC 11.4, and ICC 2021.10, and I always get these inconsistencies.
I am linking it with [email protected], [email protected], MKL 2023.2, and OpenMPI.
Have you ever seen this behavior and do you understand where these differences may come from?
Or is this within the expected statistical fluctuations due to the stochastic nature of the method?
Thanks
@xubo-wang should know about the ghf test; it has been failing for a bit, I think.
About the number of tasks: this is because the convention in our code is that the number of samples grows with the number of tasks. The sampling input options are specified per task, so the tests only work with four tasks.
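To illustrate the consequence (a toy sketch with made-up numbers, not Dice code or input): if each task draws a fixed number of samples, the total sample count, and therefore the Monte Carlo estimate itself, changes with the task count, and the run-to-run scatter is far larger than the 1e-6/1e-7 test tolerances.

```cpp
// Toy illustration (not Dice code): with a fixed per-task sample count, the
// total number of samples, and thus the statistical error of the mean,
// depends on how many MPI ranks are used. The scatter in the mean is of the
// order of the standard error (~1e-4 here), far above a 1e-6 test tolerance.
#include <cmath>
#include <cstdio>
#include <random>

int main() {
  const int samplesPerRank = 10000;                     // hypothetical per-task input option
  std::mt19937 gen(12345);
  std::normal_distribution<double> local(-1.234, 0.05); // fake "local energy" samples

  for (int ranks : {2, 4, 8, 16}) {
    const long n = static_cast<long>(ranks) * samplesPerRank;
    double sum = 0.0, sumSq = 0.0;
    for (long i = 0; i < n; ++i) {
      const double e = local(gen);
      sum += e;
      sumSq += e * e;
    }
    const double mean = sum / n;
    const double err = std::sqrt((sumSq / n - mean * mean) / n);
    std::printf("ranks=%2d  total samples=%7ld  mean=%.6f  std. error=%.1e\n",
                ranks, n, mean, err);
  }
  return 0;
}
```

In other words, comparisons against the reference values at the 1e-6/1e-7 level are only meaningful with the same four-task setup the tests assume.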