The CI of #12848 failed even though no code was changed in that PR. Ubuntu-GCC-FullDebug seems to fail consistently in recent PRs, so it's quite likely memory corruption, although it could also be a race condition that triggers very consistently.
I remember that about half a year ago there was a similar issue with the MPM app, but on Windows. Has anyone managed to track it down, or could this still be the same issue?
Here are the logs from the failed run of #12848: logs.zip
If you don't know how to track down this bug, I recommend compiling Kratos with sanitizers enabled, one at a time, and running the MPM test suite. You might need to run the suite several times. You can enable a sanitizer by passing additional flags in `CMAKE_CXX_FLAGS` at CMake configuration time and then recompiling. Which flag you pass depends on the sanitizer you want to use (e.g. `cmake ... -DCMAKE_CXX_FLAGS=-fsanitize=address`).
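A minimal sketch of picking the flags for one sanitizer at a time, assuming a Linux shell. The helper name `sanitizer_flags` is hypothetical, and the extra `-g -fno-omit-frame-pointer` flags are optional: they only make the stack traces in the sanitizer reports easier to read. The `...` in the example stands for whatever CMake options you normally use for your Kratos build.

```shell
#!/bin/sh
# Hypothetical helper: print the compiler flags for ONE sanitizer.
# Sanitizers should be enabled one at a time (e.g. address and thread
# cannot be combined). -g and -fno-omit-frame-pointer are optional and
# only make the stack traces in the reports easier to read.
sanitizer_flags() {
  case "$1" in
    address) echo "-fsanitize=address -fno-omit-frame-pointer -g" ;;
    memory)  echo "-fsanitize=memory -fno-omit-frame-pointer -g" ;;  # Clang only
    thread)  echo "-fsanitize=thread -g" ;;
    *)       echo "unknown sanitizer: $1" >&2; return 1 ;;
  esac
}

# Example configure step; "..." stands for your usual Kratos cmake options:
#   cmake ... -DCMAKE_CXX_FLAGS="$(sanitizer_flags address)"
```

Quoting the command substitution matters here, since the flags contain spaces and must reach CMake as a single value.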
1. First, try AddressSanitizer (`-fsanitize=address`), which reports the faulty memory address and the code that tried to access it. If the corruption originates from a race condition, you'll have to rerun the suite several times.
2. You can give MemorySanitizer (`-fsanitize=memory`) a shot too. Note that it is only available in Clang, not in GCC.
3. If you didn't find the error with the previous two, try ThreadSanitizer (`-fsanitize=thread`).
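Because a race condition may only show up in some runs, it can help to automate the reruns. Below is a minimal sketch of a hypothetical `rerun_until_failure` helper that runs a command up to N times and stops at the first non-zero exit status (sanitizers normally make the process exit with a non-zero status when they report an error). The test script path in the usage comment is a placeholder, not the actual name of the MPM test file.

```shell
#!/bin/sh
# Hypothetical helper: run a command up to MAX_RUNS times and stop at the
# first run that exits with a non-zero status. Intermittent bugs such as
# race conditions may only show up in some of the runs.
rerun_until_failure() {
  local max_runs="$1"
  shift
  local run=1
  local status
  while [ "$run" -le "$max_runs" ]; do
    "$@"
    status=$?
    if [ "$status" -ne 0 ]; then
      echo "run $run of $max_runs failed with exit status $status" >&2
      return "$status"
    fi
    run=$((run + 1))
  done
  echo "all $max_runs runs passed" >&2
  return 0
}

# Example usage with a GCC AddressSanitizer build. The python interpreter
# itself is not instrumented, so the ASan runtime is preloaded (use
# libtsan.so instead for a ThreadSanitizer build).
#   export LD_PRELOAD="$(gcc -print-file-name=libasan.so)"
#   export ASAN_OPTIONS=detect_leaks=0   # optional: hide leak reports from python
#   rerun_until_failure 20 python3 path/to/mpm_test_script.py
```

Preloading is needed because Kratos is loaded as a Python module into an uninstrumented interpreter; without it, the GCC ASan runtime typically aborts with a message asking you to preload it.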
You can also try Valgrind (it has several tools, including Helgrind, a thread error detector), but in my experience it's difficult to find subtle race conditions with it, and running under Valgrind usually takes forever. The upside is that you don't have to recompile Kratos for this: just run the test suite through Valgrind/Helgrind.
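A minimal sketch of running a command under Valgrind without recompiling. The wrapper name `run_under_valgrind` is hypothetical; `--tool=helgrind` selects the thread error detector and `--tool=memcheck` the default memory checker, while `--error-exitcode` makes Valgrind return a non-zero status when it reports errors, so a run can be checked like any other test run. The script path is a placeholder.

```shell
#!/bin/sh
# Hypothetical wrapper: run a command under a valgrind tool, e.g.
# "memcheck" (memory errors) or "helgrind" (thread errors). No recompilation
# is needed. --error-exitcode makes valgrind return a non-zero status when it
# reports errors, so the result can be checked like a normal test run.
run_under_valgrind() {
  local tool="$1"
  shift
  valgrind --tool="$tool" --error-exitcode=99 "$@"
}

# Example usage; the script path is a placeholder. PYTHONMALLOC=malloc makes
# python use the system allocator, which avoids many false positives from
# python's own allocator (especially with memcheck).
#   export PYTHONMALLOC=malloc
#   run_under_valgrind helgrind python3 path/to/mpm_test_script.py
```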
If you're out of ideas, you can also repeatedly run the test suite under `gdb` until something breaks. If something does break, `gdb` will pause the program at that point and you can print the call stack with `bt`. If the error really is memory corruption caused by a race condition, the program may not fail where the bug actually is, but the call stack might still give you a hint about where it comes from.
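The repeated `gdb` runs can be scripted with gdb's batch mode. This is a minimal sketch: the helper name `gdb_until_crash`, the log file location and the script path are hypothetical. It only catches crashes in the process that `gdb` starts, so if the test runner spawns child processes, point it at the script that actually runs the MPM tests.

```shell
#!/bin/sh
# Hypothetical helper: run a command under gdb up to MAX_RUNS times.
# In batch mode gdb executes "run"; if the program is stopped by a signal
# (e.g. SIGSEGV or SIGABRT), the following "bt" prints the call stack.
# If the program exits normally, "bt" only prints "No stack." and the next
# run starts. Each run's output is kept in a log file.
gdb_until_crash() {
  local max_runs="$1"
  shift
  local run=1
  local log
  while [ "$run" -le "$max_runs" ]; do
    log="${TMPDIR:-/tmp}/gdb_run_$run.log"
    gdb -batch -ex run -ex bt --args "$@" >"$log" 2>&1
    if grep -q "received signal" "$log"; then
      echo "run $run crashed, see $log for the backtrace" >&2
      return 1
    fi
    run=$((run + 1))
  done
  return 0
}

# Example usage; the script path is a placeholder:
#   gdb_until_crash 50 python3 path/to/mpm_test_script.py
```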