Data Race Handling #674
Comments
Hi @coolmax3002 The benchmarks in the folder consist of two parts: (1) the implementation of the algorithm API, and (2) a client using the API. Part (1) is implemented in the Part (2) is implemented in the There are 3 properties you can check with dat3m: safety (i.e. assertions), liveness, and things that can be specified in the cat file (data races fit here, see e.g. here). By default dat3m only checks for assertions. You can enable all 3 properties by using Data races only make sense at the programming language level, that is why hardware memory models do not have a definition of data races in their cat file. If you want to check for data races, you should use either |
We could also extend IMM with the race definition from RC11. I think IMM is the most canonical model when checking C-code and so it would make sense if that one also covered data races. |
I disagree here. IMM is supposed to be a wrapper for hardware models, and as such, the notion of data races does not make sense for this model. I think this is exactly why genmc (the other tool that uses IMM) does not check for races in IMM. |
Ok, philosophically I buy this argument, but I'm wondering what would go wrong in practice if you added races to IMM. From a practitioner's point of view, what models would you use to make sure your code is correct? You cannot use |
This is what the VSync project has been doing for quite some time due to lack of a better alternative. However, now that As you mentioned, this has the drawback that |
Ah ok, and it's because Dartagnan operates directly on the source files that it doesn't have to worry about UB caused by data races in the C code when running against the language models? And for the hardware models it assumes the C code is being run "as is" on the hardware? It does seem a little strange to be checking for safety and liveness against a program that may be undefined completely according to the model, but using RC11 for data races and a hardware model for the other properties does seem like the most sound option. That's also why we were thinking of writing some versions of the benchmarks that use only atomics in the critical sections, so we can still check assertions in the language models without having to worry about data races. |
This is only partially true. Dartagnan operates at the LLVM-IR level so if the original C program has data races (or any other UB) the IR dartagnan gets might already be semantically different than the C code. We do use some compiler optimisations (those set in the
In principle you are right that if the program has UB, we cannot make any claim about the other properties.
If I understood @camillegandotra correctly, you are also exploring relaxed versions of the locks which do not guarantee mutual exclusion (and can thus be racy). If this is the case, then yes, probably you want to use atomics in the critical section to avoid UB. |
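As a rough illustration of the "atomics in the critical section" idea, here is a minimal sketch. All names (`cs_lock`, `atomic_counter`, `run_atomic_client`) are hypothetical and not taken from the Dat3M benchmarks. The critical section uses a relaxed atomic load followed by a relaxed atomic store instead of plain accesses, so a lock that fails to provide mutual exclusion would show up as a lost update (an assertion violation) rather than as a data race with undefined behaviour. The lock shown here uses acquire/release and is therefore expected to be correct.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

#define NTHREADS 2
#define ITERS 1000

/* Hypothetical test-and-set spinlock; not a lock from the benchmarks. */
static atomic_flag cs_lock = ATOMIC_FLAG_INIT;
/* Shared data is atomic, so concurrent access is never a data race. */
static atomic_int atomic_counter;

static void *atomic_worker(void *arg) {
    (void)arg;
    for (int i = 0; i < ITERS; i++) {
        /* Acquire the lock. */
        while (atomic_flag_test_and_set_explicit(&cs_lock, memory_order_acquire)) { }
        /* Deliberately a separate load and store (not an atomic RMW):
           the increment is only correct if the lock is mutually exclusive. */
        int v = atomic_load_explicit(&atomic_counter, memory_order_relaxed);
        atomic_store_explicit(&atomic_counter, v + 1, memory_order_relaxed);
        /* Release the lock. */
        atomic_flag_clear_explicit(&cs_lock, memory_order_release);
    }
    return NULL;
}

/* Runs the client and returns the final counter value. */
int run_atomic_client(void) {
    pthread_t t[NTHREADS];
    atomic_store(&atomic_counter, 0);
    for (int i = 0; i < NTHREADS; i++)
        pthread_create(&t[i], NULL, atomic_worker, NULL);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(t[i], NULL);
    int result = atomic_load(&atomic_counter);
    assert(result == NTHREADS * ITERS);
    return result;
}
```

Weakening the lock's memory orders (for example to `memory_order_relaxed`) keeps the program free of data races, so the assertion remains meaningful under a language-level model.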
We compile the C-code to LLVM and apply minor optimizations but none of which rely on data-race freedom.
do not get trivialized to
however, the compiler could theoretically split the 2-byte read into two 1-byte reads that each observe a different store and thus possibly see a combination of both writes. |
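The load-splitting scenario can be sketched deterministically, without any real concurrency. The helper below is purely illustrative (the name `torn_read` and its parameters are assumptions, not part of Dat3M or LLVM): it models a 2-byte read that has been split into two 1-byte reads, where each half observes a possibly different 2-byte store.

```c
#include <stdint.h>

/* Models a 2-byte load split into two 1-byte loads.
   lo_src: the store observed by the low-byte load.
   hi_src: the store observed by the high-byte load.
   Returns the value the split read would assemble. */
uint16_t torn_read(uint16_t lo_src, uint16_t hi_src) {
    uint8_t lo = (uint8_t)(lo_src & 0xFF); /* low byte of one store  */
    uint8_t hi = (uint8_t)(hi_src >> 8);   /* high byte of the other */
    return (uint16_t)(((unsigned)hi << 8) | lo);
}
```

If the two racing stores write `0x0000` and `0xFFFF`, the split read can return `0xFF00` or `0x00FF`, values that neither store ever wrote.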
IIRC, the definition of data races in VMM should take that into account up to some point, for example, racy accesses protected with seqlock are not data races. |
Some RC11 graphs would never occur on IMM and vice versa, which would already be enough to get different data races. The real IMM does not have a notion of non-atomic and therefore can't define races, and if you do add non-atomics, you have to think about what that means for ar. In GenMC, non-atomics have historically been "broken" w.r.t. ar, for example a fence wouldn't order them at all. Let's say you add them to ar though, then the hb definition of data race is also bogus for IMM, since the ar would prevent certain races (in the sense of preventing one of the two orders, i.e., there's no graph in which the events happen in another order) but you'd still be flagged for a data race. And if you do add ar to the data race definition somehow, then you're likely to be unsound at the language level because of the compiler optimizations that IMM disallows. Either way the results are unlikely to be perfect, even in practice. |
How sure are you about that? Do you have a full model of what the optimizations that you have enabled can do? |
Yes, in VMM having well-formed races on non-atomics is completely fine. |
Ok, fair enough. We currently run a CSE pass (*) that can eliminate "redundant" loads and actually will in the lock examples unless the plain accesses are volatile. All the other passes are related to simplifying control-flow structure and should neither eliminate memory operations(**) nor move them around (if the LLVM documentation is to be trusted). (*) Technically we run nothing automatically. The passes are specified in an environment variable and we "just recommend" to enable CSE. @hernanponcedeleon have you tested how verification times are affected when disabling CSE? (**) Well, we run |
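To make the CSE point concrete, here is a small sketch with hypothetical function names. In the first function, a common-subexpression-elimination pass is allowed to reuse the first load for the second, because nothing between the two plain loads can change `*p` in a race-free program. In the second function, both loads are `volatile` accesses and must each be performed. Whether a given pass actually merges the plain loads depends on the compiler and the flags used.

```c
/* Two plain loads of the same location: an optimizer may
   replace the second load with the result of the first. */
int sum_two_plain_loads(const int *p) {
    int a = *p;
    int b = *p; /* candidate for elimination by CSE */
    return a + b;
}

/* Two volatile loads: each access must be kept, so a
   concurrent write between them remains observable. */
int sum_two_volatile_loads(const volatile int *p) {
    int a = *p;
    int b = *p; /* must not be merged with the first load */
    return a + b;
}
```

In a single-threaded run both functions return the same value. The difference only matters when another thread may write to `*p` between the two loads.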
I never did a fine-grained analysis of how each of the passes we recommend affects performance. However, based on some testing I did last week, it is not even clear if running these optimizations is helpful compared with simply annotating static loops to be fully unrolled. |
@coolmax3002 @reeselevine are there any open questions about this or can I close? |
@coolmax3002 confirmed by email this could be closed. |
When examining the lock implementations in the benchmarks directory, I noticed that threads often interact with non-atomic integers. In multi-threaded programs, using non-atomic variables can lead to data races. Given that Dat3M tests the validity of executions under different memory models and architectures, how does Dat3M handle data races?
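For context, the pattern in question looks roughly like the sketch below. It is not code from the benchmarks directory, and all names (`lock_flag`, `shared_counter`, `run_plain_client`) are hypothetical. A plain `int` is modified inside a critical section guarded by a simple acquire/release test-and-set spinlock. With a correct lock, every pair of accesses to the plain variable is ordered by happens-before, so there is no data race. With a broken or overly relaxed lock, the same accesses would race.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

#define NTHREADS 2
#define ITERS 1000

/* Hypothetical spinlock state and a plain (non-atomic) shared variable. */
static atomic_flag lock_flag = ATOMIC_FLAG_INIT;
static int shared_counter = 0;

static void spin_lock(void) {
    /* Spin until the flag was previously clear; acquire pairs with the release below. */
    while (atomic_flag_test_and_set_explicit(&lock_flag, memory_order_acquire)) { }
}

static void spin_unlock(void) {
    atomic_flag_clear_explicit(&lock_flag, memory_order_release);
}

static void *worker(void *arg) {
    (void)arg;
    for (int i = 0; i < ITERS; i++) {
        spin_lock();
        shared_counter++; /* non-atomic access, protected by the lock */
        spin_unlock();
    }
    return NULL;
}

/* Runs the client and returns the final counter value. */
int run_plain_client(void) {
    pthread_t t[NTHREADS];
    shared_counter = 0;
    for (int i = 0; i < NTHREADS; i++)
        pthread_create(&t[i], NULL, worker, NULL);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(t[i], NULL);
    assert(shared_counter == NTHREADS * ITERS);
    return shared_counter;
}
```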
cc: @camillegandotra, @reeselevine, @tyler-utah