Performance of alias analysis #369

hernanponcedeleon · 2023-01-22T15:48:05Z

Our alias analysis is very slow when run on this program.

These are the times I get (running this requires GlobalSettings.ARCH_PRECISION = 64)

[22.01.2023] 16:43:34 [INFO] AliasAnalysis.fromConfig - Selected Alias Analysis: FIELD_SENSITIVE
[22.01.2023] 16:45:37 [INFO] AliasAnalysis.fromConfig - Done

The text was updated successfully, but these errors were encountered:

ThomasHaas · 2023-01-23T15:11:25Z

The C-code of the benchmark contains bit-fiddling on pointers, which our alias analysis cannot handle.
Here is an example:

static inline struct task_struct *rt_mutex_owner(struct rt_mutex_base *lock)
{
	unsigned long owner = (unsigned long) READ_ONCE(lock->owner);

	return (struct task_struct *) (owner & ~RT_MUTEX_HAS_WAITERS); // Bit-fiddling on pointer
}

Our alias analysis will assume that loads/stores involving the returned pointer can address anything, due to the bit-fiddling on pointers.

hernanponcedeleon · 2023-01-23T15:15:04Z

I understand this will have an effect on the precision of the analysis, but is this also related to why running the analysis is so slow?

ThomasHaas · 2023-01-23T15:20:15Z

Let me correct what I said. While the bit-fiddling is problematic, the fact that we load a pointer from memory should already give us a pointer without any aliasing information, so the bit-fiddling doesn't make it worse, I guess.

I don't know how exactly the performance correlates to the precision, since it might also get faster due not being able to derive anything. However, this code is interesting for the alias analysis in general.

ThomasHaas · 2023-01-23T19:47:20Z

This benchmark has around 1200 memory addresses and I ran out of Java heap space while profiling, but I got some data from that.
All time of the algorithm (up to the crash after around 3 minutes) is spent in the methods fields (71% of time) and addAllAddresses (29%).
Both functions are easily improvable, and from simple rewriting I got it like 6 times faster already.

ThomasHaas · 2023-01-23T20:06:00Z

The last issue is just that we store a fresh HashSet per pointer in the program.
Many pointers have the same potential set of addresses they can access, so they could share the same HashSet-instance for that. I think there are many ways to go about this, like e.g. canonicalizing every points-to-set or finding equivalence classes of pointers in a pre-analysis so that they can be "merged".

Edit: PR #374 addresses the performance loss in fields. The remaining bottleneck is addAllAddresses which is affected by the above mentioned problem.

hernanponcedeleon · 2023-01-31T08:00:02Z

@ThomasHaas you mentioned that you discussed (before the improvement that you already implemented) with Rene some potential improvements. Do those differ of what was already implemented? I.e. can we improve performance even further or should we close this issue?

ThomasHaas · 2023-01-31T09:44:07Z

What I discussed with René is orthogonal to what I improved so far. Overall there are more optimizable points, but I don't mind closing this issue if you think it is good enough for now.

hernanponcedeleon · 2023-01-31T09:55:22Z

Right now, it is less blocking than it used to be, but I would not complain if we can get the times faster. Since we already have ideas on how to improve this, let's keep this issue open and we get those done.

hernanponcedeleon · 2023-11-30T21:04:53Z

@ThomasHaas can you add to this issue the llvm file of tree-RCU so we have a concrete test showing the problem?

ThomasHaas · 2023-12-01T11:53:47Z

Here is a reduced version of the example: rcu-test-alias.ll.zip.
The original takes over an hour and I hope that this one is a bit faster (but still slow).
I'm also wondering if the analysis problem is related to poison addresses used in linked lists.
Those are clearly problematic, but I'm not sure if the relevant code is ever executed.

Edit: The alias analysis on the reduced version seems to take around 30 minutes on my machine (bound=1).

hernanponcedeleon · 2024-09-08T06:02:25Z

The alias analysis did not finish overnight for benchmarks/folds/was.c.

@xeren do you have any idea why the performance is so bad in this example?

ThomasHaas · 2024-09-08T09:33:06Z

On what branch do you have that code?

hernanponcedeleon · 2024-09-08T10:19:33Z

Sorry (spell checker). This code
https://github.com/hernanponcedeleon/Dat3M/blob/master/benchmarks/lfds/wsq.c

hernanponcedeleon · 2024-09-08T15:17:34Z

BTW ... both the field sensitive/insensitive analysis finish below 1 sec and seems to produce good results, e.g.

[08.09.2024] 17:14:41 [INFO] AliasAnalysis.fromConfig - Selected alias analysis: FIELD_SENSITIVE
[08.09.2024] 17:14:41 [INFO] AliasAnalysis.fromConfig - Finished alias analysis in 0.53 secs
[08.09.2024] 17:14:41 [INFO] RelationAnalysis.fromConfig - Selected relation analysis: NATIVE
[08.09.2024] 17:14:41 [INFO] RelationAnalysis.fromConfig - 
        wmm.analysis.relationAnalysis: NATIVE
        wmm.analysis.extendedRelationAnalysis: true
[08.09.2024] 17:14:47 [INFO] RelationAnalysis.fromConfig - Finished regular analysis in 6.150 secs
[08.09.2024] 17:14:49 [INFO] RelationAnalysis.fromConfig - Finished extended analysis in 1.205 secs
[08.09.2024] 17:14:49 [INFO] RelationAnalysis.fromConfig - 
======== RelationAnalysis summary ======== 
        #Relations: 109
        #Axioms: 9
        #may-edges removed (extended): 67545
        #must-edges added (extended): 0
        total #must|may|exclusive edges: 1647667|2416852|0
        #must|may rf edges: 2|3710
        #must|may co edges: 2396|4728
===========================================

[08.09.2024] 17:16:37 [INFO] AliasAnalysis.fromConfig - Selected alias analysis: FIELD_INSENSITIVE
[08.09.2024] 17:16:37 [INFO] AliasAnalysis.fromConfig - Finished alias analysis in 0.24 secs
[08.09.2024] 17:16:37 [INFO] RelationAnalysis.fromConfig - Selected relation analysis: NATIVE
[08.09.2024] 17:16:37 [INFO] RelationAnalysis.fromConfig - 
        wmm.analysis.relationAnalysis: NATIVE
        wmm.analysis.extendedRelationAnalysis: true
[08.09.2024] 17:16:43 [INFO] RelationAnalysis.fromConfig - Finished regular analysis in 5.965 secs
[08.09.2024] 17:16:44 [INFO] RelationAnalysis.fromConfig - Finished extended analysis in 1.622 secs
[08.09.2024] 17:16:44 [INFO] RelationAnalysis.fromConfig - 
======== RelationAnalysis summary ======== 
        #Relations: 109
        #Axioms: 9
        #may-edges removed (extended): 87917
        #must-edges added (extended): 0
        total #must|may|exclusive edges: 1647667|2522918|0
        #must|may rf edges: 2|4952
        #must|may co edges: 2396|6084
===========================================

ThomasHaas · 2024-09-09T09:51:46Z

Due to the recent update to mem2reg, we get a lot of warnings regarding pontentially uninitialized variables.
My best bet is that the alias analysis has to be conservative on those (assuming non-det initial value) which causes it to generate lots of aliasing. Notice that newtasks is an array of pointers, so a non-det value allows for aliasing with every object.
~~We need to incorporate reasoning about path conditions to prove that the values are initialized.~~ (Won't work in this example).

EDIT: Actually, in this example proving that we do not access uninitialized registers seems to be very hard. In fact, I don't understand how this code is correct.

    if (count >= q.mask) {
        // == (count >= size-1)
        int newsize = (q.mask == 0 ? q.InitialSize : 2 * (q.mask + 1));

        assert(newsize < q.MaxSize);

        Obj *newtasks[STATICSIZE]; // This gets mem2reg'd
        int i;
        for (i = 0; i < count; i++) {
            int temp = (h + i) & q.mask;
            newtasks[i] = q.elems[temp];
        }
        for (i = 0; i < newsize; i++) {
            q.elems[i] = newtasks[i]; // This is only initialized by the above loop if "newsize <= count".
        }

Consider the entry condition count >= q.mask with a count that is only slightly larger than the mask, and q.mask != 0. Then int newsize = (q.mask == 0 ? q.InitialSize : 2 * (q.mask + 1)); easily leads to a value that is larger than count.
This in turn really causes uninitialized values to get accessed. Isn't this UB?.

hernanponcedeleon · 2024-09-09T11:38:19Z

I agree it is not trivial to know if the code has UB (even with the patch below). However, the patch below fixes all the warnings and the alias analysis still seems to have problems

diff --git a/benchmarks/lfds/wsq.c b/benchmarks/lfds/wsq.c
index 2831dac3a..142e05ad0 100644
--- a/benchmarks/lfds/wsq.c
+++ b/benchmarks/lfds/wsq.c
@@ -14,7 +14,7 @@
 #define STEAL_ATTEMPS 1
 
 void *stealer(void *unused) {
-    Obj *r;
+    Obj *r = NULL;
     for (int i = 0; i < STEAL_ATTEMPS; i++) {
         if (steal(&r)) {
             operation(r);
@@ -41,14 +41,14 @@ int main(void) {
     for (int i = 0; i < ITEMS / 2; i++) {
         push(&items[2 * i]);
         push(&items[2 * i + 1]);
-        Obj *r;
+        Obj *r = NULL;
         if (pop(&r)) {
             operation(r);
         }
     }
 
     for (int i = 0; i < ITEMS / 2; i++) {
-        Obj *r;
+        Obj *r = NULL;
         if (pop(&r)) {
             operation(r);
         }
diff --git a/benchmarks/lfds/wsq.h b/benchmarks/lfds/wsq.h
index c992bf794..0e620b419 100644
--- a/benchmarks/lfds/wsq.h
+++ b/benchmarks/lfds/wsq.h
@@ -209,6 +209,9 @@ void syncPush(Obj* elem) {
             int temp = (h + i) & q.mask;
             newtasks[i] = q.elems[temp];
         }
+        for (i = count; i < STATICSIZE - count; i++) {
+            newtasks[i] = NULL;
+        }
         for (i = 0; i < newsize; i++) {
             q.elems[i] = newtasks[i];
         }

ThomasHaas · 2024-09-09T13:00:27Z

I haven't checked Dartagnan on your patch, but I would be surprised if the patch really changed much.
Since the initializing loop has dynamic bounds (cause count is dynamic), I don't think that Dartagnan is able to figure out that newtasks is really initialized (this requires far more intricate reasoning).

xeren · 2024-09-09T14:57:58Z

The alias analysis did not finish overnight for benchmarks/folds/was.c.

@xeren do you have any idea why the performance is so bad in this example?

This program seems to contain negative weight cycles. This seems to put IBPA into an endless loop. The best fix appears to be setting TOP = List.of(-1) (what it actually should be in theory). Then it takes 0.5 secs on my machine. I did not define it like that in the first place, because it lost precision in other programs, compared to the older analyses.

hernanponcedeleon · 2024-09-09T16:16:15Z

I did not define it like that in the first place, because it lost precision in other programs, compared to the older analyses.

Does this means that "in theory" it is not the case that the default algorithm is always better than the others?

ThomasHaas · 2024-09-09T16:50:55Z

IIRC, the old algorithms were just unsound in those cases and that is why they were "more precise".

xeren · 2024-09-09T18:12:04Z

Exactly. In terms of precision, "in theory" cannot really apply here. In practice, the fixed default is sometimes less precise.

Version 1 (FIELD_INSENSITIVE) truncates out-of-bounds addresses every time, in order to avoid looping on non-zero-weight cycles.

Version 2 (FIELD_SENSITIVE) basically kept this issue, but also introduced a provenance assumption. Then there was a loss in precision in some litmus tests featuring pointer sets starting with the last field of an object (e.g. &A+12+4x on a 16-bytes object A). There was only one possible location, so this generated more MUST-alias relationships. To enforce more precision everywhere, I made the assumption, that x is non-negative.

In Version 3 (FULL), I replaced the out-of-bounds filter with a different handling of positive-weight cycles. But I kept the assumption from before, which lead to this issue.

hernanponcedeleon added the performance label Jan 22, 2023

hernanponcedeleon mentioned this issue Jan 23, 2023

Verification of Linux Kernel Loop (rtmutex) (2) #373

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Performance of alias analysis #369

Performance of alias analysis #369

hernanponcedeleon commented Jan 22, 2023

ThomasHaas commented Jan 23, 2023

hernanponcedeleon commented Jan 23, 2023

ThomasHaas commented Jan 23, 2023

ThomasHaas commented Jan 23, 2023

ThomasHaas commented Jan 23, 2023 •

edited

Loading

hernanponcedeleon commented Jan 31, 2023

ThomasHaas commented Jan 31, 2023

hernanponcedeleon commented Jan 31, 2023

hernanponcedeleon commented Nov 30, 2023

ThomasHaas commented Dec 1, 2023 •

edited

Loading

hernanponcedeleon commented Sep 8, 2024

ThomasHaas commented Sep 8, 2024

hernanponcedeleon commented Sep 8, 2024

hernanponcedeleon commented Sep 8, 2024

ThomasHaas commented Sep 9, 2024 •

edited

Loading

hernanponcedeleon commented Sep 9, 2024

ThomasHaas commented Sep 9, 2024

xeren commented Sep 9, 2024

hernanponcedeleon commented Sep 9, 2024

ThomasHaas commented Sep 9, 2024

xeren commented Sep 9, 2024

Performance of alias analysis #369

Performance of alias analysis #369

Comments

hernanponcedeleon commented Jan 22, 2023

ThomasHaas commented Jan 23, 2023

hernanponcedeleon commented Jan 23, 2023

ThomasHaas commented Jan 23, 2023

ThomasHaas commented Jan 23, 2023

ThomasHaas commented Jan 23, 2023 • edited Loading

hernanponcedeleon commented Jan 31, 2023

ThomasHaas commented Jan 31, 2023

hernanponcedeleon commented Jan 31, 2023

hernanponcedeleon commented Nov 30, 2023

ThomasHaas commented Dec 1, 2023 • edited Loading

hernanponcedeleon commented Sep 8, 2024

ThomasHaas commented Sep 8, 2024

hernanponcedeleon commented Sep 8, 2024

hernanponcedeleon commented Sep 8, 2024

ThomasHaas commented Sep 9, 2024 • edited Loading

hernanponcedeleon commented Sep 9, 2024

ThomasHaas commented Sep 9, 2024

xeren commented Sep 9, 2024

hernanponcedeleon commented Sep 9, 2024

ThomasHaas commented Sep 9, 2024

xeren commented Sep 9, 2024

ThomasHaas commented Jan 23, 2023 •

edited

Loading

ThomasHaas commented Dec 1, 2023 •

edited

Loading

ThomasHaas commented Sep 9, 2024 •

edited

Loading