Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance of alias analysis #369

Open
hernanponcedeleon opened this issue Jan 22, 2023 · 21 comments
Open

Performance of alias analysis #369

hernanponcedeleon opened this issue Jan 22, 2023 · 21 comments

Comments

@hernanponcedeleon
Copy link
Owner

Our alias analysis is very slow when run on this program.

These are the times I get (running this requires GlobalSettings.ARCH_PRECISION = 64)

[22.01.2023] 16:43:34 [INFO] AliasAnalysis.fromConfig - Selected Alias Analysis: FIELD_SENSITIVE
[22.01.2023] 16:45:37 [INFO] AliasAnalysis.fromConfig - Done
@ThomasHaas
Copy link
Collaborator

The C-code of the benchmark contains bit-fiddling on pointers, which our alias analysis cannot handle.
Here is an example:

static inline struct task_struct *rt_mutex_owner(struct rt_mutex_base *lock)
{
	unsigned long owner = (unsigned long) READ_ONCE(lock->owner);

	return (struct task_struct *) (owner & ~RT_MUTEX_HAS_WAITERS); // Bit-fiddling on pointer
}

Our alias analysis will assume that loads/stores involving the returned pointer can address anything, due to the bit-fiddling on pointers.

@hernanponcedeleon
Copy link
Owner Author

I understand this will have an effect on the precision of the analysis, but is this also related to why running the analysis is so slow?

@ThomasHaas
Copy link
Collaborator

Let me correct what I said. While the bit-fiddling is problematic, the fact that we load a pointer from memory should already give us a pointer without any aliasing information, so the bit-fiddling doesn't make it worse, I guess.

I don't know how exactly the performance correlates to the precision, since it might also get faster due not being able to derive anything. However, this code is interesting for the alias analysis in general.

@ThomasHaas
Copy link
Collaborator

This benchmark has around 1200 memory addresses and I ran out of Java heap space while profiling, but I got some data from that.
All time of the algorithm (up to the crash after around 3 minutes) is spent in the methods fields (71% of time) and addAllAddresses (29%).
Both functions are easily improvable, and from simple rewriting I got it like 6 times faster already.

grafik

@ThomasHaas
Copy link
Collaborator

ThomasHaas commented Jan 23, 2023

The last issue is just that we store a fresh HashSet per pointer in the program.
Many pointers have the same potential set of addresses they can access, so they could share the same HashSet-instance for that. I think there are many ways to go about this, like e.g. canonicalizing every points-to-set or finding equivalence classes of pointers in a pre-analysis so that they can be "merged".

Edit: PR #374 addresses the performance loss in fields. The remaining bottleneck is addAllAddresses which is affected by the above mentioned problem.

@hernanponcedeleon
Copy link
Owner Author

@ThomasHaas you mentioned that you discussed (before the improvement that you already implemented) with Rene some potential improvements. Do those differ of what was already implemented? I.e. can we improve performance even further or should we close this issue?

@ThomasHaas
Copy link
Collaborator

What I discussed with René is orthogonal to what I improved so far. Overall there are more optimizable points, but I don't mind closing this issue if you think it is good enough for now.

@hernanponcedeleon
Copy link
Owner Author

Right now, it is less blocking than it used to be, but I would not complain if we can get the times faster. Since we already have ideas on how to improve this, let's keep this issue open and we get those done.

@hernanponcedeleon
Copy link
Owner Author

@ThomasHaas can you add to this issue the llvm file of tree-RCU so we have a concrete test showing the problem?

@ThomasHaas
Copy link
Collaborator

ThomasHaas commented Dec 1, 2023

Here is a reduced version of the example: rcu-test-alias.ll.zip.
The original takes over an hour and I hope that this one is a bit faster (but still slow).
I'm also wondering if the analysis problem is related to poison addresses used in linked lists.
Those are clearly problematic, but I'm not sure if the relevant code is ever executed.

Edit: The alias analysis on the reduced version seems to take around 30 minutes on my machine (bound=1).

@hernanponcedeleon
Copy link
Owner Author

The alias analysis did not finish overnight for benchmarks/folds/was.c.

@xeren do you have any idea why the performance is so bad in this example?

@ThomasHaas
Copy link
Collaborator

On what branch do you have that code?

@hernanponcedeleon
Copy link
Owner Author

@hernanponcedeleon
Copy link
Owner Author

BTW ... both the field sensitive/insensitive analysis finish below 1 sec and seems to produce good results, e.g.

[08.09.2024] 17:14:41 [INFO] AliasAnalysis.fromConfig - Selected alias analysis: FIELD_SENSITIVE
[08.09.2024] 17:14:41 [INFO] AliasAnalysis.fromConfig - Finished alias analysis in 0.53 secs
[08.09.2024] 17:14:41 [INFO] RelationAnalysis.fromConfig - Selected relation analysis: NATIVE
[08.09.2024] 17:14:41 [INFO] RelationAnalysis.fromConfig - 
        wmm.analysis.relationAnalysis: NATIVE
        wmm.analysis.extendedRelationAnalysis: true
[08.09.2024] 17:14:47 [INFO] RelationAnalysis.fromConfig - Finished regular analysis in 6.150 secs
[08.09.2024] 17:14:49 [INFO] RelationAnalysis.fromConfig - Finished extended analysis in 1.205 secs
[08.09.2024] 17:14:49 [INFO] RelationAnalysis.fromConfig - 
======== RelationAnalysis summary ======== 
        #Relations: 109
        #Axioms: 9
        #may-edges removed (extended): 67545
        #must-edges added (extended): 0
        total #must|may|exclusive edges: 1647667|2416852|0
        #must|may rf edges: 2|3710
        #must|may co edges: 2396|4728
===========================================
[08.09.2024] 17:16:37 [INFO] AliasAnalysis.fromConfig - Selected alias analysis: FIELD_INSENSITIVE
[08.09.2024] 17:16:37 [INFO] AliasAnalysis.fromConfig - Finished alias analysis in 0.24 secs
[08.09.2024] 17:16:37 [INFO] RelationAnalysis.fromConfig - Selected relation analysis: NATIVE
[08.09.2024] 17:16:37 [INFO] RelationAnalysis.fromConfig - 
        wmm.analysis.relationAnalysis: NATIVE
        wmm.analysis.extendedRelationAnalysis: true
[08.09.2024] 17:16:43 [INFO] RelationAnalysis.fromConfig - Finished regular analysis in 5.965 secs
[08.09.2024] 17:16:44 [INFO] RelationAnalysis.fromConfig - Finished extended analysis in 1.622 secs
[08.09.2024] 17:16:44 [INFO] RelationAnalysis.fromConfig - 
======== RelationAnalysis summary ======== 
        #Relations: 109
        #Axioms: 9
        #may-edges removed (extended): 87917
        #must-edges added (extended): 0
        total #must|may|exclusive edges: 1647667|2522918|0
        #must|may rf edges: 2|4952
        #must|may co edges: 2396|6084
===========================================

@ThomasHaas
Copy link
Collaborator

ThomasHaas commented Sep 9, 2024

Due to the recent update to mem2reg, we get a lot of warnings regarding pontentially uninitialized variables.
My best bet is that the alias analysis has to be conservative on those (assuming non-det initial value) which causes it to generate lots of aliasing. Notice that newtasks is an array of pointers, so a non-det value allows for aliasing with every object.
We need to incorporate reasoning about path conditions to prove that the values are initialized. (Won't work in this example).

EDIT: Actually, in this example proving that we do not access uninitialized registers seems to be very hard. In fact, I don't understand how this code is correct.

    if (count >= q.mask) {
        // == (count >= size-1)
        int newsize = (q.mask == 0 ? q.InitialSize : 2 * (q.mask + 1));

        assert(newsize < q.MaxSize);

        Obj *newtasks[STATICSIZE]; // This gets mem2reg'd
        int i;
        for (i = 0; i < count; i++) {
            int temp = (h + i) & q.mask;
            newtasks[i] = q.elems[temp];
        }
        for (i = 0; i < newsize; i++) {
            q.elems[i] = newtasks[i]; // This is only initialized by the above loop if "newsize <= count".
        }

Consider the entry condition count >= q.mask with a count that is only slightly larger than the mask, and q.mask != 0. Then int newsize = (q.mask == 0 ? q.InitialSize : 2 * (q.mask + 1)); easily leads to a value that is larger than count.
This in turn really causes uninitialized values to get accessed. Isn't this UB?.

@hernanponcedeleon
Copy link
Owner Author

I agree it is not trivial to know if the code has UB (even with the patch below). However, the patch below fixes all the warnings and the alias analysis still seems to have problems

diff --git a/benchmarks/lfds/wsq.c b/benchmarks/lfds/wsq.c
index 2831dac3a..142e05ad0 100644
--- a/benchmarks/lfds/wsq.c
+++ b/benchmarks/lfds/wsq.c
@@ -14,7 +14,7 @@
 #define STEAL_ATTEMPS 1
 
 void *stealer(void *unused) {
-    Obj *r;
+    Obj *r = NULL;
     for (int i = 0; i < STEAL_ATTEMPS; i++) {
         if (steal(&r)) {
             operation(r);
@@ -41,14 +41,14 @@ int main(void) {
     for (int i = 0; i < ITEMS / 2; i++) {
         push(&items[2 * i]);
         push(&items[2 * i + 1]);
-        Obj *r;
+        Obj *r = NULL;
         if (pop(&r)) {
             operation(r);
         }
     }
 
     for (int i = 0; i < ITEMS / 2; i++) {
-        Obj *r;
+        Obj *r = NULL;
         if (pop(&r)) {
             operation(r);
         }
diff --git a/benchmarks/lfds/wsq.h b/benchmarks/lfds/wsq.h
index c992bf794..0e620b419 100644
--- a/benchmarks/lfds/wsq.h
+++ b/benchmarks/lfds/wsq.h
@@ -209,6 +209,9 @@ void syncPush(Obj* elem) {
             int temp = (h + i) & q.mask;
             newtasks[i] = q.elems[temp];
         }
+        for (i = count; i < STATICSIZE - count; i++) {
+            newtasks[i] = NULL;
+        }
         for (i = 0; i < newsize; i++) {
             q.elems[i] = newtasks[i];
         }

@ThomasHaas
Copy link
Collaborator

I haven't checked Dartagnan on your patch, but I would be surprised if the patch really changed much.
Since the initializing loop has dynamic bounds (cause count is dynamic), I don't think that Dartagnan is able to figure out that newtasks is really initialized (this requires far more intricate reasoning).

@xeren
Copy link
Collaborator

xeren commented Sep 9, 2024

The alias analysis did not finish overnight for benchmarks/folds/was.c.

@xeren do you have any idea why the performance is so bad in this example?

This program seems to contain negative weight cycles. This seems to put IBPA into an endless loop. The best fix appears to be setting TOP = List.of(-1) (what it actually should be in theory). Then it takes 0.5 secs on my machine. I did not define it like that in the first place, because it lost precision in other programs, compared to the older analyses.

@hernanponcedeleon
Copy link
Owner Author

I did not define it like that in the first place, because it lost precision in other programs, compared to the older analyses.

Does this means that "in theory" it is not the case that the default algorithm is always better than the others?

@ThomasHaas
Copy link
Collaborator

IIRC, the old algorithms were just unsound in those cases and that is why they were "more precise".

@xeren
Copy link
Collaborator

xeren commented Sep 9, 2024

Exactly. In terms of precision, "in theory" cannot really apply here. In practice, the fixed default is sometimes less precise.

Version 1 (FIELD_INSENSITIVE) truncates out-of-bounds addresses every time, in order to avoid looping on non-zero-weight cycles.

Version 2 (FIELD_SENSITIVE) basically kept this issue, but also introduced a provenance assumption. Then there was a loss in precision in some litmus tests featuring pointer sets starting with the last field of an object (e.g. &A+12+4x on a 16-bytes object A). There was only one possible location, so this generated more MUST-alias relationships. To enforce more precision everywhere, I made the assumption, that x is non-negative.

In Version 3 (FULL), I replaced the out-of-bounds filter with a different handling of positive-weight cycles. But I kept the assumption from before, which lead to this issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants
@xeren @ThomasHaas @hernanponcedeleon and others