We have decided not to enforce SSA form in our IR; however, this can cause worse encodings.
Whenever we encode a read access of register r, we need to iterate over all possible (last) assignments to r, of which there can be O(n) many. With O(n) registers, this can yield an encoding of size O(n^2).
This is expected, because the size of a use-def graph can be quadratic for non-SSA programs.
However, it is known to be only linear for SSA programs.
Consider the following program:
int i;
for (i = 0; i < N; i++) {
if (__VERIFIER_nondet_bool()) {
break;
}
}
int square = 0;
for (int j = 0; j < N; j++) {
square += i;
}
In the second loop, each read of i may read N-many different values (one from each iteration of the first loop), so the encoding is quadratic.
Introducing a temporary variable int temp = i; // ~PHI node after the first loop and reading that variable in the second loop instead will produce a linear encoding, since each read of temp has a single possible source.
Since our parsed programs are originally in SSA form, one might think that only loop unrolling produces non-SSA parts of the program (which are rather small, given the usual loop bounds). However, we also do copy propagation, which actually destroys SSA form by eliminating phi nodes (it would remove the int temp = i; assignment from the above example).
So I suggest that we try to recover an SSA-like form after all processing (so that SCCP cannot destroy it again) by inserting simple register copy events (r_phi := r;) that act as phi nodes.
Edit: I quickly checked how many opportunities we have for this optimization, and the numbers are not really high.
Even in qspinlock there are just around 9 cases per thread, most of which I believe have to do with the non-deterministic CAS. Linux RCU had even fewer cases. Our usual LFDS benchmarks have 1-2 cases per thread (safe_stack is the biggest outlier with 8 cases per thread!).
The largest benchmark was ck_epoch with 33 cases per thread.
I believe this is because ck_epoch has a function that searches for an element in a data structure (via a loop) and returns it. The value of the return register could come from any iteration, so every read access to that register has many possible source events.
In conclusion, it seems that the room for optimization is low, and one would only expect benefits for benchmarks that search in data structures.
In case we try to implement this, safe_stack seems to be the best choice among our unit tests to measure the impact.