Compute missing tuples in transitive closure #718

DIvanov503 · 2024-08-16T12:31:37Z

No description provided.

ThomasHaas · 2024-08-16T15:16:10Z

I would still like to know what goes wrong in the old implementation. I don't think you ever need to do both pre and post composition to get the correct results. All of the following fixed point equations should be correct (i.e. have the same LFP):

r+ = r | r;r+
r+ = r | r+;r
r+ = r | r+;r | r;r+
r+ = r | r+;r+

I assume that this remains correct, even when avoiding composition of mutually exclusive pairs.
That being said, have you tried the original implementation with a slight change: (i) ignore mutual exclusion for composition, (ii) remove mutual exclusive pairs only from the final result?
If this gives the same result to the Datalog implementation then there is a problem with the intermediate usage of mutual exclusion.

DIvanov503 · 2024-08-16T17:08:56Z

If mutual exclusion is used in composition, then the last equation has the largest fixed point among them. Example:

r = {(1, 2), (2, 3), (3, 4), (4, 5)}
mutex = {(2, 5), (1, 4)}

The tuple (1, 5) will be present only in the last case through (1, 3) ; (3, 5) but not in the former ones because both (1, 2) and (4, 5) have their right/left r+ counterparts prohibited by mutex. Without mutex the four equations are equivalent. With the approach you proposed we should get more tuples in some cases. Imagine if in this example mutex was {(2, 5), (1, 4), (1, 3), (3, 5)}. Then the closure without mutex after being filtered would still give us (1, 5) but there would be no pair of tuples in the resulting relation which would compose into (1, 5).

The old implementation produces an incorrect result in the following example:

(a, b) in oldOuter
(b, c) in inner

The old implementation initializes current with inner and outer with oldOuter | inner. During the iteration current will contain only "new" tuples. Hence, the only chance for (a, c) to be inferred, is to have outer ; current to be computed in the loop but it does only current ; outer instead.

DIvanov503 · 2024-08-16T17:10:56Z

I suspect that there's either something small overlooked or my assumption about oldOuter being a correct transitive closure of some subset of inner is wrong.

ThomasHaas · 2024-08-17T01:40:36Z

If mutual exclusion is used in composition, then the last equation has the largest fixed point among them. Example:
r = {(1, 2), (2, 3), (3, 4), (4, 5)}
mutex = {(2, 5), (1, 4)}
The tuple (1, 5) will be present only in the last case through (1, 3) ; (3, 5) but not in the former ones because both (1, 2) and (4, 5) have their right/left r+ counterparts prohibited by mutex.

Interesting example.

With the approach you proposed we should get more tuples in some cases. Imagine if in this example mutex was {(2, 5), (1, 4), (1, 3), (3, 5)}. Then the closure without mutex after being filtered would still give us (1, 5) but there wouldbe no pair of tuples in the resulting relation which would compose into (1, 5).

Yes, that's why I suggested it. IIRC, the Datalog implementation gave larger results and I wanted to match those by making this computation larger as well. But I take from your reply that this gives results larger than what Datalog produces.

The old implementation produces an incorrect result in the following example:
(a, b) in oldOuter
(b, c) in inner
The old implementation initializes current with inner and outer with oldOuter | inner. During the iteration current will contain only "new" tuples. Hence, the only chance for (a, c) to be inferred, is to have outer ; current to be computed in the loop but it does only current ; outer instead.

This should not happen since you would start with a wrong approximation of the old transitive closure here.

tc(r) = r | r;tc(r)   -->  tc(r + d) = (r + d) | (r + d);tc(r + d)

Now I believe that inner = (r + d) and oldOuter = tc(r) + d (which is a valid underapproximation for the monotonic iteration scheme). Therefore, we have the following FP equation which we can iterate starting from oldOuter:

tc(inner) = inner | inner;tc(inner)

In particular, the iterations should thus be of the shape inner;updatedEdges or updatedEdges;inner if using a different FP formulation. From what I can see, the current algorithm has current = updatedEdges and uses the second update mechanism. Also, instead of using just inner for the composition, it uses the larger outer (which is not wrong, but unnecessary/less precise I think cause it roughly amounts to the quadratic tc equation).
~~So theoretically, I cannot really see where something goes wrong and I don't think much changes when removing mutex pairs in every update step.~~

EDIT: Doesn't your example already explain the likely issue? Your Datalog implementation uses the quadratic TC formulation and is thus less precise. Even when using the linear formulations, r;r+ and r+;r won't give equivalent results when considering mutexes (I think your example can be easily adapted to show this).
The results of all formulations are sound but different and so you have to be extra careful when comparing different approaches.
This also explains why our algorithm gives a different result when forcing it to do more iterations because it ends up computing the less precise quadratic FP result.
A case in point of why I suggested manually checking the results on litmus tests: I bet it is not hard to see that our results are correct, i.e., that the extra tuples of the Datalog version are indeed unnecessary.

DIvanov503 · 2024-08-17T06:18:14Z

EDIT: Doesn't your example already explain the likely issue?

Yes, that's why I wrote it.

I don't understand why r;r+ and r+;r are sound. Hernan showed me the quadratic definition r+;r+ in the paper. If left/right closures provide more precise results, then (r;r+)&(r+;r) should be even more precise. I can change the Datalog definition if you confirm that it's the case indeed.

EDIT: I mean r+=r+r&r+l where r+r=r|(r+r;r) and r+l=r|(r;r+l).

hernanponcedeleon · 2024-08-17T07:02:22Z

Hernan showed me the quadratic definition r+;r+ in the paper.

Keep in mind the definition I showed you from the SAS paper was about the encoding, not the RA.

The first datalog implementation was using r;r+ and this was giving us some "problems". However, I dont remember if this was a wrong verification result or just that the numbers of datalog and native did not match.

ThomasHaas · 2024-08-17T13:14:56Z

EDIT: Doesn't your example already explain the likely issue?

Yes, that's why I wrote it.

I don't understand why r;r+ and r+;r are sound. Hernan showed me the quadratic definition r+;r+ in the paper.

All those formulations have the same least fixed point if you do not consider mutex pairs (however, the linear versions are often nicer(*)). However, this fact seems to change if you add mutex pairs because r+;r+ allows for more ways to derive the same path, some of which may not get filtered by the simple intersection with the non-mutex relation.
The core reason is that during the computation, we often forget about intermediate events that lead from a to b and those intermediate events might be the one's that make the path infeasible due to mutual exclusion.
See my answer in #714 which explains why the result produced by the quadratic formulation is less precise on the given litmus test.

If left/right closures provide more precise results, then (r;r+)&(r+;r) should be even more precise. I can change the Datalog definition if you confirm that it's the case indeed.

EDIT: I mean r+=r+r&r+l where r+r=r|(r+r;r) and r+l=r|(r;r+l).

Yes, r+ = r | (r;r+ & r+;r) should give more precise results, but likely still not the most precise that is theoretically possible.
If you compute updates via r;r+ & r+;r you actually always have 4 events at hand in each iteration: (a, b) from the "left" r and (c, d) from the right r, giving you the option to check mutual exclusion between all those events (if any two of them are exclusive, the composition is invalid)!

(*) The various versions of the fixed point equation only coincide in their least fixed point, but not in their other fixed points.
For example, if r is acyclic then r+ = r | r;r+ (and the other linear version) has a unique fixed point, which means that you can forget about the "least"-property altogether (this is helpful for SMT encodings).
r = r | r+;r+ has always the full space as a valid solution, i.e., its greatest fixed point is everything no matter what r is (even if r is empty!).
Indeed, even r+ = r | r;r+ and r+ = r | r+;r have different (non-least) FPs on cyclic graphs. Consider, for example, a simple cycle and a special isolated/disconnected node. The former equation has a FP-solution where all nodes in the cycle have an edge towards the isolated node (these would not be in a least solution) whereas the latter has a solution where the isolated node has edges into the cycle (i.e., the opposite direction).

DIvanov503 · 2024-08-17T16:12:46Z

I think I start understanding it. Please correct me if I'm wrong. A precise transitive closure r+ over some base (input) relation r is a relation over endpoints of all chains over r which are possible from the execution point of view. Hence, the overapproximation (may) should exclude all chains containing mutually exclusive pairs. E.g., in my very first example, (1, 5) is not in the may set because its only corresponding chain contains mutually exclusive nodes. And the underapproximation (must) should additionally check for implication. If r is a "compound" (computed) relation, then it's inner structure (witness chains) should also be considered. Hence, r++ = r+ in the precise case.

The core reason is that during the computation, we often forget about intermediate events that lead from a to b and those intermediate events might be the one's that make the path infeasible due to mutual exclusion.

Because of that we have a freedom of choosing the equation which would restrict the may set most. At the same time, the must sets for all equations are the same because the execution implication is transitive. I.e., the quadratic definition does not spoil the must set with unsound tuples.

This is because our current implementation is somewhat awkwardly implemented: it performs update operations that are a hybrid between a linear TC formulation (r+;r) and the quadratic formulation (r+;r+).

If my understanding is correct, I suggest addressing this issue with the following adjustment:

EventGraph computeTransitiveClosure(oldOuter, inner, isMay) {
    outer = oldOuter
    current = (inner \ outer) ; outer
    while (current != 0) {
        outer = outer | current
        next = inner ; current
        current = next
    }
    return outer
}

It should compute the left closure (r ; r+) while reusing the previous result unless I overlooked something.

EDIT: The version above may have termination issues. current must get outer deducted. The revised version:

EventGraph computeTransitiveClosure(oldOuter, inner, isMay) {
    outer = oldOuter
    current = ((inner \ outer) ; outer) \ outer
    while (current != 0) {
        outer = outer | current
        current = (inner ; current) \ outer
    }
    return outer
}

ThomasHaas · 2024-08-17T16:51:00Z

I think I start understanding it. Please correct me if I'm wrong. A precise transitive closure r+ over some base (input) relation r is a relation over endpoints of all chains over r which are possible from the execution point of view. Hence, the overapproximation (may) should exclude all chains containing mutually exclusive pairs.

Yes, the endpoints of every path that contains some mutual exclusive events should not be counted. Getting this most precise result likely requires you to explicitly iterate over all possible paths, which is far too expensive. The fixed point iteration is less precise because you have to make the mutual exclusion judgement based only on a few events in each iteration rather than on the whole path.

Because of that we have a freedom of choosing the equation which would restrict the may set most. At the same time, the must sets for all equations are the same because the execution implication is transitive. I.e., the quadratic definition does not spoil the must set with unsound tuples.

Yes, the must-set might not be affected that much by the different FP formulation. However, I think this might be more related to the fact that the must-set is very likely to be acyclic. Indeed, the composition of must-sets is not associative so different composition orders of the various FP formulations already induce different results but IIRC this only happens on cyclic relations. The OOPSLA paper about RA mentions this problem and I make use of acyclicity in my proof IIRC.

If my understanding is correct, I suggest addressing this issue with the following adjustment:

EventGraph computeTransitiveClosure(oldOuter, inner, isMay) {
    outer = oldOuter
    current = (inner \ outer) ; outer
    while (current != 0) {
        outer = outer | current
        next = inner ; current
        current = next
    }
    return outer
}

~~I think this is not fully correct. I will check this part later (it's 2am here and I need to go to bed :P).~~
The following should be correct and can be implemeneted without the explicit set difference.

   update = inner \ oldOuter   // The first update is not a composition!
   outer = oldOuter | update
   while (update != 0) {
       update = inner;update \ outer    // or update;inner \ outer
       // ... remove mutual exclusive pairs
       outer = outer | update
   }

EDIT: Similar to must-set composition, I believe may-set composition with the mutex filter is not associative. This naturally explains why different fixed point formulations lead to different results.

DIvanov503 · 2024-08-19T06:07:09Z

   update = inner \ oldOuter   // The first update is not a composition!
   outer = oldOuter | update
   while (update != 0) {
       update = inner;update \ outer    // or update;inner \ outer
       // ... remove mutual exclusive pairs
       outer = outer | update
   }

I don't know what this version is supposed to compute but it is not precisely r;r+. Counter-example with empty mutex:

oldOuter = {(2, 3), (3, 4), (2, 4)}
inner = {(1, 2), (2, 3), (3, 4)}
update = inner \ oldOuter = {(1, 2)}
outer = oldOuter | update = {(2, 3), (3, 4), (2, 4), (1, 2)}
update = inner;update \ outer = 0

Since mutex is empty in this example, the algorithm must compute the "full" composition but the tuple (1, 4) is missing. However, the necessity of deducting outer from update was overlooked in my original suggestion, good catch.

EDIT: I was saving on lines of code and suggested an incorrect version again (inner \ oldOuter isn't added to outer). Here's a revised version:

EventGraph computeTransitiveClosure(oldOuter, inner, isMay) {
    outer = oldOuter
    update = inner \ outer
    update = update | ((update ; outer) \ outer)
    while (update != 0) {
        outer = outer | update
        update = (inner ; update) \ outer
    }
    return outer
}

ThomasHaas · 2024-08-19T06:23:05Z

You are right that my version is wrong. I missed the fact that in the FP equation r+ = r | r;r+ when r is changed to (r + d) you also get d;r+ into your initial update. I think your version considers this correctly.

ThomasHaas · 2024-08-24T05:39:12Z

This is an interesting one. The issue is that XRA may cause oldOuter to not be transitive anymore (since it can remove tuples), but the method computeTransitiveClosure works only correctly if oldOuter is transitive.
This causes the activeSet computation to fail computing a tuple that was removed by XRA but that is necessary for the encoding.

DIvanov503 · 2024-08-24T06:08:38Z

This causes the activeSet computation to fail computing a tuple that was removed by XRA but that is necessary for the encoding.

Would providing an empty oldOuter when computing active sets fix the issue?

ThomasHaas · 2024-08-24T18:23:33Z

This causes the activeSet computation to fail computing a tuple that was removed by XRA but that is necessary for the encoding.

Would providing an empty oldOuter when computing active sets fix the issue?

Yes, but I think it is better to make computeTransitiveClosure more robust and compute the initial update via inner + inner;oldOuter instead of inner + update;oldOuter. This should make it correct even for non-transitive underapproximations.

ThomasHaas

LGTM

Co-authored-by: Dmitry Ivanov <[email protected]>

Compute missing tuples in transitive closure

867f6cd

hernanponcedeleon changed the base branch from master to development August 16, 2024 12:32

Typo

437fe09

ThomasHaas mentioned this pull request Aug 17, 2024

[RA] Missing tuples in transitive closure #714

Closed

Compute left closure

e7de8cb

Use inner instead of update for the initial composition

72b0f64

ThomasHaas approved these changes Aug 26, 2024

View reviewed changes

hernanponcedeleon merged commit f6d10e2 into hernanponcedeleon:development Aug 26, 2024
1 check passed

hernanponcedeleon pushed a commit that referenced this pull request Aug 28, 2024

Compute missing tuples in transitive closure (#718)

c935f30

Co-authored-by: Dmitry Ivanov <[email protected]>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Compute missing tuples in transitive closure #718

Compute missing tuples in transitive closure #718

DIvanov503 commented Aug 16, 2024

ThomasHaas commented Aug 16, 2024

DIvanov503 commented Aug 16, 2024

DIvanov503 commented Aug 16, 2024

ThomasHaas commented Aug 17, 2024 •

edited

Loading

DIvanov503 commented Aug 17, 2024 •

edited

Loading

hernanponcedeleon commented Aug 17, 2024

ThomasHaas commented Aug 17, 2024 •

edited

Loading

DIvanov503 commented Aug 17, 2024 •

edited

Loading

ThomasHaas commented Aug 17, 2024 •

edited

Loading

DIvanov503 commented Aug 19, 2024 •

edited

Loading

ThomasHaas commented Aug 19, 2024

ThomasHaas commented Aug 24, 2024

DIvanov503 commented Aug 24, 2024

ThomasHaas commented Aug 24, 2024

ThomasHaas left a comment

Compute missing tuples in transitive closure #718

Compute missing tuples in transitive closure #718

Conversation

DIvanov503 commented Aug 16, 2024

ThomasHaas commented Aug 16, 2024

DIvanov503 commented Aug 16, 2024

DIvanov503 commented Aug 16, 2024

ThomasHaas commented Aug 17, 2024 • edited Loading

DIvanov503 commented Aug 17, 2024 • edited Loading

hernanponcedeleon commented Aug 17, 2024

ThomasHaas commented Aug 17, 2024 • edited Loading

DIvanov503 commented Aug 17, 2024 • edited Loading

ThomasHaas commented Aug 17, 2024 • edited Loading

DIvanov503 commented Aug 19, 2024 • edited Loading

ThomasHaas commented Aug 19, 2024

ThomasHaas commented Aug 24, 2024

DIvanov503 commented Aug 24, 2024

ThomasHaas commented Aug 24, 2024

ThomasHaas left a comment

Choose a reason for hiding this comment

ThomasHaas commented Aug 17, 2024 •

edited

Loading

DIvanov503 commented Aug 17, 2024 •

edited

Loading

ThomasHaas commented Aug 17, 2024 •

edited

Loading

DIvanov503 commented Aug 17, 2024 •

edited

Loading

ThomasHaas commented Aug 17, 2024 •

edited

Loading

DIvanov503 commented Aug 19, 2024 •

edited

Loading