Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Compute missing tuples in transitive closure #718

Merged
merged 4 commits into from
Aug 26, 2024

Conversation

DIvanov503
Copy link
Contributor

No description provided.

@hernanponcedeleon hernanponcedeleon changed the base branch from master to development August 16, 2024 12:32
@ThomasHaas
Copy link
Collaborator

I would still like to know what goes wrong in the old implementation. I don't think you ever need to do both pre and post composition to get the correct results. All of the following fixed point equations should be correct (i.e. have the same LFP):

r+ = r | r;r+
r+ = r | r+;r
r+ = r | r+;r | r;r+
r+ = r | r+;r+

I assume that this remains correct, even when avoiding composition of mutually exclusive pairs.
That being said, have you tried the original implementation with a slight change: (i) ignore mutual exclusion for composition, (ii) remove mutual exclusive pairs only from the final result?
If this gives the same result to the Datalog implementation then there is a problem with the intermediate usage of mutual exclusion.

@DIvanov503
Copy link
Contributor Author

If mutual exclusion is used in composition, then the last equation has the largest fixed point among them. Example:

r = {(1, 2), (2, 3), (3, 4), (4, 5)}
mutex = {(2, 5), (1, 4)}

The tuple (1, 5) will be present only in the last case through (1, 3) ; (3, 5) but not in the former ones because both (1, 2) and (4, 5) have their right/left r+ counterparts prohibited by mutex. Without mutex the four equations are equivalent. With the approach you proposed we should get more tuples in some cases. Imagine if in this example mutex was {(2, 5), (1, 4), (1, 3), (3, 5)}. Then the closure without mutex after being filtered would still give us (1, 5) but there would be no pair of tuples in the resulting relation which would compose into (1, 5).

The old implementation produces an incorrect result in the following example:

(a, b) in oldOuter
(b, c) in inner

The old implementation initializes current with inner and outer with oldOuter | inner. During the iteration current will contain only "new" tuples. Hence, the only chance for (a, c) to be inferred, is to have outer ; current to be computed in the loop but it does only current ; outer instead.

@DIvanov503
Copy link
Contributor Author

I suspect that there's either something small overlooked or my assumption about oldOuter being a correct transitive closure of some subset of inner is wrong.

@ThomasHaas
Copy link
Collaborator

ThomasHaas commented Aug 17, 2024

If mutual exclusion is used in composition, then the last equation has the largest fixed point among them. Example:

r = {(1, 2), (2, 3), (3, 4), (4, 5)}
mutex = {(2, 5), (1, 4)}

The tuple (1, 5) will be present only in the last case through (1, 3) ; (3, 5) but not in the former ones because both (1, 2) and (4, 5) have their right/left r+ counterparts prohibited by mutex.

Interesting example.

With the approach you proposed we should get more tuples in some cases. Imagine if in this example mutex was {(2, 5), (1, 4), (1, 3), (3, 5)}. Then the closure without mutex after being filtered would still give us (1, 5) but there wouldbe no pair of tuples in the resulting relation which would compose into (1, 5).

Yes, that's why I suggested it. IIRC, the Datalog implementation gave larger results and I wanted to match those by making this computation larger as well. But I take from your reply that this gives results larger than what Datalog produces.

The old implementation produces an incorrect result in the following example:

(a, b) in oldOuter
(b, c) in inner

The old implementation initializes current with inner and outer with oldOuter | inner. During the iteration current will contain only "new" tuples. Hence, the only chance for (a, c) to be inferred, is to have outer ; current to be computed in the loop but it does only current ; outer instead.

This should not happen since you would start with a wrong approximation of the old transitive closure here.

tc(r) = r | r;tc(r)   -->  tc(r + d) = (r + d) | (r + d);tc(r + d)

Now I believe that inner = (r + d) and oldOuter = tc(r) + d (which is a valid underapproximation for the monotonic iteration scheme). Therefore, we have the following FP equation which we can iterate starting from oldOuter:

tc(inner) = inner | inner;tc(inner)

In particular, the iterations should thus be of the shape inner;updatedEdges or updatedEdges;inner if using a different FP formulation. From what I can see, the current algorithm has current = updatedEdges and uses the second update mechanism. Also, instead of using just inner for the composition, it uses the larger outer (which is not wrong, but unnecessary/less precise I think cause it roughly amounts to the quadratic tc equation).
So theoretically, I cannot really see where something goes wrong and I don't think much changes when removing mutex pairs in every update step.

EDIT: Doesn't your example already explain the likely issue? Your Datalog implementation uses the quadratic TC formulation and is thus less precise. Even when using the linear formulations, r;r+ and r+;r won't give equivalent results when considering mutexes (I think your example can be easily adapted to show this).
The results of all formulations are sound but different and so you have to be extra careful when comparing different approaches.
This also explains why our algorithm gives a different result when forcing it to do more iterations because it ends up computing the less precise quadratic FP result.
A case in point of why I suggested manually checking the results on litmus tests: I bet it is not hard to see that our results are correct, i.e., that the extra tuples of the Datalog version are indeed unnecessary.

@DIvanov503
Copy link
Contributor Author

DIvanov503 commented Aug 17, 2024

EDIT: Doesn't your example already explain the likely issue?

Yes, that's why I wrote it.

I don't understand why r;r+ and r+;r are sound. Hernan showed me the quadratic definition r+;r+ in the paper. If left/right closures provide more precise results, then (r;r+)&(r+;r) should be even more precise. I can change the Datalog definition if you confirm that it's the case indeed.

EDIT: I mean r+=r+r&r+l where r+r=r|(r+r;r) and r+l=r|(r;r+l).

@hernanponcedeleon
Copy link
Owner

Hernan showed me the quadratic definition r+;r+ in the paper.

Keep in mind the definition I showed you from the SAS paper was about the encoding, not the RA.

The first datalog implementation was using r;r+ and this was giving us some "problems". However, I dont remember if this was a wrong verification result or just that the numbers of datalog and native did not match.

@ThomasHaas
Copy link
Collaborator

ThomasHaas commented Aug 17, 2024

EDIT: Doesn't your example already explain the likely issue?

Yes, that's why I wrote it.

I don't understand why r;r+ and r+;r are sound. Hernan showed me the quadratic definition r+;r+ in the paper.

All those formulations have the same least fixed point if you do not consider mutex pairs (however, the linear versions are often nicer(*)). However, this fact seems to change if you add mutex pairs because r+;r+ allows for more ways to derive the same path, some of which may not get filtered by the simple intersection with the non-mutex relation.
The core reason is that during the computation, we often forget about intermediate events that lead from a to b and those intermediate events might be the one's that make the path infeasible due to mutual exclusion.
See my answer in #714 which explains why the result produced by the quadratic formulation is less precise on the given litmus test.

If left/right closures provide more precise results, then (r;r+)&(r+;r) should be even more precise. I can change the Datalog definition if you confirm that it's the case indeed.

EDIT: I mean r+=r+r&r+l where r+r=r|(r+r;r) and r+l=r|(r;r+l).

Yes, r+ = r | (r;r+ & r+;r) should give more precise results, but likely still not the most precise that is theoretically possible.
If you compute updates via r;r+ & r+;r you actually always have 4 events at hand in each iteration: (a, b) from the "left" r and (c, d) from the right r, giving you the option to check mutual exclusion between all those events (if any two of them are exclusive, the composition is invalid)!

(*) The various versions of the fixed point equation only coincide in their least fixed point, but not in their other fixed points.
For example, if r is acyclic then r+ = r | r;r+ (and the other linear version) has a unique fixed point, which means that you can forget about the "least"-property altogether (this is helpful for SMT encodings).
r = r | r+;r+ has always the full space as a valid solution, i.e., its greatest fixed point is everything no matter what r is (even if r is empty!).
Indeed, even r+ = r | r;r+ and r+ = r | r+;r have different (non-least) FPs on cyclic graphs. Consider, for example, a simple cycle and a special isolated/disconnected node. The former equation has a FP-solution where all nodes in the cycle have an edge towards the isolated node (these would not be in a least solution) whereas the latter has a solution where the isolated node has edges into the cycle (i.e., the opposite direction).

@DIvanov503
Copy link
Contributor Author

DIvanov503 commented Aug 17, 2024

I think I start understanding it. Please correct me if I'm wrong. A precise transitive closure r+ over some base (input) relation r is a relation over endpoints of all chains over r which are possible from the execution point of view. Hence, the overapproximation (may) should exclude all chains containing mutually exclusive pairs. E.g., in my very first example, (1, 5) is not in the may set because its only corresponding chain contains mutually exclusive nodes. And the underapproximation (must) should additionally check for implication. If r is a "compound" (computed) relation, then it's inner structure (witness chains) should also be considered. Hence, r++ = r+ in the precise case.

The core reason is that during the computation, we often forget about intermediate events that lead from a to b and those intermediate events might be the one's that make the path infeasible due to mutual exclusion.

Because of that we have a freedom of choosing the equation which would restrict the may set most. At the same time, the must sets for all equations are the same because the execution implication is transitive. I.e., the quadratic definition does not spoil the must set with unsound tuples.

This is because our current implementation is somewhat awkwardly implemented: it performs update operations that are a hybrid between a linear TC formulation (r+;r) and the quadratic formulation (r+;r+).

If my understanding is correct, I suggest addressing this issue with the following adjustment:

EventGraph computeTransitiveClosure(oldOuter, inner, isMay) {
    outer = oldOuter
    current = (inner \ outer) ; outer
    while (current != 0) {
        outer = outer | current
        next = inner ; current
        current = next
    }
    return outer
}

It should compute the left closure (r ; r+) while reusing the previous result unless I overlooked something.

EDIT: The version above may have termination issues. current must get outer deducted. The revised version:

EventGraph computeTransitiveClosure(oldOuter, inner, isMay) {
    outer = oldOuter
    current = ((inner \ outer) ; outer) \ outer
    while (current != 0) {
        outer = outer | current
        current = (inner ; current) \ outer
    }
    return outer
}

@ThomasHaas
Copy link
Collaborator

ThomasHaas commented Aug 17, 2024

I think I start understanding it. Please correct me if I'm wrong. A precise transitive closure r+ over some base (input) relation r is a relation over endpoints of all chains over r which are possible from the execution point of view. Hence, the overapproximation (may) should exclude all chains containing mutually exclusive pairs.

Yes, the endpoints of every path that contains some mutual exclusive events should not be counted. Getting this most precise result likely requires you to explicitly iterate over all possible paths, which is far too expensive. The fixed point iteration is less precise because you have to make the mutual exclusion judgement based only on a few events in each iteration rather than on the whole path.

Because of that we have a freedom of choosing the equation which would restrict the may set most. At the same time, the must sets for all equations are the same because the execution implication is transitive. I.e., the quadratic definition does not spoil the must set with unsound tuples.

Yes, the must-set might not be affected that much by the different FP formulation. However, I think this might be more related to the fact that the must-set is very likely to be acyclic. Indeed, the composition of must-sets is not associative so different composition orders of the various FP formulations already induce different results but IIRC this only happens on cyclic relations. The OOPSLA paper about RA mentions this problem and I make use of acyclicity in my proof IIRC.

If my understanding is correct, I suggest addressing this issue with the following adjustment:

EventGraph computeTransitiveClosure(oldOuter, inner, isMay) {
    outer = oldOuter
    current = (inner \ outer) ; outer
    while (current != 0) {
        outer = outer | current
        next = inner ; current
        current = next
    }
    return outer
}

I think this is not fully correct. I will check this part later (it's 2am here and I need to go to bed :P).
The following should be correct and can be implemeneted without the explicit set difference.

   update = inner \ oldOuter   // The first update is not a composition!
   outer = oldOuter | update
   while (update != 0) {
       update = inner;update \ outer    // or update;inner \ outer
       // ... remove mutual exclusive pairs
       outer = outer | update
   }

EDIT: Similar to must-set composition, I believe may-set composition with the mutex filter is not associative. This naturally explains why different fixed point formulations lead to different results.

@DIvanov503
Copy link
Contributor Author

DIvanov503 commented Aug 19, 2024

   update = inner \ oldOuter   // The first update is not a composition!
   outer = oldOuter | update
   while (update != 0) {
       update = inner;update \ outer    // or update;inner \ outer
       // ... remove mutual exclusive pairs
       outer = outer | update
   }

I don't know what this version is supposed to compute but it is not precisely r;r+. Counter-example with empty mutex:

oldOuter = {(2, 3), (3, 4), (2, 4)}
inner = {(1, 2), (2, 3), (3, 4)}
update = inner \ oldOuter = {(1, 2)}
outer = oldOuter | update = {(2, 3), (3, 4), (2, 4), (1, 2)}
update = inner;update \ outer = 0

Since mutex is empty in this example, the algorithm must compute the "full" composition but the tuple (1, 4) is missing. However, the necessity of deducting outer from update was overlooked in my original suggestion, good catch.

EDIT: I was saving on lines of code and suggested an incorrect version again (inner \ oldOuter isn't added to outer). Here's a revised version:

EventGraph computeTransitiveClosure(oldOuter, inner, isMay) {
    outer = oldOuter
    update = inner \ outer
    update = update | ((update ; outer) \ outer)
    while (update != 0) {
        outer = outer | update
        update = (inner ; update) \ outer
    }
    return outer
}

@ThomasHaas
Copy link
Collaborator

You are right that my version is wrong. I missed the fact that in the FP equation r+ = r | r;r+ when r is changed to (r + d) you also get d;r+ into your initial update. I think your version considers this correctly.

@ThomasHaas
Copy link
Collaborator

This is an interesting one. The issue is that XRA may cause oldOuter to not be transitive anymore (since it can remove tuples), but the method computeTransitiveClosure works only correctly if oldOuter is transitive.
This causes the activeSet computation to fail computing a tuple that was removed by XRA but that is necessary for the encoding.

@DIvanov503
Copy link
Contributor Author

This causes the activeSet computation to fail computing a tuple that was removed by XRA but that is necessary for the encoding.

Would providing an empty oldOuter when computing active sets fix the issue?

@ThomasHaas
Copy link
Collaborator

This causes the activeSet computation to fail computing a tuple that was removed by XRA but that is necessary for the encoding.

Would providing an empty oldOuter when computing active sets fix the issue?

Yes, but I think it is better to make computeTransitiveClosure more robust and compute the initial update via inner + inner;oldOuter instead of inner + update;oldOuter. This should make it correct even for non-transitive underapproximations.

Copy link
Collaborator

@ThomasHaas ThomasHaas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@hernanponcedeleon hernanponcedeleon merged commit f6d10e2 into hernanponcedeleon:development Aug 26, 2024
1 check passed
hernanponcedeleon pushed a commit that referenced this pull request Aug 28, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants