Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify AND and OR #34133

Merged
merged 3 commits into from
Jul 2, 2024
Merged

Simplify AND and OR #34133

merged 3 commits into from
Jul 2, 2024

Conversation

ranma42
Copy link
Contributor

@ranma42 ranma42 commented Jul 1, 2024

When constructing AndAlso and OrElse expressions, perform some basic local simplifications.

Comment on lines -1847 to -1848
// a is null || a is not null -> true
// a is null && a is not null -> false
Copy link
Contributor Author

@ranma42 ranma42 Jul 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ these two cases (the ones that return a constant) are currently not implemented in the SqlExpressionFactory methods.

It is unclear to me whether they are valuable; it would be easy to keep them (to ensure no regression), so let me know what you think

EDIT: these are now implemented

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would also be possible to check for a stronger condition, i.e. a || !a -> true
If a is a generic expression, that would possibly involve comparing a whole subtree, so it might cause complexity issues (that could be worked around in several ways, but probably belong to a separate discussion/PR).

@ranma42
Copy link
Contributor Author

ranma42 commented Jul 1, 2024

This is, in a sense, the continuation of #34002 for AND and OR.

Copy link
Member

@roji roji left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great stuff as always @ranma42. See below for a small optimization refactor suggestion - though I'd be OK merging as-is too if you're against it.

&& sqlBinaryExpression.OperatorType is ExpressionType.AndAlso or ExpressionType.OrElse
? SimplifyLogicalSqlBinaryExpression(sqlBinaryResult)
&& sqlBinaryResult.OperatorType is ExpressionType.AndAlso or ExpressionType.OrElse
? _sqlExpressionFactory.MakeBinary( // invoke MakeBinary simplifications
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One slight disadvantage here is that this always reconstructs the SqlBinaryExpression (allocation), regardless of whether it's simplified or not. We could refactor a bit and have a SimplifyBinary method on SqlExpressionFactory that simply returns the original expression if it couldn't be optimized (reusing the actual code internally from MakeBinary().

It might be a bit of a micro-optimization, but it might be worth doing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, this is something I am hitting with the branch on top of this (you might guess... it's NOT 😅 (and it combines much better than I expected with this branch 🤯)).
I will further investigate options to support an Update-like pattern with no (additional) allocations.

In the meantime, I might try a middle-ground solution, i.e. creating the new expression and if it is .Equals to the old one, propagate the old one. AFAICT this would:

  • if the new expression is different
    • require an additional comparison vs the current code
    • do the same operations as the desired pattern (compare subexpressions, then allocate if needed)
  • if the new expression matches the old one
    • require an additional comparison vs the current code; OTOH it would avoid recursively regenerating all of the parents
    • require an additional allocation vs the desired pattern

This might not yet be ideal, but it should be a simple change that makes the code a little closer to the target.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. We generally don't stress too much about compilation perf, but it would be good to avoid allocating/copying every single binary expression in the tree (which can't be simplified). What I had in mind was something the following on SqlExpressionFactory:

SqlExpression MakeBinary(ExpressionType operatorType, SqlExpression left, SqlExpression right, SqlExpression? existingBinary)

Coming from MakeBinary, existingBinary would be null so the method constructs a new SqlBinaryExpression if no simplification is possible. If, however, an existingBinary is passed and left/right can't be simplified, existingBinary is simply returned.

@ranma42 ranma42 force-pushed the simplify-and-or branch 2 times, most recently from 7ca2915 to 6ed0471 Compare July 2, 2024 18:37
@ranma42 ranma42 marked this pull request as draft July 2, 2024 18:45
@ranma42 ranma42 marked this pull request as draft July 2, 2024 18:45
@ranma42 ranma42 marked this pull request as draft July 2, 2024 18:45
@ranma42
Copy link
Contributor Author

ranma42 commented Jul 2, 2024

I would like to eagerly return the existing expression as in 2be205d, but that does not cover the case where the expression has been Updated.
I'll try and look more into how to handle this (maybe instead of Update we could use MakeBinary 🤔 )

@ranma42 ranma42 marked this pull request as ready for review July 2, 2024 19:53
When constructing `AndAlso` and `OrElse` expressions, perform some basic local
simplifications.
Its simplifications are already performed by the `SqlExpressionFactory`.
@ranma42
Copy link
Contributor Author

ranma42 commented Jul 2, 2024

I am not yet 100% happy, but this should be fine:

  • it avoids the allocations (using basically the pattern suggested by @roji)
  • it can optimize Updated expression
    Unfortunately it does not pre-emptively simplify upon Update, but I guess that's ok (at the very least, it is not a regression 😅 ).

@ranma42 ranma42 mentioned this pull request Jul 2, 2024
Copy link
Member

@roji roji left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, looks good!

@roji roji enabled auto-merge (squash) July 2, 2024 20:38
@roji roji merged commit 4649fb3 into dotnet:main Jul 2, 2024
7 checks passed
@ranma42 ranma42 deleted the simplify-and-or branch July 2, 2024 21:52
// x && x -> x
if (left is SqlConstantExpression { Value: false }
|| right is SqlConstantExpression { Value: true }
|| left.Equals(right))
Copy link
Member

@roji roji Jul 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ranma42 I thought this a bit more over the night.

It's worth considering that left.Equals(right) is a recursive visitation, which is very different from simple pattern matching like checking for false/true constants. In a contrived case where 100 AndAlso expression are being composed over each either, the number of visitations grows very quickly, and this can lead to a compilation performance issue (I say this is contrived, though in some cases where query trees are generated automatically it may happen).

I don't think this is necessarily something we need to worry about right now - SQL expression trees are typically not that deep, and IIUC the problem also existed before when the optimization happened as part of SqlNullabilitryProcessor. But it's something to consider; for example, SqlExpression could cache its hash code when it's first calculated, and our recursive equality logic could simply check that first, and return early rather than recursing.

Let me know if you have any thoughts on this - we can open a separate issue to track this work (which again, I don't think is urgent).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, this problem is already present in a few places in the codebase and it is something I would definitely like to improve (and use more 😈 ).
Your suggestion (compute & reuse the hash code) is indeed the most straightforward way to improve it... it would be trivial to implement if the expressions were immutable 😉 #32927.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Posted #34149 to track this

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it would be trivial to implement if the expressions were immutable

FWIW almost our entire SQL tree is already immutable, with the big exception being SelectExpression (and that's also supposed to be immutable once translation is complete, i.e. in post-processing - though I wouldn't commit to it 100%...). So we should be able to implement the has code optimization today for almost all expression types, and specifically skip SelectExpression.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SelectExpression can be a subtree of most expressions (it is used in InExpression) :(
Maybe this is something we want to do on an IR and possibly skip/ignore on C# and SQL expressions.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, you're right. I don't think we need to wait on an IR here though - #32927 (full immutability) is something I want us to do for many other reasons, regardless/before an IR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants