Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New OpenJSON expression to SQL translation breaks aggregates on subquery with 'Cannot perform an aggregate function on an expression containing an aggregate or a subquery' exception #32374

Closed
AlexeiScherbakov opened this issue Nov 21, 2023 · 9 comments · Fixed by #32478
Assignees
Labels
area-primitive-collections area-query closed-fixed The issue has been fixed and is/will be included in the release indicated by the issue milestone. customer-reported regression Servicing-approved type-bug
Milestone

Comments

@AlexeiScherbakov
Copy link

File a bug

Include your code

Steps to reproduce:

  1. Use attached example OpenJsonBug.zip
  2. Create db '''OpenJsonBugDb''' and table (no data is needed to reproduce) on local sql server
CREATE TABLE [dbo].[Sales](
	[DateID] [int] NOT NULL,
	[JOURNALID] [int] NOT NULL,
	[FiltersService] [int] NOT NULL,
	[OrdersGrossPCS] [decimal](18, 4) NULL,
	[OrdersApprovedPCS] [decimal](18, 4) NULL,
	[OrdersNetPCS] [decimal](18, 4) NULL,
	[StatusNameLine] [varchar](9) NULL
) ON [PRIMARY]
  1. TestHashSetTranslation completes normally, TestArrayTranslation fails with Microsoft.Data.SqlClient.SqlException : Cannot perform an aggregate function on an expression containing an aggregate or a subquery.
[Test]
public void TestHashSetTranslation()
{
	using var dbContext = _webApplication.Services.GetRequiredService<SalesDbContext>();

    HashSet<string> shippedStatuses = new HashSet<string>()
    {
        "Delivered",
        "Shipped",
        "Net"
    };

    int intStart = 20230101;
    int intEnd = 20231121;

    var data = dbContext.Sales.Where(x => (x.DateId >= intStart) && (x.DateId <= intEnd) && (x.FiltersService == 0))
        .GroupBy(x => x.JournalId)
        .Select(x => new
        {
            Key = x.Key,
            NetInOrder = x.Where(x => (x.OrdersNetPCS > 0)).Count(),
            ShippedInOrder = x.Where(x =>
                   ((x.OrdersNetPCS > 0)
                   || ((x.OrdersApprovedPCS > 0) && shippedStatuses.Contains(x.StatusNameLine)))
                              ).Count(),
            ApprovedInOrder = x.Where(x => (x.OrdersApprovedPCS > 0)).Count(),
            GrossInOrder = x.Where(x => (x.OrdersGrossPCS > 0)).Count(),
        }).GroupBy(x => true)
        .Select(x => new
        {
            Net = x.Count(x => x.NetInOrder > 0),
            Shipped = x.Count(x => x.ShippedInOrder > 0),
            Approved = x.Count(x => x.ApprovedInOrder > 0),
            Gross = x.Count(x => x.GrossInOrder > 0)
        }).SingleOrDefault();
}

[Test]
public void TestArrayTranslation()
{
    using var dbContext = _webApplication.Services.GetRequiredService<SalesDbContext>();

    string[] shippedStatuses = new string[]
    {
        "Delivered",
        "Shipped",
        "Net"
    };

    int intStart = 20230101;
    int intEnd = 20231121;

    var data = dbContext.Sales.Where(x => (x.DateId >= intStart) && (x.DateId <= intEnd) && (x.FiltersService == 0))
        .GroupBy(x => x.JournalId)
        .Select(x => new
        {
            Key = x.Key,
            NetInOrder = x.Where(x => (x.OrdersNetPCS > 0)).Count(),
            ShippedInOrder = x.Where(x =>
                   ((x.OrdersNetPCS > 0)
                   || ((x.OrdersApprovedPCS > 0) && shippedStatuses.Contains(x.StatusNameLine)))
                              ).Count(),
            ApprovedInOrder = x.Where(x => (x.OrdersApprovedPCS > 0)).Count(),
            GrossInOrder = x.Where(x => (x.OrdersGrossPCS > 0)).Count(),
        }).GroupBy(x => true)
        .Select(x => new
        {
            Net = x.Count(x => x.NetInOrder > 0),
            Shipped = x.Count(x => x.ShippedInOrder > 0),
            Approved = x.Count(x => x.ApprovedInOrder > 0),
            Gross = x.Count(x => x.GrossInOrder > 0)
        }).SingleOrDefault();
}

In versions prior to 8.0.0 each query completes normally. In 8.0.0 array is translating to OpenJSON and query cannot be executed

Include provider and version information

EF Core version: 8.0.0
Database provider: Microsoft.EntityFrameworkCore.SqlServer
Target framework: .NET 8.0
HelpLink.ProdName: Microsoft SQL Server
HelpLink.ProdVer: 15.00.2000

@ajcvickers
Copy link
Member

@AlexeiScherbakov Make sure you have the compatibility level for your database set appropriately. See Contains in LINQ queries may stop working on older SQL Server versions

@AlexeiScherbakov
Copy link
Author

I use 15.0 version of SQL Server with compatibility level 150
In my test environment SELECT name, compatibility_level FROM sys.databases; returns

master	150
tempdb	150
model	150
msdb	150
OpenJsonBugDb	150

Problem is that inside subquery with nested aggregate OPENJSON cannot be evaluated by SQL Server, but IN instruction - can. It means that thanslator cannot use OPENJSON everywhere

Problem is only when Contains is used in nested expression

For example this is working normally:

[Test]
public void TestJustCountTranslation()
{
    using var dbContext = _webApplication.Services.GetRequiredService<SalesDbContext>();

    string[] shippedStatuses = new string[]
    {
        "Delivered",
        "Shipped",
        "Net"
    };

    int intStart = 20230101;
    int intEnd = 20231121;

    var data = dbContext.Sales.Where(x => (x.DateId >= intStart) && (x.DateId <= intEnd) && (x.FiltersService == 0)
            && shippedStatuses.Contains(x.StatusNameLine))
        .Count();
}

[Test]
public void TestJustCount2Translation()
{
    using var dbContext = _webApplication.Services.GetRequiredService<SalesDbContext>();

    HashSet<string> shippedStatuses = new HashSet<string>()
    {
        "Delivered",
        "Shipped",
        "Net"
    };

    int intStart = 20230101;
    int intEnd = 20231121;

    var data = dbContext.Sales.Where(x => (x.DateId >= intStart) && (x.DateId <= intEnd) && (x.FiltersService == 0)
            && shippedStatuses.Contains(x.StatusNameLine))
        .Count();
}

@roji
Copy link
Member

roji commented Nov 21, 2023

Thanks @AlexeiScherbakov, I'll investigate this.

@roji roji changed the title New OpenJSON expression to SQL translation breaks argregates on subquery with 'Cannot perform an aggregate function on an expression containing an aggregate or a subquery' exception New OpenJSON expression to SQL translation breaks aggregates on subquery with 'Cannot perform an aggregate function on an expression containing an aggregate or a subquery' exception Nov 26, 2023
@roji
Copy link
Member

roji commented Nov 27, 2023

Confirmed, this is indeed a limitation of SQL Server which we've run up against in the past (e.g. #26141).

Minimal repro
await using var ctx = new BlogContext();
await ctx.Database.EnsureDeletedAsync();
await ctx.Database.EnsureCreatedAsync();

var shippedStatuses = new[] { "Delivered", "Shipped", "Net" };

_ = ctx.Sales
    .GroupBy(x => x.JournalId)
    .Select(x => new
    {
        x.Key,
        ShippedInOrder = x.Count(x => shippedStatuses.Contains(x.StatusNameLine)),
    })
    .SingleOrDefault();

public class BlogContext : DbContext
{
    public DbSet<SalesItem> Sales { get; set; }

    protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
        => optionsBuilder
            .UseSqlServer("Server=localhost;Database=test;User=SA;Password=Abcd5678;Connect Timeout=60;ConnectRetryCount=0;Encrypt=false")
            .LogTo(Console.WriteLine, LogLevel.Information)
            .EnableSensitiveDataLogging();
}

public class SalesItem
{
    public int Id { get; set; }
    public int JournalId { get; set; }
    public string StatusNameLine { get; set; }
}

Basically it boils down to the following:

CREATE TABLE Blogs
(
    Id INT PRIMARY KEY IDENTITY,
    Owner NVARCHAR(MAX),
    Tag NVARCHAR(MAX)
);

-- The following works (approximates our old translation):
SELECT Owner, COUNT(CASE WHEN Tag IN ('foo', 'bar') THEN 1 END) FROM Blogs GROUP BY Owner;

-- The following fails (approximates our new translation):
SELECT Owner, COUNT(CASE WHEN Tag IN (SELECT value FROM OPENJSON('["foo", "bar"]')) THEN 1 END) FROM Blogs GROUP BY Owner;

This works fine on PostgreSQL, SQLite and MariaDB, SQL Server is the only one not supporting this.

One workaround for this kind of case is to pull the filter out of the aggregate projection and have it as a WHERE clause (which gets evaluated before GROUP BY):

_ = ctx.Sales
    .Where(x => shippedStatuses.Contains(x.StatusNameLine))
    .GroupBy(x => x.JournalId)
    .Select(x => new
    {
        x.Key,
        ShippedInOrder = x.Count(),
    })
    .SingleOrDefault();

Importantly, this pre-filters out any non-matching rows, and so doesn't produce the same results - the original query returned rows for all Journals, whereas this one doesn't return rows for Journals with no matching Orders. However, if this is acceptable, it will work and may also be faster.

To mitigate this, we could detect when we're inside a aggregate expression, and refrain from translating Contains to OPENJSON for that specific case (revert to the old behavior). Note that this would still fail if the user tries anything other than a bare Contains operator on a parameterized collection, so it would be a very narrow workaround to prevent the Contains regression.

@AlexeiScherbakov
Copy link
Author

AlexeiScherbakov commented Nov 27, 2023

Unfortunately this is just a minimal example of problem. Real queries are more complex - sales funnels calculations logic is "mirrored" from financial analysts.
Today we have 2 logics with HashSet and Array which are translated to IN and OPENJSON.
I see 3 ways:

  1. One-to-one matching (HashSet -> IN , Array -> OPENJSON or vice versa - don't know which matching is good)
  2. Some kind switch to current context (disable OPENJSON translation until switch is removed)
ctx.SetSwitch("openjson_translation","off");
await ctx...
ctx.RemoveSwitch("openjson_translation")
  1. Query extension method
ctx.Sales.WithoutOpenJsonTranslation().Select(...)

P.S. Current version accidentally implements 1 - and it is ok for me (I can choose what kind of translation I need with changing array<->HashSet. But it is classic Bug/Feature.

@roji
Copy link
Member

roji commented Nov 27, 2023

If you're looking for a global opt-out from using OPENJSON (option 2), you can simply set the SQL Server compatibility level to a low value, as detailed in the breaking changes notes. In effect you'd be telling EF that it's targeting an old SQL Server which doesn't support OPENJSON (this doesn't currently affect other things).

For a per-query solution (3), I've submitted #32412 which adds EF.Constant, allowing integrating the array as a constant into the query tree; you would wrap your array parameter with EF.Constant (instead of WithoutOpenJsonTranslation), and this would produce the classical IN syntax (see #32394 (comment) for more info).

@roji
Copy link
Member

roji commented Dec 4, 2023

Reopening to consider servicing.

@roji roji reopened this Dec 4, 2023
roji added a commit to roji/efcore that referenced this issue Dec 7, 2023
roji added a commit to roji/efcore that referenced this issue Dec 7, 2023
roji added a commit to roji/efcore that referenced this issue Dec 7, 2023
@bjss-colin
Copy link

We've also encountered this issue, during a .NET 8 upgrade, in a case very similar to the minimal repro whereby we're grouping and then using Contains. Having profiled the query, we noticed that OPENJSON was being used.

For now we've rolled back the .NET 8 upgrade what with the upcoming festive annual leave.

@knopa
Copy link

knopa commented Dec 14, 2023

I have got same with Sum + LinqKit expression which build query depends on arrays

@ajcvickers ajcvickers added the closed-fixed The issue has been fixed and is/will be included in the release indicated by the issue milestone. label Jan 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-primitive-collections area-query closed-fixed The issue has been fixed and is/will be included in the release indicated by the issue milestone. customer-reported regression Servicing-approved type-bug
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants