efcore icon indicating copy to clipboard operation
efcore copied to clipboard

Don't repeat joins when using multiple Selects in a custom projection on the same entity

Open samusaran opened this issue 1 year ago • 1 comments

Current situation

Currently I have an EF Query built like this:

        var query = GetOffersQuery(identities, loggedUser)
            .AsNoTracking();

        var dtoQuery = query.Select(x => new OfferDTO
            {
                Status = x.Status,
                CalculationDate = x.CalculationDate,
                CreatedDate = x.CreatedDate,
                ProductNames = x.Components.Select(c => c.ProductName),
                Quotations = x.Components.Select(c => c.Header.SalesDoc),
                BatteryTypes = x.Components.Select(c => c.BatteryType),
                ApproverName = x.Approver!.DisplayName,
                CustomerName = x.Customer.Name,
                Id = x.Id,
                SalesmanName = x.Salesman!.DisplayName,
                UploaderName = x.Uploader!.DisplayName,
                OpportunityNumber = x.OpportunityNumber,
                CountryCode = x.CountryCode
            })
            .AsSingleQuery();

Notice that ProductNames, Quotations and BatteryTypes all refers to the Components entity. The Components entity is alread Included inside GetOffersQuery.

This results in the following SQL Server query being generated:

      SELECT [t].[Status], [t].[CalculationDate], [t].[CreatedDate], [t].[Id], [u].[CountryCode], [u].[UserPrincipalName], [u0].[CountryCode], [u0].[UserPrincipalName], [u1].[CountryCode], [u1].[UserPrincipalName], [o1].[ProductName], [o1].[Id], [o2].[Header_SalesDoc], [o2].[Id], [o3].[BatteryType], [o3].[Id], [u].[DisplayName], [t].[Customer_Name], [u0].[DisplayName], [u1].[DisplayName], [t].[OpportunityNumber], [t].[CountryCode]
      FROM (
          SELECT TOP(@__p_1) [o].[Id], [o].[ApprovedBy], [o].[CalculationDate], [o].[CountryCode], [o].[CreatedBy], [o].[CreatedDate], [o].[OpportunityNumber], [o].[SalesEmployee], [o].[Status], [o].[Customer_Name]
          FROM [OfferHeader] AS [o]
          WHERE ([o].[CountryCode] IN (...) OR EXISTS (
              SELECT 1
              FROM [OfferGrantedUsers] AS [o0]
              WHERE [o0].[OfferId] = [o].[Id] AND [o0].[UserPrincipalName] = @__loggedUser_0)) AND [o].[Deleted] = CAST(0 AS bit)
          ORDER BY [o].[CalculationDate] DESC, [o].[Id] DESC
      ) AS [t]
      LEFT JOIN [UserHierarchyRoles] AS [u] ON [t].[CountryCode] = [u].[CountryCode] AND [t].[ApprovedBy] = [u].[UserPrincipalName]
      INNER JOIN [UserHierarchyRoles] AS [u0] ON [t].[CountryCode] = [u0].[CountryCode] AND [t].[SalesEmployee] = [u0].[UserPrincipalName]
      INNER JOIN [UserHierarchyRoles] AS [u1] ON [t].[CountryCode] = [u1].[CountryCode] AND [t].[CreatedBy] = [u1].[UserPrincipalName]
      LEFT JOIN [OfferComponent] AS [o1] ON [t].[Id] = [o1].[OfferId]
      LEFT JOIN [OfferComponent] AS [o2] ON [t].[Id] = [o2].[OfferId]
      LEFT JOIN [OfferComponent] AS [o3] ON [t].[Id] = [o3].[OfferId]
      ORDER BY [t].[CalculationDate] DESC, [t].[Id] DESC, [u].[CountryCode], [u].[UserPrincipalName], [u0].[CountryCode], [u0].[UserPrincipalName], [u1].[CountryCode], [u1].[UserPrincipalName], [o1].[Id], [o2].[Id]

Notice that OfferComponent table gets added once for every Select. In fact, if I remove one or more Selects inside the projection, the number of joins adjust accordingly.

Expected result

I would expect EF (or SqlClient) to only join the table once, since all 3 Selects are based on the same entity. The query I expect is the following:

      SELECT [t].[Status], [t].[CalculationDate], [t].[CreatedDate], [t].[Id], [u].[CountryCode], [u].[UserPrincipalName], [u0].[CountryCode], [u0].[UserPrincipalName], [u1].[CountryCode], [u1].[UserPrincipalName], [o1].[ProductName], [o1].[Id], [o1].[Header_SalesDoc], [o1].[BatteryType], [u].[DisplayName], [t].[Customer_Name], [u0].[DisplayName], [u1].[DisplayName], [t].[OpportunityNumber], [t].[CountryCode]
      FROM (
          SELECT TOP(@__p_1) [o].[Id], [o].[ApprovedBy], [o].[CalculationDate], [o].[CountryCode], [o].[CreatedBy], [o].[CreatedDate], [o].[OpportunityNumber], [o].[SalesEmployee], [o].[Status], [o].[Customer_Name]
          FROM [OfferHeader] AS [o]
          WHERE ([o].[CountryCode] IN (...) OR EXISTS (
              SELECT 1
              FROM [OfferGrantedUsers] AS [o0]
              WHERE [o0].[OfferId] = [o].[Id] AND [o0].[UserPrincipalName] = @__loggedUser_0)) AND [o].[Deleted] = CAST(0 AS bit)
          ORDER BY [o].[CalculationDate] DESC, [o].[Id] DESC
      ) AS [t]
      LEFT JOIN [UserHierarchyRoles] AS [u] ON [t].[CountryCode] = [u].[CountryCode] AND [t].[ApprovedBy] = [u].[UserPrincipalName]
      INNER JOIN [UserHierarchyRoles] AS [u0] ON [t].[CountryCode] = [u0].[CountryCode] AND [t].[SalesEmployee] = [u0].[UserPrincipalName]
      INNER JOIN [UserHierarchyRoles] AS [u1] ON [t].[CountryCode] = [u1].[CountryCode] AND [t].[CreatedBy] = [u1].[UserPrincipalName]
      LEFT JOIN [OfferComponent] AS [o1] ON [t].[Id] = [o1].[OfferId]
      ORDER BY [t].[CalculationDate] DESC, [t].[Id] DESC, [u].[CountryCode], [u].[UserPrincipalName], [u0].[CountryCode], [u0].[UserPrincipalName], [u1].[CountryCode], [u1].[UserPrincipalName], [o1].[Id]

This can benefit by reducing the number of reduntant joins.

I know that one way to fix it would be to project the 3 properties as a single DTO with 3 properties, to enumerate Components once, but this changes the output object definition (and in my case the objective of the DTO projection is exactly to have a flat object)

All tests have been done on:

  • EF Core 8.0.4
  • Microsoft.Data.SqlClient 5.1.5

samusaran avatar Apr 24 '24 14:04 samusaran

Makes sense, we should indeed improve this; putting in the backlog for now. We may want to do this after getting rid of the nav expansion step, as this is likely to involve various work there.

As a minimal repro, for a LINQ query such as the following:

_ = await context.Blogs
    .Include(b => b.Posts)
    .Select(b => new
    {
        PostIds = b.Posts.Select(p => p.Id),
        PostTitles = b.Posts.Select(p => p.Title),
        PostRatings = b.Posts.Select(p => p.Rating)
    })
    .ToListAsync();

... we get the following SQL:

SELECT [b].[Id], [p].[Id], [p0].[Title], [p0].[Id], [p1].[Rating], [p1].[Id]
FROM [Blogs] AS [b]
LEFT JOIN [Post] AS [p] ON [b].[Id] = [p].[BlogId]
LEFT JOIN [Post] AS [p0] ON [b].[Id] = [p0].[BlogId]
LEFT JOIN [Post] AS [p1] ON [b].[Id] = [p1].[BlogId]
ORDER BY [b].[Id], [p].[Id], [p0].[Id]
Full minimal repro
await using var context = new BlogContext();
await context.Database.EnsureDeletedAsync();
await context.Database.EnsureCreatedAsync();

_ = await context.Blogs
    .Include(b => b.Posts)
    .Select(b => new
    {
        PostIds = b.Posts.Select(p => p.Id),
        PostTitles = b.Posts.Select(p => p.Title),
        PostRatings = b.Posts.Select(p => p.Rating)
    })
    .ToListAsync();

public class BlogContext : DbContext
{
    public DbSet<Blog> Blogs { get; set; }

    protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
        => optionsBuilder
            .UseSqlServer("Server=localhost;Database=test;User=SA;Password=Abcd5678;Connect Timeout=60;ConnectRetryCount=0;Encrypt=false")
            .LogTo(Console.WriteLine, LogLevel.Information)
            .EnableSensitiveDataLogging();
}

public class Blog
{
    public int Id { get; set; }
    public string Name { get; set; }

    public List<Post> Posts { get; set; }
}

public class Post
{
    public int Id { get; set; }
    public string Title { get; set; }
    public int Rating { get; set; }

    public Blog Blog { get; set; }
}

roji avatar May 02 '24 09:05 roji

Another example that can produce multiple joins:

Valuation = deal.Valuation == null ? null : deal.Valuation.ToDTO(),
MarketInfo = deal.Valuation == null ? null : deal.Valuation.ToMarketInfoDTO()

Accessing deal.Valuation four times produces four left joins. I fixed this in the project by doing the mapping on the client side.

GKotfis avatar Jun 18 '24 17:06 GKotfis