efcore icon indicating copy to clipboard operation
efcore copied to clipboard

Consider doing split query on a reference navigation if the other side is a collection

Open roji opened this issue 3 years ago • 1 comments

We currently support split query on collection navigations:

_ = await ctx.Blogs.Include(b => b.Posts).AsSplitQuery().ToListAsync();
SELECT [b].[Id], [b].[Name]
FROM [Blogs] AS [b]
ORDER BY [b].[Id]

SELECT [p].[PostId], [p].[BlogId], [p].[Title], [b].[Id]
FROM [Blogs] AS [b]
INNER JOIN [Posts] AS [p] ON [b].[Id] = [p].[BlogId]
ORDER BY [b].[Id]

However, if the query is flipped around and starts with the Posts, we do not:

_ = await ctx.Posts.Include(p => p.Blog).AsSplitQuery().ToListAsync();
SELECT [p].[PostId], [p].[BlogId], [p].[Title], [b].[Id], [b].[Name]
FROM [Posts] AS [p]
LEFT JOIN [Blogs] AS [b] ON [p].[BlogId] = [b].[Id]

This is because the Post.Blog navigation is a reference navigation. However, the single query we produce has the same "cartesian explosion" effect that split queries were designed to solve: each Post row contains and duplicates the details of the Blog.

We could perform the above as a split query, doing so whenever either the navigation or its reverse navigation is a collection navigation.

roji avatar Sep 21 '22 09:09 roji

Consider that a common scenario would be to query for a single entity of the starting type.

For example: I want a specific blog with all its posts, so I start with Blogs, include Posts, and add a Where condition on the Blog ID. As a single query this would be multiple rows with duplicated details, and a split query may be beneficial.

But if I want a single post with details of its blog, I'd start with Posts, include Blog, and add a Where condition on the Post ID. As a single query this is a single row with no duplication. Currently, with split query enabled, it is also a single row with no duplication. This proposal would turn it into two queries.

I understand that this scenario isn't the only one but perhaps common enough to weigh up the regression it would introduce where split query is enabled by default?

stevendarby avatar Sep 21 '22 22:09 stevendarby

@stevendarby that's a good point; in theory we'd be able to know whether blog duplication may occur (since multiple posts are being fetched), and only split in those scenarios. Though I suspect that would be a pretty complex thing to do, and not necessarily worth it (given the below note).

Once #10878 is implemented, the two split queries would be batched in a single round-trip; at that point, the difference between a single query that returns a wide row and two queries that return the same data in two rows should become insignificant (we'd need to confirm this via benchmarking). If so, it seems worth it to avoid the data duplication in the case where it does occur, at the (likely negligible) price of transforming the single query into two batched split queries in the case where it doesn't.

roji avatar Sep 22 '22 09:09 roji