`setQueryData` is slow with large datasets
Describe the bug
setQueryData iterates over all entries in the cache to find matches. This leads to performance issues when priming large datasets.
Your minimal, reproducible example
https://codesandbox.io/p/sandbox/react-query-setquerydata-complexity-xl4d3t
Steps to reproduce
const myData = range(1, 10000) // e.g. a list of search responses received from the server
for (const i of myData) {
  queryClient.setQueryData([i], i)
}
Internally, setQueryData uses find, which iterates over every entry in the cache. Assuming the cache has already been populated, the above loop is therefore quadratic in the number of cache entries.
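The complexity difference is easy to demonstrate with a minimal model of the cache. This is an illustrative sketch, not the actual QueryCache implementation: a linear `find` over all entries versus a direct hash-table lookup.

```typescript
// Simplified model of the cache: `find` scans all entries (O(n) per call),
// while `get` uses the hash table directly (O(1) per call).
// Names mirror QueryCache methods but the implementation is made up.
type Entry = { queryHash: string; data: unknown }

class MiniCache {
  #queries = new Map<string, Entry>()

  // Linear scan, like QueryCache.find via matchQuery
  find(queryHash: string): Entry | undefined {
    for (const entry of this.#queries.values()) {
      if (entry.queryHash === queryHash) return entry
    }
    return undefined
  }

  // Direct lookup, like QueryCache.get
  get(queryHash: string): Entry | undefined {
    return this.#queries.get(queryHash)
  }

  // Return the existing entry or create a new one, via the hash table
  build(queryHash: string): Entry {
    let entry = this.get(queryHash)
    if (!entry) {
      entry = { queryHash, data: undefined }
      this.#queries.set(queryHash, entry)
    }
    return entry
  }
}

// Priming n entries costs O(n^2) total with `find`, O(n) with `get`/`build`.
const cache = new MiniCache()
for (let i = 0; i < 10_000; i++) {
  cache.build(String(i)).data = i // hash-table path: constant time per entry
}
console.log(cache.get('9999')?.data) // 9999
```

With 10k entries, the `find`-based path performs on the order of 10^8 comparisons in total, while the hash-table path performs 10^4 lookups, which matches the ~29s vs. ~3s difference reported further down.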
The issue can be avoided by getting the Query object directly from the hash table:
// this returns the cache for this key if existing, or creates a new one with default options:
// Unlike `setQueryData`, it will use the hashtable to find the entry.
const query = queryClient.getQueryCache().build(client, { queryKey: /* ... */ })
query.setData(data)
Expected behavior
I'd suggest that setQueryData should either...
- be updated to only work with exact matches, since I believe most use-cases will only want to update a single entry
- be provided with an exact-filter option that uses the hash table internally
How often does this bug happen?
Every time
Screenshots or Videos
No response
Platform
MacOS, Chrome 23.0.6262.5
Tanstack Query adapter
vanilla
TanStack Query version
5.17.19
TypeScript version
No response
Additional context
This behaviour has also been observed by users in https://github.com/TanStack/query/issues/1633. However, it was previously believed to be due to structural sharing.
good find, thanks.
The issue can be avoided by getting the Query object directly from the hash table:
yes, but we cannot call .build before we've executed the functionalUpdate, because it can return undefined, and in that case we don't want to create the cache entry.
I think this diff should fix it though:
index 5cc764fd..18516ccc 100644
--- a/packages/query-core/src/queryClient.ts
+++ b/packages/query-core/src/queryClient.ts
@@ -165,14 +165,6 @@ export class QueryClient {
>,
options?: SetDataOptions,
): TInferredQueryFnData | undefined {
- const query = this.#queryCache.find<TInferredQueryFnData>({ queryKey })
- const prevData = query?.state.data
- const data = functionalUpdate(updater, prevData)
-
- if (typeof data === 'undefined') {
- return undefined
- }
-
const defaultedOptions = this.defaultQueryOptions<
any,
any,
@@ -181,6 +173,16 @@ export class QueryClient {
QueryKey
>({ queryKey })
+ const query = this.#queryCache.get<TInferredQueryFnData>(
+ defaultedOptions.queryHash,
+ )
+ const prevData = query?.state.data
+ const data = functionalUpdate(updater, prevData)
+
+ if (typeof data === 'undefined') {
+ return undefined
+ }
+
return this.#queryCache
.build(this, defaultedOptions)
.setData(data, { ...options, manual: true })
do you want to maybe contribute a PR with some tests ?
do you want to maybe contribute a PR with some tests ?
Yes I would be happy to.
I'm a bit unsure how to best test this. Do you think measuring performance.now() against an arbitrary threshold of 500ms would suffice?
not sure how to best test this tbh
I think there is no problem writing the test by measuring an arbitrary threshold using performance.now(). This PR on @tanstack/table did this kind of test https://github.com/TanStack/table/pull/4495
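The timing approach can be sketched as follows. A plain Map serves as a stand-in for the query cache here; only the 10k-entries size, the 500ms threshold, and the `[...key, i]` key shape come from the discussion, the rest is hypothetical:

```typescript
// Sketch of a threshold-based performance test. A plain Map stands in
// for the query cache; JSON.stringify stands in for the key hashFn.
const cache = new Map<string, number>()
const key = ['data']
const totalRun = 10_000

const start = performance.now()
for (let i = 1; i <= totalRun; i++) {
  // hashing the key and writing via the hash table: O(1) per entry
  cache.set(JSON.stringify([...key, i]), i)
}
const end = performance.now()

console.log(cache.get(JSON.stringify([...key, totalRun]))) // 10000
console.log(end - start < 500) // true — well under the threshold
```

The real test in the PR asserts the same shape: prime 10k entries through setQueryData, check the last entry landed, and assert the elapsed time stays under the threshold.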
I've tried the fix and test locally and it works fine.
Before fix with tests
FAIL |@tanstack/query-core| src/tests/queryClient.test.tsx > queryClient > setQueryData > should set 10k data in less than 500ms
AssertionError: expected 29102.459416 to be less than 500
❯ src/tests/queryClient.test.tsx:371:27
369| const end = performance.now()
370| expect(queryClient.getQueryData([...key, totalRun])).toBe(totalRun)
371| expect(end - start).toBeLessThan(500)
| ^
372| })
373| })
After fix with tests
Test Files 17 passed (17)
Tests 318 passed (318)
Type Errors no errors
Start at 02:24:05
Duration 3.28s
wdyt? @TkDodo @SebKranz
Nice 👍
I would love to help with the PR, I can just push my current changes @TkDodo
@milhamm sounds good, thanks.
It looks like there are actually three functions that call find when they usually don't have to: setQueryData, getQueryData, and getQueryState.
I was therefore tempted to move the optimisation directly into QueryCache.find, but that is impossible since we don't have access to the hashFn there.
@TkDodo, you're using the default hashFn from the QueryClient. Wouldn't that run into a subtle bug if the user attempts to set data for a Query that was created with a custom hashFn like so?:
useQuery({
// ...
queryKeyHashFn: () => ...
})
@SebKranz Correct me if I'm wrong, but I think that's the behavior I normally would expect.
I expect the queryKeyHashFn passed to useQuery would not override the client's queryKeyHashFn unless they set it when instantiating the query client with new QueryClient() or via queryClient.setDefaultOptions.
But I could be wrong tho.
@milhamm No, I agree. TBH I don't even see the use-case for setting the hashFn on a per-query basis in the first place.
However, currently find computes the query hash by taking the hashFn from each individual Query. I just wanted to make sure that you're okay with changing that behaviour.
Fair point, it would have a subtle behavior change @SebKranz
@TkDodo, you're using the default hashFn from the QueryClient. Wouldn't that run into a subtle bug if the user attempts to set data for a Query that was created with a custom hashFn like so?
inside find, we are using matchQuery, which will match the passed queryKey against all queries. In doing so, it will hash the passed queryKey with each hashFn of each query. So if two queries have a different hashFn, both will be used on the passed key to produce a match.
now technically, it would be good if find would accept { queryKey, queryKeyHashFn }, but that isn't a valid filter. So I think the current behaviour could be slightly off if you indeed have a custom hashFn for key A, and then try to match A against B and C, because A will be hashed with the hashFns from B and C. In practice, nobody has done that in 4 years :)
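The matching behaviour described here can be modelled in a few lines. This is a simplified sketch of how matchQuery hashes the candidate key with each query's own hashFn; the hash functions themselves are made up for illustration:

```typescript
// Simplified model: each query stores its own hashFn, and matching a
// candidate key hashes that key with *each query's* hashFn in turn.
// Both hash functions here are invented for the example.
type HashFn = (key: unknown[]) => string

interface MiniQuery {
  queryHash: string
  hashFn: HashFn
}

const defaultHash: HashFn = (key) => JSON.stringify(key)
const customHash: HashFn = (key) => String(key[0]).toUpperCase()

const queries: MiniQuery[] = [
  { queryHash: customHash(['a']), hashFn: customHash },   // query A, custom hashFn
  { queryHash: defaultHash(['b']), hashFn: defaultHash }, // query B, default hashFn
]

// matchQuery-style check: the candidate key is re-hashed with each
// query's own hashFn, so a custom hashFn on query A also gets applied
// to keys that were never meant for A.
function findByKey(key: unknown[]): MiniQuery | undefined {
  return queries.find((q) => q.hashFn(key) === q.queryHash)
}

console.log(findByKey(['a'])?.queryHash) // 'A' — matched via the custom hashFn
console.log(findByKey(['b'])?.queryHash) // '["b"]' — matched via the default hashFn
```

This shows the edge case from the comment above: matching key A against queries B and C runs B's and C's hashFns on A, which can misfire if the custom hashFn collapses distinct keys to the same hash.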