Query validation caching
Are you willing to provide a PR for this issue or aid in developing it?
Yes.
What problem does this feature proposal attempt to solve?
Query validation takes a lot of time and resources (about 30-40% of processing time). However, a typical website uses a limited set of queries, which are compiled into the client. So we could cache the result of query validation, and identical queries would then be processed much faster.
Which possible solutions should be considered?
> There are only two hard things in Computer Science: cache invalidation and naming things. -- Phil Karlton
In order to cache things, we have to figure out when to invalidate the cache. Validation is performed in webonyx/graphql-php's GraphQL::promiseToExecute().
Here is what it depends on:
- Schema
- Query
- Validation rules
The first two are represented as simple ASTs. The schema is already cached, and I opened https://github.com/nuwave/lighthouse/pull/2017 to cache queries, so we can easily determine when to invalidate them.
The latter is more difficult. Rules are PHP classes that can't be easily tracked, and they can depend on arbitrary other data or code. For example, the QueryComplexity validator depends on all of the passed query variables.
I guess we can add a getHash() function to the ProvidesValidationRules interface. It should return a hash of all the data the rules depend on, so if anything changes, we can detect it and invalidate the cache. For the default Lighthouse ValidationRulesProvider it should return a hash of the corresponding config values, plus the query variables if the QueryComplexity validator is enabled.
Given these three hashes, we can store the result of validation and be assured that none of its inputs have changed.
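A minimal sketch of how the three hashes could be combined into a single cache key. All names here are hypothetical illustrations, not actual Lighthouse or graphql-php APIs:

```php
<?php

// Hypothetical sketch: derive a cache key for the validation result from
// the three inputs it depends on (schema hash, query document, rules hash).
// Any change to one of them produces a different key, which effectively
// invalidates stale cache entries.
function validationCacheKey(string $schemaHash, string $query, string $rulesHash): string
{
    // "\0" separators avoid ambiguity when inputs are concatenated.
    return hash('sha256', $schemaHash . "\0" . $query . "\0" . $rulesHash);
}

$key = validationCacheKey('schema-v1', '{ user { id } }', 'rules-abc');
```

The validation errors (an empty array for valid queries) would then be stored under this key in whatever cache store Lighthouse is configured with.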
I guess we should also extract the validation step from webonyx/graphql-php's GraphQL::promiseToExecute() so it can be executed independently.
Do you have any objections or suggestions? This may improve Lighthouse performance significantly.
Unless somebody is DDoSing your server, I would expect invalid queries not to be repeated often. Thus, we could just cache queries that have executed successfully, and then skip validation entirely when they are queried again.
I agree that under normal circumstances there should not be many queries with validation errors. However, caching those too is easy to implement (we store an array of validation errors, which is simply empty for valid queries).
I can also imagine cases where this would help. For example, old clients might spam old requests that no longer validate against a new schema. Also, a lot of DoS attacks are very dumb and reuse the same query.
Today I was trying to optimize tests and found that query validation takes ~30% of the time :(

Looks like we can pass empty validation rules into GraphQLBase::executeQuery() for known-valid queries (by overriding $this->providesValidationRules). But I'm not sure how safe that is.
https://github.com/nuwave/lighthouse/blob/8f3b9faa33a4136682df7ba335c3344cee8a06eb/src/GraphQL.php#L262-L271
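To make the idea concrete, here is a hedged sketch of such an override as a decorator. CachedValidationRulesProvider and the $isKnownValid callback are assumptions for illustration; the real interface is Lighthouse's Nuwave\Lighthouse\Support\Contracts\ProvidesValidationRules, and a minimal stand-in is declared here so the sketch is self-contained:

```php
<?php

// Minimal stand-in for Lighthouse's ProvidesValidationRules contract.
interface ProvidesValidationRules
{
    /** @return array<object>|null list of validation rules, or null for defaults */
    public function validationRules(): ?array;
}

// Hypothetical decorator: skips validation for queries known to be valid.
final class CachedValidationRulesProvider implements ProvidesValidationRules
{
    /** @param callable(): bool $isKnownValid consults the validation cache */
    public function __construct(
        private ProvidesValidationRules $inner,
        private $isKnownValid,
    ) {}

    public function validationRules(): ?array
    {
        // An empty rule set makes graphql-php skip validation entirely.
        // This is only safe if this exact query already validated against
        // the current schema and rule configuration.
        return ($this->isKnownValid)() ? [] : $this->inner->validationRules();
    }
}
```

The safety question then reduces to whether $isKnownValid can be trusted, which is exactly the cache invalidation problem discussed above.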
I think @k0ka has outlined quite nicely what would be necessary to safely recognize when validation for a query should lead to identical results. If we get this part right, we can basically extract the validation from the base GraphQL library and run it separately, caching its result: https://github.com/webonyx/graphql-php/blob/7a78690e99548de8155f605772eecf615329ee16/src/GraphQL.php#L130-L143
I've done a quick test, and it seems caching will not give any improvement for (our) tests :( The query and variables are almost always different -> new hash -> cache miss. For a real application, the result will probably be similar, or the gain negligible, because variables usually differ between requests.
Seems the only way is to separate all rules into two categories:
- Dependent on query variables - these should always be run. This also solves the hash calculation problem for UploadedFile (my variant uses json_encode, which fails in this case).
- Independent - run only once per query.
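A rough sketch of that split. VariableDependentRule is an assumed marker interface, not an actual graphql-php or Lighthouse API:

```php
<?php

// Hypothetical marker interface for rules whose result depends on the
// query variables (e.g. QueryComplexity).
interface VariableDependentRule {}

/**
 * Partition rules into variable-dependent ones (always run) and
 * variable-independent ones (run once per query, result cacheable).
 *
 * @param array<object> $rules
 * @return array{alwaysRun: array<object>, cacheable: array<object>}
 */
function partitionRules(array $rules): array
{
    $alwaysRun = [];
    $cacheable = [];
    foreach ($rules as $rule) {
        if ($rule instanceof VariableDependentRule) {
            $alwaysRun[] = $rule;   // reads variables, never cacheable
        } else {
            $cacheable[] = $rule;   // depends only on schema + query document
        }
    }
    return ['alwaysRun' => $alwaysRun, 'cacheable' => $cacheable];
}
```

With this split, variables no longer need to be part of the cache key at all: the cacheable group is keyed by schema + query + config alone.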
The variables influence the result, but they are not relevant for basic document validation. See https://github.com/webonyx/graphql-php/blob/7a78690e99548de8155f605772eecf615329ee16/src/GraphQL.php#L143 - variables are not passed to DocumentValidator::validate().
As far as I can see, they are passed into QueryComplexity?
https://github.com/webonyx/graphql-php/blob/7a78690e99548de8155f605772eecf615329ee16/src/GraphQL.php#L130-L141
> As far as I can see, they are passed into QueryComplexity?
Right. I think we can check if lighthouse.security.max_query_complexity is enabled, and include the variables in the hash based on that. We can document that for optimal performance of the validation cache this rule needs to be disabled.
> I think we can check if lighthouse.security.max_query_complexity is enabled, and include the variables in the hash based on that.
Or we could just always run the QueryComplexity rule (the implementation would be simpler).
I've found another edge case: in our application, introspection is available only to users with a specific permission. That doesn't work with cached validation :) Looks like a setting for "always run" rules will be useful in any case.