Seq scan for janitor login/consent flows cleanup SQL query
Preflight checklist
- [X] I could not find a solution in the existing issues, docs, nor discussions.
- [X] I agree to follow this project's Code of Conduct.
- [X] I have read and am following this repository's Contribution Guidelines.
- [X] I have joined the Ory Community Slack.
- [X] I am signed up to the Ory Security Patch Newsletter.
Ory Network Project
No response
Describe the bug
The Hydra janitor is running this query to cleanup inactive login/consent requests:
SELECT login_challenge
FROM hydra_oauth2_flow
WHERE (
(state != 6)
OR (login_error IS NOT NULL AND login_error <> '{}' AND login_error <> '')
OR (consent_error IS NOT NULL AND consent_error <> '{}' AND consent_error <> '')
)
AND requested_at < '2024-08-25 08:00:00'
AND nid = '326c59d4-e93c-47fc-96d4-40c5522f17c1'
ORDER BY login_challenge
LIMIT 50000;
(with some example values). See https://github.com/ory/hydra/blob/v2.2.0/persistence/sql/persister_consent.go#L732-L792
This is running a sequential query on PostgreSQL:
Limit (cost=2552485.01..2552485.01 rows=1 width=33)
-> Sort (cost=2552485.01..2552485.01 rows=1 width=33)
Sort Key: login_challenge
-> Gather (cost=1000.00..2552485.00 rows=1 width=33)
Workers Planned: 2
-> Parallel Seq Scan on hydra_oauth2_flow (cost=0.00..2551484.90 rows=1 width=33)
Filter: ((requested_at < '2024-08-25 08:00:00'::timestamp without time zone) AND (nid = '326c59d4-e93c-47fc-96d4-40c5522f17c1'::uuid) AND ((state <> 6) OR ((login_error IS NOT NULL) AND (login_error <> '{}'::text) AND (login_error <> ''::text)) OR ((consent_error IS NOT NULL) AND (consent_error <> '{}'::text) AND (consent_error <> ''::text))))
(7 rows)
On a ~18 million row database, this query can take between 15 and 40 seconds, depending on the load.
The expected behaviour to have an index for this query by default in Hydra.
Reproducing the bug
Run the janitor SQL above.
Relevant log output
No response
Relevant configuration
No response
Version
v2.2.0
On which operating system are you observing this issue?
None
In which environment are you deploying?
Kubernetes with Helm
Additional Context
We tried indexing manually requested_at or state, but since the query has many AND/ORs, PostgreSQL 15.5 is not able to pick up the index properly and keeps running the sequential scan.
Just ran in to this bug as well. We are on CockroachDB.
It recommends the following:
CREATE INDEX ON hydra_oauth2_flow (nid, state) STORING (requested_at, login_error, consent_error);
CREATE INDEX ON hydra_oauth2_flow (nid, consent_error) STORING (requested_at, state, login_error);
CREATE INDEX ON hydra_oauth2_flow (nid, login_error) STORING (requested_at, state, consent_error);