hydra icon indicating copy to clipboard operation
hydra copied to clipboard

Seq scan for janitor login/consent flows cleanup SQL query

Open David-Wobrock opened this issue 1 year ago • 1 comments

Preflight checklist

Ory Network Project

No response

Describe the bug

The Hydra janitor is running this query to cleanup inactive login/consent requests:

SELECT login_challenge
FROM hydra_oauth2_flow
WHERE (
    (state != 6)
        OR (login_error IS NOT NULL AND login_error <> '{}' AND login_error <> '')
        OR (consent_error IS NOT NULL AND consent_error <> '{}' AND consent_error <> '')
    )
  AND requested_at < '2024-08-25 08:00:00'
  AND nid = '326c59d4-e93c-47fc-96d4-40c5522f17c1'
ORDER BY login_challenge
    LIMIT 50000;

(with some example values). See https://github.com/ory/hydra/blob/v2.2.0/persistence/sql/persister_consent.go#L732-L792

This is running a sequential query on PostgreSQL:

 Limit  (cost=2552485.01..2552485.01 rows=1 width=33)
   ->  Sort  (cost=2552485.01..2552485.01 rows=1 width=33)
         Sort Key: login_challenge
         ->  Gather  (cost=1000.00..2552485.00 rows=1 width=33)
               Workers Planned: 2
               ->  Parallel Seq Scan on hydra_oauth2_flow  (cost=0.00..2551484.90 rows=1 width=33)
                     Filter: ((requested_at < '2024-08-25 08:00:00'::timestamp without time zone) AND (nid = '326c59d4-e93c-47fc-96d4-40c5522f17c1'::uuid) AND ((state <> 6) OR ((login_error IS NOT NULL) AND (login_error <> '{}'::text) AND (login_error <> ''::text)) OR ((consent_error IS NOT NULL) AND (consent_error <> '{}'::text) AND (consent_error <> ''::text))))
(7 rows)

On a ~18 million row database, this query can take between 15 and 40 seconds, depending on the load.

The expected behaviour to have an index for this query by default in Hydra.

Reproducing the bug

Run the janitor SQL above.

Relevant log output

No response

Relevant configuration

No response

Version

v2.2.0

On which operating system are you observing this issue?

None

In which environment are you deploying?

Kubernetes with Helm

Additional Context

We tried indexing manually requested_at or state, but since the query has many AND/ORs, PostgreSQL 15.5 is not able to pick up the index properly and keeps running the sequential scan.

David-Wobrock avatar Aug 26 '24 15:08 David-Wobrock

Just ran in to this bug as well. We are on CockroachDB.

It recommends the following:

CREATE INDEX ON hydra_oauth2_flow (nid, state) STORING (requested_at, login_error, consent_error);
CREATE INDEX ON hydra_oauth2_flow (nid, consent_error) STORING (requested_at, state, login_error);
CREATE INDEX ON hydra_oauth2_flow (nid, login_error) STORING (requested_at, state, consent_error);

ptescher avatar Apr 11 '25 21:04 ptescher