Tracking: refactor Contributor List graph filter queries
Tracking:
- [x]
most-active-contributorsendpoint: https://github.com/open-sauced/api/pull/756 - [ ]
contributions-by-projectendpoint: https://github.com/open-sauced/api/pull/760
The "active", "alum", and "new" contributor filters on the contributor repositories are not performing as well as we'd like.
Looking at the timescale data, these queries over times of high load have a mean execution time of 800 seconds. In the Timescale metrics data, they show up as:
WITH "CTE" AS (SELECT DISTINCT users.login AS login FROM (SELECT DISTINCT
lower(actor_login) AS login FROM pull_request_github_events pull_request_github_events W
HERE lower(actor_login) IN ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14,
$15, $16, $17, $18, $19, $20, $21, $22, $23, $24, $25, $26, $27, $28, $29, $30, $31,
$32, $33, $34, $35, $36, $37, $38, $39, $40, $41, $42, $43, $44, $45, $46, $47, $48,
$49, $50, $51, $52, $53, $54, $55, $56, $57, $58, $59, $60, $61, $62, $63, $64, $65,
$66, $67, $68)) users) SELECT login FROM "CTE" "CTE
This is a query that is used in the various filters for "active", "new", and "alum" contributors:
https://github.com/open-sauced/api/blob/5f9897c1024747820387c744f3dd0abfdb9b704f/src/user-lists/user-list-events-stats.service.ts#L611-L622
I also noticed the other queries in the service do not perform much better:
https://github.com/open-sauced/api/blob/5f9897c1024747820387c744f3dd0abfdb9b704f/src/user-lists/user-list-events-stats.service.ts#L42-L58
Here, as the base query, we do a huge UNION ALL which iterates many tables to get a huge swath of data. This can end up being very expensive on the database.
Relates to:
- https://github.com/open-sauced/api/issues/625
- https://github.com/open-sauced/api/issues/679
And I'd probably recommend we find a different approach here. Or maybe we try to refactor this and hoist it out of the user list service and into the timescale service to be more performant?