api icon indicating copy to clipboard operation
api copied to clipboard

Tracking: refactor Contributor List graph filter queries

Open jpmcb opened this issue 1 year ago • 0 comments

Tracking:

  • [x] most-active-contributors endpoint: https://github.com/open-sauced/api/pull/756
  • [ ] contributions-by-project endpoint: https://github.com/open-sauced/api/pull/760

The "active", "alum", and "new" contributor filters on the contributor repositories are not performing as well as we'd like.

Looking at the timescale data, these queries over times of high load have a mean execution time of 800 seconds. In the Timescale metrics data, they show up as:

WITH "CTE" AS (SELECT DISTINCT users.login AS login FROM (SELECT DISTINCT
 lower(actor_login) AS login FROM pull_request_github_events pull_request_github_events W
HERE lower(actor_login) IN ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14,
 $15, $16, $17, $18, $19, $20, $21, $22, $23, $24, $25, $26, $27, $28, $29, $30, $31, 
$32, $33, $34, $35, $36, $37, $38, $39, $40, $41, $42, $43, $44, $45, $46, $47, $48,
 $49, $50, $51, $52, $53, $54, $55, $56, $57, $58, $59, $60, $61, $62, $63, $64, $65,
 $66, $67, $68)) users) SELECT login FROM "CTE" "CTE

This is a query that is used in the various filters for "active", "new", and "alum" contributors:

https://github.com/open-sauced/api/blob/5f9897c1024747820387c744f3dd0abfdb9b704f/src/user-lists/user-list-events-stats.service.ts#L611-L622

I also noticed the other queries in the service do not perform much better:

https://github.com/open-sauced/api/blob/5f9897c1024747820387c744f3dd0abfdb9b704f/src/user-lists/user-list-events-stats.service.ts#L42-L58

Here, as the base query, we do a huge UNION ALL which iterates many tables to get a huge swath of data. This can end up being very expensive on the database.


Relates to:

  • https://github.com/open-sauced/api/issues/625
  • https://github.com/open-sauced/api/issues/679

And I'd probably recommend we find a different approach here. Or maybe we try to refactor this and hoist it out of the user list service and into the timescale service to be more performant?

jpmcb avatar Mar 29 '24 14:03 jpmcb