starfish icon indicating copy to clipboard operation
starfish copied to clipboard

Make fetching and filtering more performant

Open danisyellis opened this issue 6 years ago • 4 comments

(NOTE: I made this ticket a while ago, but performance/ amount of time Starfish takes to run has never been a problem for us at Indeed. I think a company would have to be checking an extremely large number of employees for this to matter. So I'm going to put this in the backlog. If someone wants to work on it, great, but I don't think it's a particularly useful change at the moment.)

Currently We:

  1. ping the github API for a person's events
  2. Look at the first page and keep only the events that are event types we care about
  3. Do this for every page of event history that Github has (they hold up to 300 events at a time, 10 per page)
  4. Now, we look through that array of events to see if one is in the correct time period, and stop looking when we find one.

However, there's no reason to look through all 30 pages of a person's events if an event on the first page meets both criteria

So, refactor the code to check for

  1. event type
  2. if it happened in the time range BEFORE fetching the next page. If those are both true, log the contributor's alternate id and move on to the next person.

danisyellis avatar Jun 11 '19 01:06 danisyellis

Also consider storing the e-tag returned for a person, so you can not retrieve results if there's been no change.

See also - conditional requests: https://developer.github.com/v3/#conditional-requests

jeffwilcox avatar Mar 12 '20 19:03 jeffwilcox

That's a good idea Jeff. I think I'll make a new issue for that to make it more visible. Do you have a recommendation between using e-tags or the Last-Modified header?

I also saw in one of your other comments that you're using octokit, and I'm curious what advantage using octokit gives you over plain api calls?

danisyellis avatar May 08 '20 21:05 danisyellis

We chose the e-tag and never looked back, so I don't have any useful guidance!

I want to say that Octokit has given us better long-term support, but we've still had to deal with a number of breaking changes over the years. So I'm going to put it in the "trying to help our GitHub friends build and bake this great library" as my reason.

There are also some plug-ins already created on retry policies, abuse limiters, etc., that may save time for some.

jeffwilcox avatar May 08 '20 22:05 jeffwilcox

Thanks!

danisyellis avatar May 08 '20 23:05 danisyellis