github-traffic-stats

Limit on repos for an org? Or bug?

Open data-henrik opened this issue 8 years ago • 4 comments

I tried reading stats for an org with many repos. The output stops after the first chunk of repos. I haven't looked deeper into it. Is this a well-known limit, a bug in the traffic API, or a bug in the Python code?

data-henrik, Feb 19 '18 11:02

Hi Henrik, there are two likely causes. One is that the API may be changing as the REST API v3 is migrated to GraphQL v4: "Warning: The API may change without advance notice during the preview period." [1]

The other, just as likely, is that the current implementation doesn't handle pagination [2]. To be honest, I didn't expect this script to be used for organizations with hundreds of repos! In fact, fetching an organization's repos was added by another user in a relatively recent PR [3].

[1] https://developer.github.com/v3/repos/
[2] https://developer.github.com/v3/#pagination
[3] https://github.com/nchah/github-traffic-stats/pull/8

nchah, Feb 21 '18 13:02

Thank you, it seems that missing pagination support is causing it.

data-henrik, Feb 21 '18 14:02

I hit the pagination issue too. I have around 180 repos and I'm only seeing stats for the first 30, e.g. when running: gts 'mcauser' 'ALL' 'save_csv'

When requesting /user/repos there is a Link response header which shows the next page:

curl -i -H 'Authorization: token mytoken' https://api.github.com/user/repos

HTTP/1.1 200 OK
Link: <https://api.github.com/user/repos?page=2>; rel="next", <https://api.github.com/user/repos?page=12>; rel="last"

And after requesting the 2nd page:

curl -i -H 'Authorization: token mytoken' 'https://api.github.com/user/repos?page=2'

HTTP/1.1 200 OK
Link: <https://api.github.com/user/repos?page=1>; rel="prev", <https://api.github.com/user/repos?page=3>; rel="next", <https://api.github.com/user/repos?page=12>; rel="last", <https://api.github.com/user/repos?page=1>; rel="first"

And after requesting the last page:

curl -i -H 'Authorization: token mytoken' 'https://api.github.com/user/repos?page=12'

HTTP/1.1 200 OK
Link: <https://api.github.com/user/repos?page=11>; rel="prev", <https://api.github.com/user/repos?page=1>; rel="first"

The fix for the repo == 'ALL' case seems to be to extract the rel="next" URL from the Link header and call send_request() repeatedly until no next link remains, collecting all repo names along the way, then loop over each repo.
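A rough sketch of that loop, for illustration only. This is not the actual gts code: `parse_link_header`, `collect_repo_names`, and the `fetch` callable are hypothetical names, and `fetch(url)` is assumed to return the raw Link header value together with the decoded JSON list of repos for that page (something a wrapper around send_request() could provide).

```python
import re

# Matches one entry of a Link header: <url>; rel="name"
_LINK_RE = re.compile(r'<([^>]+)>\s*;\s*rel="([^"]+)"')


def parse_link_header(value):
    """Return a dict mapping rel names ("next", "last", ...) to URLs."""
    return {rel: url for url, rel in _LINK_RE.findall(value or "")}


def collect_repo_names(first_url, fetch):
    """Follow rel="next" links page by page and gather repo names.

    `fetch` is a hypothetical callable: fetch(url) -> (link_header, repos),
    where repos is the decoded JSON list for that page.
    """
    names = []
    url = first_url
    while url:
        link_header, repos = fetch(url)
        names.extend(repo["name"] for repo in repos)
        # The last page has no rel="next" entry, which ends the loop.
        url = parse_link_header(link_header).get("next")
    return names
```

Stopping when rel="next" is missing matches the last-page response above, which only lists "prev" and "first".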

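Edit: You can increase the page size from the default 30 to the maximum of 100 by appending &per_page=100 to the query string (or ?per_page=100 if it is the first parameter).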
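As a small illustration, the parameter can be added without worrying about whether the URL already has a query string. `with_per_page` is a hypothetical helper, not part of gts:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_per_page(url, per_page=100):
    """Return `url` with per_page set, keeping any existing query parameters."""
    parts = urlsplit(url)
    query = dict(parse_qsl(parts.query))
    query["per_page"] = str(per_page)
    return urlunsplit(parts._replace(query=urlencode(query)))
```

For example, `with_per_page("https://api.github.com/user/repos?page=2")` yields a URL whose query string is `page=2&per_page=100`.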

Also, it seems the spiderman-preview header isn't required anymore:
https://github.com/nchah/github-traffic-stats/blob/master/gts/main.py#L248
https://developer.github.com/changes/2016-08-15-traffic-api-preview/

mcauser, Mar 29 '18 01:03

Thanks all for documenting this issue. I've pushed some changes that fetch all of the hundreds of repos owned by organizations like 'IBM', 'Google', etc. To get the actual traffic stats for those repos, the user running gts needs push access to them, so I'm not personally able to retrieve that data.
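One way to skip repos that would fail for lack of access is to filter the listing first. This is only a sketch and not what gts does: `repos_with_push_access` is a hypothetical helper, and it assumes each repo object in the listing carries a `permissions` object with a boolean `push` field, as the listing returns for authenticated requests.

```python
def repos_with_push_access(repos):
    """Keep the names of repos whose JSON reports push permission for the caller."""
    return [
        repo["name"]
        for repo in repos
        # Assumption: a missing "permissions" object means no push access.
        if repo.get("permissions", {}).get("push", False)
    ]
```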

nchah, Apr 07 '18 03:04