Limit on repos for an org? Or bug?
I tried reading stats for an org with many repos, but the output stops after the first chunk of repos. I haven't looked deeper into it yet. Is this a known limit, a bug in the traffic API, or a bug in the Python code?
Hi Henrik, it could be that the API is undergoing changes as the REST API v3 is migrated to GraphQL v4: "Warning: The API may change without advance notice during the preview period." [1]
More likely, though, the current implementation doesn't handle pagination [2]. To be honest, I didn't think this script would be used for organizations with hundreds of repos! In fact, fetching an organization's repos was only added by another user in a relatively recent PR [3].
[1] https://developer.github.com/v3/repos/ [2] https://developer.github.com/v3/#pagination [3] https://github.com/nchah/github-traffic-stats/pull/8
Thank you, it seems that missing pagination support is causing it.
I hit the pagination issue too. I have around 180 repos and I'm only seeing stats for the first 30.
e.g. gts 'mcauser' 'ALL' 'save_csv'
When requesting /user/repos there is a Link response header which shows the next page:
curl -i -H 'Authorization: token mytoken' https://api.github.com/user/repos
HTTP/1.1 200 OK
Link: <https://api.github.com/user/repos?page=2>; rel="next", <https://api.github.com/user/repos?page=12>; rel="last"
And after requesting the 2nd page:
curl -i -H 'Authorization: token mytoken' https://api.github.com/user/repos?page=2
HTTP/1.1 200 OK
Link: <https://api.github.com/user/repos?page=1>; rel="prev", <https://api.github.com/user/repos?page=3>; rel="next", <https://api.github.com/user/repos?page=12>; rel="last", <https://api.github.com/user/repos?page=1>; rel="first"
And after requesting the last page:
curl -i -H 'Authorization: token mytoken' https://api.github.com/user/repos?page=12
HTTP/1.1 200 OK
Link: <https://api.github.com/user/repos?page=11>; rel="prev", <https://api.github.com/user/repos?page=1>; rel="first"
It seems the fix for the repo == 'ALL' case is to extract the rel="next" URL from the Link header and repeatedly call send_request() until all repo names are collected, then loop over each.
Edit: you can raise the default of 30 repos per page to the maximum of 100 with &per_page=100
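The loop described above could look roughly like this. This is just a sketch, not the actual fix in gts: the function names (parse_next_link, get_all_repos) are my own, and it uses the stdlib urllib instead of whatever HTTP layer main.py uses:

```python
import json
import re
import urllib.request

def parse_next_link(link_header):
    """Return the rel="next" URL from a GitHub Link header, or None on the last page."""
    if not link_header:
        return None
    for part in link_header.split(","):
        # Each part looks like: <https://api.github.com/user/repos?page=2>; rel="next"
        m = re.search(r'<([^>]+)>;\s*rel="next"', part)
        if m:
            return m.group(1)
    return None

def get_all_repos(token, url="https://api.github.com/user/repos?per_page=100"):
    """Collect all repo names by following rel="next" links until they run out."""
    names = []
    while url:
        req = urllib.request.Request(url, headers={"Authorization": "token " + token})
        with urllib.request.urlopen(req) as resp:
            names.extend(repo["name"] for repo in json.load(resp))
            # The last page's Link header has no rel="next", ending the loop.
            url = parse_next_link(resp.headers.get("Link"))
    return names
```

With per_page=100 baked into the starting URL, the subsequent rel="next" URLs carry it forward automatically, so ~180 repos come back in two requests instead of six.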
Also, it seems the spiderman-preview header isn't required anymore.
https://github.com/nchah/github-traffic-stats/blob/master/gts/main.py#L248
https://developer.github.com/changes/2016-08-15-traffic-api-preview/
Thanks all for documenting this issue. I've pushed some changes that fetch all of the hundreds of repos owned by organizations like 'IBM', 'Google', etc. To get the actual traffic stats for those repos, the user running gts needs push access to them, so I'm not personally able to retrieve that data.