Caching issue/PR data
Sometimes it's useful to store issues, PRs, etc. for later analysis. This wouldn't be useful for generating changelogs (since there you want to make sure you've got the latest activity), but it could be useful for generating datasets that one can analyze with, e.g., https://github.com/choldgraf/jupyter-activity-snapshot.
Perhaps this could maintain a cache folder in ~/data_github_activity that accumulates this data over time. A few points / questions:
- The layout could either be a single CSV file for all the data, a few CSV files for different types of data (e.g., issues.csv, prs.csv, comments.csv), or sub-folders for different GitHub orgs/repos.
- When new data is downloaded, it could do simple joins on these CSV files and then drop the duplicates based on the unique ID of each item.
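The join-and-dedupe step above could be sketched roughly like this, assuming a hypothetical `update_cache` helper, a cache file such as issues.csv, and that every record carries GitHub's unique `id` field (none of these names exist in github-activity today):

```python
import csv
from pathlib import Path


def update_cache(cache_path: Path, new_rows: list) -> list:
    """Merge newly downloaded items into a CSV cache, deduplicating by id.

    Hypothetical sketch: reads any existing cache, overlays the new rows
    (newer copies of an item replace older ones), and writes it all back.
    """
    by_id = {}
    if cache_path.exists():
        with cache_path.open(newline="") as f:
            for row in csv.DictReader(f):
                by_id[row["id"]] = row
    # Newer downloads overwrite older copies of the same item
    for row in new_rows:
        by_id[str(row["id"])] = {k: str(v) for k, v in row.items()}
    merged = list(by_id.values())
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    with cache_path.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(merged[0].keys()))
        writer.writeheader()
        writer.writerows(merged)
    return merged
```

Calling this after each download would keep the cache append-only in spirit while never storing the same issue/PR twice.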
@consideRatio what do you think about this? Useful or unnecessary complexity?
Hmmm, I don't want to influence you much on this, since I mainly represent a very specific need around changelog generation, but I think it's not out of scope for the github-activity project to allow output to CSV or JSON, etc., which are more suitable for processing from disk than a Markdown file.
I can imagine we could do some nice things with this. Perhaps putting out systematic metrics for releases that could be fun to compare between projects, etc.
How long since the last release, how many PRs, how large the PRs were, how many people contributed, whether that was an increase or decrease, etc., assuming we start to analyze the data more.
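Those release metrics could fall straight out of a cached prs.csv. A sketch, assuming hypothetical columns `merged_at` (ISO 8601 timestamp, empty for unmerged PRs) and `author`, which would need to match whatever schema the cache actually uses:

```python
import csv
from datetime import datetime
from pathlib import Path


def release_metrics(prs_csv: Path, last_release: datetime, this_release: datetime) -> dict:
    """Summarize activity between two releases from a cached prs.csv.

    Hypothetical sketch: counts PRs merged in the window and the number of
    distinct contributors, plus the length of the release cycle in days.
    """
    merged, authors = 0, set()
    with prs_csv.open(newline="") as f:
        for row in csv.DictReader(f):
            if not row["merged_at"]:
                continue  # skip PRs that were never merged
            when = datetime.fromisoformat(row["merged_at"].replace("Z", "+00:00"))
            if last_release < when <= this_release:
                merged += 1
                authors.add(row["author"])
    return {
        "days_since_last_release": (this_release - last_release).days,
        "merged_prs": merged,
        "contributors": len(authors),
    }
```

Running this per release and per project would give the kind of cross-project comparison mentioned above (was this cycle bigger or smaller than the last one, did more people contribute, etc.).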