Gource as a reference for a framework that generates evolutionary datasets from codebases
Hi maintainers, I'm working on an independent research project on software evolution in industry-grade codebases (large-scale, multi-repository, well documented). My approach is to build an open-source data-creation framework that can be connected to any codebase and captures how the project evolves over time as checkpointed code graphs, possibly checkpointed per commit. I plan to start with Python or Java codebases, since better AST parsers and libraries are available for them; support for other languages could be added later.
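To make the idea of checkpointed code graphs a bit more concrete, here is a minimal sketch in Python, not an actual implementation. It assumes each checkpoint is a (commit id, source text) pair that has already been pulled out of the repository. The function names `build_code_graph`, `checkpoint_graphs`, and `diff_graphs` are hypothetical, and the "graph" is deliberately crude: definitions plus direct calls by name, found with the standard `ast` module.

```python
import ast
from typing import Dict, List, Tuple

Graph = Dict[str, list]


def build_code_graph(source: str) -> Graph:
    """Collect definitions (nodes) and direct calls by name (edges) from one source string."""
    tree = ast.parse(source)
    nodes = set()
    edges = set()
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            nodes.add(node.name)
            # Every plain-name call inside this definition becomes an edge.
            # Calls in nested definitions are also attributed to the enclosing one.
            for child in ast.walk(node):
                if isinstance(child, ast.Call) and isinstance(child.func, ast.Name):
                    edges.add((node.name, child.func.id))
    return {"nodes": sorted(nodes), "edges": sorted(edges)}


def checkpoint_graphs(commits: List[Tuple[str, str]]) -> Dict[str, Graph]:
    """Build one graph per (commit_id, source) pair, keyed by commit id."""
    return {commit_id: build_code_graph(source) for commit_id, source in commits}


def diff_graphs(old: Graph, new: Graph) -> Dict[str, list]:
    """Report what appeared or disappeared between two checkpoints."""
    return {
        "added_nodes": sorted(set(new["nodes"]) - set(old["nodes"])),
        "removed_nodes": sorted(set(old["nodes"]) - set(new["nodes"])),
        "added_edges": sorted(set(new["edges"]) - set(old["edges"])),
        "removed_edges": sorted(set(old["edges"]) - set(new["edges"])),
    }
```

In this sketch the evolution dataset would be the sequence of per-commit graphs plus the diffs between consecutive checkpoints. A real version would need to cover multiple files, imports, and method calls on objects, which this ignores.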
I need help understanding which components or modules in Gource I can draw parallels from, and how. I'd also be happy to read any other suggestions or references.
At this point I'm surveying the options and noting down a few solid approaches to achieve this.