Python library for `fetchData`
It'd be nice to also publish something to PyPI, similar to what we did with the npm library here: just fetch the latest changes into dictionaries/lists that you can use in Python.
See here for reference https://github.com/opensource-observer/oss-directory/blob/main/src/fetchData.ts
I'm curious whether, instead of generating a library for each language, you could package the data into a CSV (regenerated after each commit with GitHub Actions) and publish it somewhere (GitHub, S3, IPFS, ...) so it is curlable/downloadable.
Then in Python, getting the data could be as simple as `pd.read_csv("https://oso.io/directory.csv")` or something like that.
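A minimal stdlib-only sketch of the consumer side, assuming a CSV snapshot is published at some URL (the `oso.io/directory.csv` address is only an example, and `fetch_directory` / `parse_directory_csv` are hypothetical names). It returns a plain list of dicts, i.e. the "dictionaries/lists" shape asked for in the original request, without requiring pandas:

```python
import csv
import io
import urllib.request


def parse_directory_csv(text: str) -> list[dict[str, str]]:
    """Parse CSV text (with a header row) into a list of row dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def fetch_directory(url: str) -> list[dict[str, str]]:
    """Download a published CSV snapshot and parse it.

    `url` is an assumption: wherever the CI job ends up uploading the file.
    """
    with urllib.request.urlopen(url) as resp:
        text = resp.read().decode("utf-8")
    return parse_directory_csv(text)


if __name__ == "__main__":
    sample = "slug,name\nexample-project,Example Project\n"
    print(parse_directory_csv(sample))
```

With pandas installed, `pd.read_csv(url)` does the download and parse in one call and returns a DataFrame instead of a list of dicts.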
Yes! That's a great idea. I'm curious if you have any suggestions for how to encode the nested project schema? https://github.com/opensource-observer/oss-directory/blob/main/src/resources/schema/project.json
Would you just create the equivalent relational tables to flatten it?
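If going the relational route, one option is a parent table of projects plus one child table per nested array, linked by the project `slug`. This is only a sketch: the field names (`slug`, `name`, and arrays `github` / `npm` holding `{"url": ...}` objects) are assumptions about `project.json` and should be checked against the real schema; `to_relational` is a hypothetical helper.

```python
from typing import Any

# Assumed shape of one project record; verify against project.json.
Project = dict[str, Any]


def to_relational(
    projects: list[Project],
    array_fields: tuple[str, ...] = ("github", "npm"),
) -> tuple[list[dict[str, Any]], dict[str, list[dict[str, Any]]]]:
    """Flatten projects into a parent table and one child table per nested array.

    Returns (project_rows, child_tables); child_tables maps each array field
    to rows that carry the parent slug as a foreign key.
    """
    project_rows: list[dict[str, Any]] = []
    child_tables: dict[str, list[dict[str, Any]]] = {f: [] for f in array_fields}
    for project in projects:
        slug = project["slug"]
        # Parent row: every field except the arrays split out below.
        project_rows.append({k: v for k, v in project.items() if k not in array_fields})
        for field in array_fields:
            # Assumes each array element is a dict (e.g. {"url": ...}).
            for item in project.get(field) or []:
                child_tables[field].append({"project_slug": slug, **item})
    return project_rows, child_tables
```

Each table can then be written to its own CSV, and consumers join the child tables back on `project_slug`.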
> I'm curious if you have any suggestions for how to encode the nested project schema?
I'd start with the simplest thing: serialize any nested dicts/arrays to strings. That might not be the most user-friendly option, but since it's a CSV, it could be updated later with other alternatives (e.g., flattening the keys, a relational approach, ...).
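A minimal sketch of that "simplest thing", assuming nested values are serialized as JSON text (the suggestion only says "strings"; JSON is one choice that keeps them machine-parseable). `write_projects_csv` and `read_projects_csv` are hypothetical helper names, and the caller has to know which columns hold JSON.

```python
import csv
import io
import json
from typing import Any


def write_projects_csv(projects: list[dict[str, Any]]) -> str:
    """Serialize records to CSV, JSON-encoding any nested dict/list values."""
    # Header = union of keys across all records, in first-seen order.
    fieldnames: list[str] = []
    for project in projects:
        for key in project:
            if key not in fieldnames:
                fieldnames.append(key)
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames)  # missing keys -> ""
    writer.writeheader()
    for project in projects:
        writer.writerow({
            k: json.dumps(v) if isinstance(v, (dict, list)) else v
            for k, v in project.items()
        })
    return buf.getvalue()


def read_projects_csv(text: str, nested_fields: set[str]) -> list[dict[str, Any]]:
    """Parse the CSV back, decoding the columns known to hold JSON."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        for field in nested_fields:
            if row.get(field):  # an empty cell means the record had no value
                row[field] = json.loads(row[field])
        rows.append(row)
    return rows
```

One trade-off of this approach: non-nested values come back as strings (CSV has no types), so numbers or booleans would need their own conversion on read.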
That said, CSV is just one format idea. If the data is not too big (e.g., 30-100 MB), perhaps the best option would be a single JSONL (JSON Lines) file, optionally compressed, instead of a CSV.
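For the JSONL option, a sketch that writes one JSON object per line into a gzip-compressed file and streams it back; nested fields survive as-is, so no flattening is needed. The helper names and the `directory.jsonl.gz` filename are placeholders, not an existing API.

```python
import gzip
import json
from typing import Any, Iterator


def write_jsonl_gz(path: str, records: list[dict[str, Any]]) -> None:
    """Write one JSON object per line, gzip-compressed."""
    with gzip.open(path, "wt", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")


def read_jsonl_gz(path: str) -> Iterator[dict[str, Any]]:
    """Yield records one at a time, so the whole file never sits in memory."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            if line.strip():  # tolerate a trailing blank line
                yield json.loads(line)
```

On the consumer side, pandas can load the same file with `pd.read_json("directory.jsonl.gz", lines=True)`; with the default `compression="infer"`, the `.gz` extension selects gzip.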
https://pypi.org/project/oss-directory/