Python library for `fetchData`

Open ryscheng opened this issue 2 years ago • 3 comments

It'd be nice to also publish something to pip, similar to what we did with the npm library. It would just fetch the latest changes into dictionaries/lists that you can use in Python.

See here for reference https://github.com/opensource-observer/oss-directory/blob/main/src/fetchData.ts

ryscheng, Aug 09 '23 16:08

Curious if instead of generating a library for each language, you could package the data into a CSV (after each commit with GitHub actions) and publish it somewhere (GitHub, S3, IPFS, ...) so it is curlable / downloadable.

Then in Python, getting the data could be `pd.read_csv("https://oso.io/directory.csv")` or something like that.

davidgasquez, Oct 11 '23 09:10

Yes! That's a great idea. I'm curious if you have any suggestions for how to encode the nested project schema? https://github.com/opensource-observer/oss-directory/blob/main/src/resources/schema/project.json

Would you just create the equivalent relational tables to flatten it?
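For concreteness, the relational option could look roughly like the sketch below: one parent table with the scalar fields and one child table per nested list, linked by a key. The field names (`name`, `github`, `url`) and the `flatten_projects` helper are illustrative assumptions, not taken from the actual schema or library.

```python
def flatten_projects(projects, list_field, key_field="name"):
    """Split nested records into a parent table and a child table.

    Assumes each project is a dict with a unique `key_field` and that
    `list_field` (if present) holds a list of dicts. Both names are
    hypothetical placeholders for whatever the real schema uses.
    """
    parents = []
    children = []
    for project in projects:
        # Parent row: every field except the nested list.
        parents.append({k: v for k, v in project.items() if k != list_field})
        # Child rows: one per nested item, carrying a foreign key to the parent.
        for item in project.get(list_field, []):
            children.append({"project_" + key_field: project[key_field], **item})
    return parents, children


# Made-up records, for illustration only.
example_projects = [
    {
        "name": "example-project",
        "github": [
            {"url": "https://github.com/example/one"},
            {"url": "https://github.com/example/two"},
        ],
    },
    {"name": "empty-project"},
]

project_rows, github_rows = flatten_projects(example_projects, "github")
```

Each returned list could then be written out as its own CSV, at the cost of consumers having to join them back together.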

ryscheng, Oct 13 '23 14:10

> I'm curious if you have any suggestions for how to encode the nested project schema?

I'd start with the simplest thing: serialize any nested dicts/arrays to strings. That might not be the most user-friendly option, but since it is a CSV, it could be updated later with other alternatives (e.g., flattening the keys, a relational approach, ...).
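A minimal sketch of that "serialize nested values to strings" idea, using only the standard library. The record fields and the helper names (`project_to_row`, `projects_to_csv`) are hypothetical and only meant to show the shape of the approach.

```python
import csv
import io
import json


def project_to_row(project):
    """Keep scalar fields as-is; encode nested dicts/lists as JSON strings."""
    row = {}
    for key, value in project.items():
        if isinstance(value, (dict, list)):
            row[key] = json.dumps(value, sort_keys=True)
        else:
            row[key] = value
    return row


def projects_to_csv(projects):
    """Render a list of project dicts as CSV text."""
    rows = [project_to_row(p) for p in projects]
    # Union of all keys, in first-seen order, so sparse records still fit.
    fieldnames = list(dict.fromkeys(k for r in rows for k in r))
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up record, for illustration only.
example_projects = [
    {
        "name": "example-project",
        "github": [{"url": "https://github.com/example/example"}],
    },
]
csv_text = projects_to_csv(example_projects)
```

On the reading side, a consumer would call `json.loads` on the nested columns to get the original structure back.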

That said, the CSV is just a format idea. If the data is not too big (e.g., 30-100 MB), perhaps the best thing would be a larger JSONL file (which could be compressed too) instead of a CSV.
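As a rough illustration of the compressed JSON Lines alternative, the sketch below writes one JSON object per line and gzips the result in memory. The helper names and the sample records are assumptions for the example; where the bytes get published (GitHub, S3, IPFS, ...) is left open.

```python
import gzip
import json


def dump_jsonl_gz(records):
    """Encode records as gzip-compressed JSON Lines (one object per line)."""
    text = "".join(json.dumps(record) + "\n" for record in records)
    return gzip.compress(text.encode("utf-8"))


def load_jsonl_gz(payload):
    """Decode gzip-compressed JSON Lines back into a list of records."""
    text = gzip.decompress(payload).decode("utf-8")
    return [json.loads(line) for line in text.split("\n") if line.strip()]


# Made-up records, for illustration only.
example_records = [
    {"name": "example-project", "github": [{"url": "https://github.com/example/example"}]},
    {"name": "another-project", "github": []},
]

payload = dump_jsonl_gz(example_records)
restored = load_jsonl_gz(payload)
```

Unlike the CSV route, nested fields survive the round trip without any extra encoding step.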

davidgasquez, Oct 17 '23 07:10

https://pypi.org/project/oss-directory/

ryscheng, Jun 25 '24 20:06