Code-Pile
Mailing Lists
Dataset URL -
Does the dataset exist in a scraped format? No
Description
In general, (almost) every programmer uses a programming language, and huge swathes of programming are organized around these languages. Most of these languages have some kind of package manager, and this package manager usually has download statistics.
Procedure
- [ ] Determine the top 50-100 programming languages as shown by GitHub statistics or whatever
- [ ] Ignore this list and immediately add Coq, Lean, Haskell, and OCaml as languages no matter what since you need them for proof solving
- [ ] Then add the other 50 languages
- [ ] Locate the mailing list(s) for each programming language and scrape their archives
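A rough sketch of how the steps above might look in Python, using only the standard library. It assumes the archives have already been downloaded as local mbox files, which is only one of the formats mailing lists publish; the function names `build_language_list` and `iter_mbox_messages`, the `limit` default, and the output field names are all hypothetical and not taken from any existing Code-Pile code.

```python
import mailbox
from email.header import decode_header, make_header

# Languages the procedure says to include no matter what (proof solving).
PRIORITY_LANGUAGES = ["Coq", "Lean", "Haskell", "OCaml"]


def build_language_list(ranked, limit=50):
    """Return the priority languages followed by up to `limit` others.

    `ranked` is any popularity-ordered list of language names, for example
    one derived from GitHub statistics. Duplicates are dropped
    case-insensitively.
    """
    seen = {name.lower() for name in PRIORITY_LANGUAGES}
    others = []
    for name in ranked:
        if len(others) >= limit:
            break
        if name.lower() in seen:
            continue
        seen.add(name.lower())
        others.append(name)
    return PRIORITY_LANGUAGES + others


def _decode_part(part):
    """Decode one message part to text, tolerating bad or unknown charsets."""
    payload = part.get_payload(decode=True)
    if not payload:
        return ""
    charset = part.get_content_charset() or "utf-8"
    try:
        return payload.decode(charset, errors="replace")
    except LookupError:
        # The declared charset is not known to Python; fall back to UTF-8.
        return payload.decode("utf-8", errors="replace")


def iter_mbox_messages(path):
    """Yield one dict per message in a local mbox archive (assumed format)."""
    box = mailbox.mbox(path, create=False)
    try:
        for msg in box:
            if msg.is_multipart():
                # Keep only the plain-text parts of multipart messages.
                parts = [p for p in msg.walk()
                         if p.get_content_type() == "text/plain"]
            else:
                # Single-part messages are taken as-is.
                parts = [msg]
            yield {
                "subject": str(make_header(decode_header(msg.get("Subject", "")))),
                "date": msg.get("Date", ""),
                "message_id": msg.get("Message-ID", ""),
                "body": "".join(_decode_part(p) for p in parts),
            }
    finally:
        box.close()
```

Each language would then map to one or more archive files, with something like `for record in iter_mbox_messages("haskell-cafe.mbox"): ...` feeding the dataset writer. The file name here is only a placeholder; lists that publish HTML-only archives would need a separate scraper.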
Happy to work on this if no one else is currently working on it. Can probably re-use a lot of the UseNet code here.