nlp-datasets
Alphabetical list of free/public domain datasets with text data for use in Natural Language Processing (NLP)
Geolocated links for Twitter UK and USA broken
Responsible disclosure: datasets compiled by us.
Hey, great collection of resources! We would like to add our open source dataset for German Question Answering and IR.
Add a record for S2ORC dataset (The Semantic Scholar Open Research Corpus)
Please consider whether this resource would be good for your list. It is a large collection of data about entities such as people, businesses, and organizations. It also includes code to...
Corrected typo and added dataset
The old link no longer works, so I replaced it with a link to the original source of the dataset (CMU).
Added Tiny QA Benchmark++ (TQB++) Paper: https://arxiv.org/abs/2505.12058