nlp-datasets

Alphabetical list of free/public domain datasets with text data for use in Natural Language Processing (NLP)

Results 11 nlp-datasets issues

Geolocated links for Twitter UK and USA broken

Responsible disclosure: datasets compiled by us.

Hey, great collection of resources! We would like to add our open source dataset for German Question Answering and IR.

Add a record for S2ORC dataset (The Semantic Scholar Open Research Corpus)

Please consider whether this resource would be good for your list. It is a large collection of data about entities such as people, businesses, and organizations. It also includes code to...

The old link no longer works, so I replaced it with a link to the original source of the dataset (CMU).

Added Tiny QA Benchmark++ (TQB++) Paper: https://arxiv.org/abs/2505.12058