Loading dataset from HF failed
from datasets import load_dataset

datasets = ['hotpotqa', '2wikimqa', 'musique', 'narrativeqa', 'qasper', 'multifieldqa_en', 'gov_report', 'qmsum', 'trec', 'samsum', 'triviaqa', 'passage_count', 'passage_retrieval_en', 'multi_news']
for dataset in datasets:
    print(f"Loading dataset {dataset}")
    data = load_dataset("THUDM/LongBench", dataset, split="test")
    output_path = f"{output_dir}/pred/{dataset}.jsonl"  # output_dir is defined elsewhere
File "/usr/local/lib/python3.9/dist-packages/datasets/packaged_modules/cache/cache.py", line 65, in _find_hash_in_cache raise ValueError( ValueError: Couldn't find cache for THUDM/LongBench for config '2wikimqa' Available configs in the cache: ['dureader', 'hotpotqa', 'multifieldqa_en_e', 'qasper_e']
Hi, can you try deleting the cached files and downloading them all over again?
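For example, a minimal way to force a fresh download (assuming the default cache location under ~/.cache/huggingface/datasets) is:

from datasets import load_dataset

# "force_redownload" ignores any cached copy and fetches the data again;
# alternatively, delete the dataset's folder under ~/.cache/huggingface/datasets.
data = load_dataset("THUDM/LongBench", "2wikimqa", split="test",
                    download_mode="force_redownload")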
Yes, and I have tested it many times, both on my local machine and in a Docker environment. I don't know if you can reproduce this error; maybe it is just my mistake. Thanks for your reply.
In the end I had to download the JSONL files and load them from local disk, which works.
I can still use this dataset, but I think this error may lead to reduced usage.
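For anyone hitting the same problem, a minimal sketch of that local-file workaround (the file path is a placeholder for the JSONL file downloaded from the Hub):

from datasets import load_dataset

# Load a local JSONL file via the generic "json" builder; note that this
# builder exposes the data under the "train" split regardless of the file name.
data = load_dataset("json", data_files="data/2wikimqa.jsonl", split="train")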
Glad to hear you've loaded the dataset! Perhaps this error is due to an outdated datasets version. You can try updating the package:
pip install -U datasets
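You can check which version is actually installed with:

python -c "import datasets; print(datasets.__version__)"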
I have already upgraded it to the latest version, but it didn't work. Maybe it's a Hugging Face issue?
Hi there, downgrading datasets to 3.2.0 works for me.
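For reference, pinning the version and loading looked roughly like this for me; whether trust_remote_code is needed depends on your datasets version, since LongBench ships a loading script:

pip install "datasets==3.2.0"

from datasets import load_dataset

# Recent `datasets` releases may require explicitly allowing the dataset's
# loading script to run.
data = load_dataset("THUDM/LongBench", "2wikimqa", split="test", trust_remote_code=True)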
When using datasets==4.3.0, the following log shows up and I can't load the dataset properly. It's probably because Hugging Face no longer supports remote script execution for dataset loading. Perhaps the maintainers could consider updating the dataset to "a standard format like Parquet", as the log suggests?
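In case it's useful to the maintainers, a rough sketch of what such a one-off conversion could look like (assuming an environment with an older datasets release, e.g. 3.2.0, that can still run the loading script; the target repo id is a placeholder):

from datasets import load_dataset

configs = ['hotpotqa', '2wikimqa', 'musique', 'narrativeqa', 'qasper', 'multifieldqa_en',
           'gov_report', 'qmsum', 'trec', 'samsum', 'triviaqa', 'passage_count',
           'passage_retrieval_en', 'multi_news']
for config in configs:
    ds = load_dataset("THUDM/LongBench", config, split="test", trust_remote_code=True)
    # Either write Parquet files locally...
    ds.to_parquet(f"{config}.parquet")
    # ...or push each config to a Hub repo so it no longer needs the script:
    # ds.push_to_hub("your-namespace/LongBench-parquet", config_name=config)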