Some of DownloadConfig's properties are always being overridden in load.py
Describe the bug
The extract_compressed_file and force_extract properties of DownloadConfig are always set to True in the dataset_module_factory function in load.py, regardless of the values the caller passes. This is a problem because previously extracted data is ignored and the archives are extracted again every time the dataset is loaded.
See this image below:
Steps to reproduce the bug
- Have a local dataset that contains archived files (zip, tar.gz, etc)
- Build a dataset loading script to download and extract these files
- Run the load_dataset function with a DownloadConfig that explicitly sets force_extract to False
- The extraction process starts regardless of whether the archives were extracted previously
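The steps above can be sketched roughly as follows. This is a minimal reproduction outline, not a verified script: the helper name load_without_forced_extraction and the dataset path are hypothetical, and trust_remote_code=True is an assumption that may or may not be needed for a script-based local dataset in your setup.

```python
def load_without_forced_extraction(dataset_path: str):
    """Load a script-based local dataset while asking not to re-extract archives.

    `dataset_path` is a hypothetical path to a directory containing the
    loading script and the archived files (zip, tar.gz, etc).
    """
    # Imported inside the function so the sketch can be read without
    # `datasets` installed; the report was filed against datasets==2.20.0.
    from datasets import DownloadConfig, load_dataset

    download_config = DownloadConfig(
        extract_compressed_file=True,  # archives still need extracting once
        force_extract=False,           # an existing extraction should be reused
    )
    # Assumption: trust_remote_code may be required to run a loading script.
    return load_dataset(
        dataset_path,
        download_config=download_config,
        trust_remote_code=True,
    )
```

Calling load_without_forced_extraction("path/to/local_dataset") twice in a row should, according to this report, trigger the extraction both times.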
Expected behavior
The extraction process should not run when the archives were previously extracted and force_extract is set to False.
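To make the expected behavior concrete, here is a small standalone sketch using only the standard library. It is not the datasets implementation; extract_if_needed is a hypothetical helper that only illustrates the rule "skip extraction when the output already exists and force_extract is False".

```python
import tempfile
import zipfile
from pathlib import Path


def extract_if_needed(archive: Path, output_dir: Path, force_extract: bool = False) -> bool:
    """Extract `archive` into `output_dir`; return True if extraction actually ran."""
    already_extracted = output_dir.exists() and any(output_dir.iterdir())
    if already_extracted and not force_extract:
        # Reuse the previous extraction instead of redoing the work.
        return False
    output_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(output_dir)
    return True


with tempfile.TemporaryDirectory() as tmp:
    # Build a tiny archive to stand in for the dataset's compressed files.
    archive = Path(tmp) / "data.zip"
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("train.txt", "hello\n")

    out = Path(tmp) / "extracted"
    first = extract_if_needed(archive, out)                       # nothing extracted yet
    second = extract_if_needed(archive, out)                      # should be skipped
    forced = extract_if_needed(archive, out, force_extract=True)  # explicit re-extract
```

In this sketch the first call extracts, the second call is skipped, and only the call with force_extract=True extracts again, which is the behavior the report expects from load_dataset.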
Environment info
datasets==2.20.0, Python 3.9