DeepLearningExamples
[Bert/Pytorch] Difference between data_download.sh and create_dataset_from scratch.sh
Related to Bert/Pytorch
Describe the bug
This is not a bug but a question. What is the difference between data_download.sh and create_dataset_from scratch.sh? README.md suggests downloading and preprocessing the data with create_dataset_from scratch.sh, and does not mention data_download.sh at all.
As I understand it, in addition to downloading Wikipedia, data_download.sh also downloads BookCorpus for pre-training. So why isn't data_download.sh the recommended way to prepare data for pre-training?