DeepLearningExamples

[Bert/Pytorch] Difference between data_download.sh and create_dataset_from scratch.sh

Open wormyu opened this issue 2 years ago • 0 comments

Related to Bert/Pytorch

Describe the bug

This is not a bug but a question. What is the difference between data_download.sh and create_dataset_from scratch.sh? README.md suggests using create_dataset_from scratch.sh to download and preprocess the data, and it doesn't mention data_download.sh at all.

As I understand it, in addition to downloading Wikipedia, data_download.sh also downloads BookCorpus for pre-training. So what is the reason for not using data_download.sh to prepare the pre-training data?

wormyu, Jul 08 '23 12:07