
BERT pre-training - Data preparation - Sharding speed-up

Open ThomasPerrais opened this issue 4 years ago • 0 comments

Changed the way we keep track of sentence counts in each training and test shard file to avoid recalculating them from scratch. This results in a substantial speed-up of sharding, especially on huge text files.
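To illustrate the idea, here is a minimal sketch of keeping a running sentence count per shard instead of re-summing it on every assignment. It is not the code from this change: `ShardCounts`, `recount`, `assign_greedy`, and the greedy "fill the emptiest shard" policy are all hypothetical names and assumptions chosen for the example.

```python
from collections import defaultdict


class ShardCounts:
    """Hypothetical helper: keeps a running sentence total per shard."""

    def __init__(self):
        self.articles = defaultdict(list)   # shard name -> article ids
        self.sentences = defaultdict(int)   # shard name -> running sentence count

    def add(self, shard, article_id, n_sentences):
        self.articles[shard].append(article_id)
        # Constant-time update; no pass over the shard's existing articles.
        self.sentences[shard] += n_sentences


def recount(articles_in_shard, sentence_counts):
    """Baseline for comparison: recompute a shard's total from scratch."""
    return sum(sentence_counts[a] for a in articles_in_shard)


def assign_greedy(sentence_counts, shard_names):
    """Assumed policy: put each article in the shard with the fewest sentences so far."""
    counts = ShardCounts()
    for article_id, n in sentence_counts.items():
        # Reads the cached totals rather than calling recount() for every shard.
        target = min(shard_names, key=lambda s: counts.sentences[s])
        counts.add(target, article_id, n)
    return counts


if __name__ == "__main__":
    sentence_counts = {"a": 5, "b": 3, "c": 2, "d": 4}
    counts = assign_greedy(sentence_counts, ["train_0", "train_1"])
    for shard in ("train_0", "train_1"):
        # The cached total should always agree with a full recount.
        print(shard, counts.articles[shard], counts.sentences[shard],
              recount(counts.articles[shard], sentence_counts))
```

With a full recount, each assignment costs time proportional to the number of articles already placed; with the cached totals it costs time proportional to the number of shards only, which is where the gain on very large inputs would come from under these assumptions.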

ThomasPerrais • Jan 19 '22 14:01