BERT pre-training - Data preparation - Sharding speed-up

Open ThomasPerrais opened this issue 4 years ago • 0 comments

Changed how sentence counts are tracked for each training and test shard file, so the counts are updated as shards are filled instead of being recalculated from scratch. This substantially speeds up sharding, especially on very large text files.
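For illustration only, here is a minimal sketch of the general idea, not the actual code from this change. The names `ShardSentenceCounter`, `assign`, and `recount_from_scratch` are hypothetical, and the shard and article names are made-up sample data. It contrasts re-summing sentence counts over every shard with keeping a running count that is updated on each assignment.

```python
from collections import defaultdict


def recount_from_scratch(shards, sentences_per_article):
    """Baseline (hypothetical): re-sum every article in every shard on each call."""
    return {
        name: sum(sentences_per_article[a] for a in articles)
        for name, articles in shards.items()
    }


class ShardSentenceCounter:
    """Hypothetical helper: keeps a running sentence count per shard."""

    def __init__(self):
        self.shards = defaultdict(list)  # shard name -> list of article ids
        self.counts = defaultdict(int)   # shard name -> running sentence count

    def assign(self, shard_name, article_id, n_sentences):
        # Record the assignment and update the count in O(1),
        # without re-scanning the articles already in the shard.
        self.shards[shard_name].append(article_id)
        self.counts[shard_name] += n_sentences


# Made-up sample data: article id -> number of sentences.
sentences_per_article = {"a1": 12, "a2": 7, "a3": 30}

counter = ShardSentenceCounter()
counter.assign("training_0", "a1", sentences_per_article["a1"])
counter.assign("training_0", "a2", sentences_per_article["a2"])
counter.assign("test_0", "a3", sentences_per_article["a3"])

# The running counts match a full recount, but each update costs O(1).
assert dict(counter.counts) == recount_from_scratch(counter.shards, sentences_per_article)
```

If the sharding loop checks the counts after every assignment, the baseline repeats a full pass over all shards each time, while the running counter does constant work per assignment. That difference grows with the size of the input, which is consistent with the speed-up being most visible on huge files.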

Jan 19 '22 14:01 ThomasPerrais