BERT
Do you have any plan to release the pretraining code with Horovod?
I tried to modify the original BERT code to use Horovod with multiple GPUs, but I can't reproduce the single-GPU result. If I set train_set=1000, two GPUs finish at global step 500, and the eval result is much worse than with one GPU.
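For what it's worth, finishing at step 500 is expected with data-parallel Horovod: each global step consumes hvd.size() batches, so two GPUs cover the same data in half the global steps. The eval gap usually comes from skipping one of the standard porting steps (scaling the learning rate, wrapping the optimizer, broadcasting initial variables from rank 0). Below is a minimal TF1-style sketch of those steps, using a toy model and made-up hyperparameters rather than the actual BERT code:

```python
import numpy as np
import tensorflow as tf
import horovod.tensorflow as hvd

hvd.init()

# Pin each process to a single GPU.
config = tf.ConfigProto()
config.gpu_options.visible_device_list = str(hvd.local_rank())

# Toy model standing in for the real pretraining graph.
x = tf.placeholder(tf.float32, [None, 10])
y = tf.placeholder(tf.float32, [None, 1])
pred = tf.layers.dense(x, 1)
loss = tf.losses.mean_squared_error(y, pred)

# 1. Scale the learning rate by hvd.size(), since the effective
#    batch size is N times larger (1e-3 is an arbitrary example).
opt = tf.train.AdamOptimizer(1e-3 * hvd.size())
# 2. Wrap the optimizer so gradients are allreduced across workers.
opt = hvd.DistributedOptimizer(opt)
global_step = tf.train.get_or_create_global_step()
train_op = opt.minimize(loss, global_step=global_step)

hooks = [
    # 3. Broadcast initial weights from rank 0 so all workers
    #    start from identical parameters.
    hvd.BroadcastGlobalVariablesHook(0),
    # 4. With N workers, each global step covers N batches, so
    #    1000 single-GPU steps correspond to 500 steps on 2 GPUs.
    tf.train.StopAtStepHook(last_step=500),
]

with tf.train.MonitoredTrainingSession(hooks=hooks, config=config) as sess:
    while not sess.should_stop():
        # Random batches in place of a real sharded input pipeline.
        batch_x = np.random.rand(32, 10).astype(np.float32)
        batch_y = np.random.rand(32, 1).astype(np.float32)
        sess.run(train_op, feed_dict={x: batch_x, y: batch_y})
```

If your modified script still uses a num_train_steps value computed for one GPU, either divide it by hvd.size() or accept that you are effectively training N times longer over the same data.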
I have only released LM adaptation with fine-tuning, which can accelerate convergence and produce a more robust model.