Add valid data

Open sbmaruf opened this issue 4 years ago • 2 comments

As requested by @TevenLeScao

We want to:
1. train on a mix of languages
2. do validation on English-only

By default, Megatron-DeepSpeed uses a fraction of the training set as the validation set, so we currently can't combine multilingual training data with English-only validation data. To launch experiments, a quick hack that lets us use an English-only validation set would be enough.

  • [x] Add additional argument for valid data
  • [x] Implement valid data-loader
  • [x] Run a dummy test
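
For illustration, the first two checklist items could look roughly like the sketch below. This is not the code from this pull request: the `--valid-data-path` flag name and the `plan_datasets` helper are hypothetical, and `--data-path` and `--split` are included only to stand in for the existing training-data options, with an illustrative default split. The idea is simply that when a separate validation path is given, validation data is no longer taken from the training split.

```python
import argparse


def build_parser():
    """Build a toy parser with a hypothetical extra validation-data argument."""
    parser = argparse.ArgumentParser(description="Sketch: separate validation data")
    # Stand-ins for the existing training-data options.
    parser.add_argument("--data-path", nargs="+", required=True,
                        help="Training data (e.g. the multilingual mix).")
    parser.add_argument("--split", default="969,30,1",
                        help="Train/valid/test proportions (illustrative default).")
    # Hypothetical new argument: a dedicated validation dataset.
    parser.add_argument("--valid-data-path", nargs="+", default=None,
                        help="Optional separate validation data (e.g. English-only).")
    return parser


def plan_datasets(args):
    """Decide where training and validation data come from.

    Hypothetical helper: returns a small dict describing the plan
    instead of building real data loaders.
    """
    if args.valid_data_path is None:
        # Default behaviour: validation is a fraction of the training data.
        return {"train": args.data_path,
                "valid": args.data_path,
                "valid_from_split": True}
    # Override: validation comes from its own dataset.
    return {"train": args.data_path,
            "valid": args.valid_data_path,
            "valid_from_split": False}


if __name__ == "__main__":
    demo = build_parser().parse_args(
        ["--data-path", "multilingual", "--valid-data-path", "english"]
    )
    print(plan_datasets(demo))
```

With this shape, omitting the extra argument keeps the current behaviour, so existing launch scripts would not need to change.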

sbmaruf avatar Sep 22 '21 01:09 sbmaruf

@TevenLeScao Did you get a chance to take a look at this pull request?

sbmaruf avatar Sep 29 '21 06:09 sbmaruf

Hey Maruf, sorry, not yet. I'm a bit swamped at the moment, and the priority has switched to doing additional cleaning of OSCAR-ml ourselves before launching anything on it. Maybe @ibeltagy can review in the meantime?

TevenLeScao avatar Sep 30 '21 07:09 TevenLeScao