How to pre-train? Has anyone started pre-training successfully?
- The repo contains pretrain_glm.py and scripts/ds_pretrain_nvidia.sh, and the script takes an argument $1. What is this $1? Is it config_tasks/seq_blank.sh?
- I want to know whether the model architecture is GPT-2 for all tasks, simply using the attention mask matrix to cover both bidirectional and unidirectional attention. If so, how does the model know the order of the cloze spans? For example, with GPT-2 and two clozes as in "A [MASK1] B [MASK2] C", if the model predicts "[s] x1 x2 [/s] [s] x3 [/s]", how do we know when filling back that the result is [A][x1][x2][B][x3][C] and not [A][x3][B][x1][x2][C]?
Thanks a lot to anyone who answers.
I guess that by adding the Position 1 embedding we can indicate which [MASK] we are generating now. You can look again at Figure 2(c), Position 1, for a more intuitive explanation.
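To make the ordering concrete, here is a rough sketch of how the 2-D position ids could be built for "A [MASK1] B [MASK2] C" (my own helper, not the repo's code; the token layout is simplified):

```python
# Hedged sketch of GLM-style 2-D positions: Position 1 is the index of the
# corresponding [MASK] in the corrupted text (Part A), Position 2 counts
# tokens within each generated span. This is what ties each span back to
# the right [MASK], regardless of the order the spans are generated in.

def build_2d_positions(num_part_a_tokens, mask_positions, span_lengths):
    """Return (position_ids, block_position_ids) for Part A followed by the spans."""
    # Part A: absolute positions, intra-span position is 0.
    position_ids = list(range(num_part_a_tokens))
    block_position_ids = [0] * num_part_a_tokens
    # Part B: every token of a span reuses its [MASK]'s position (Position 1)
    # and counts 1, 2, ... within the span (Position 2); +1 accounts for [S].
    for mask_pos, span_len in zip(mask_positions, span_lengths):
        position_ids += [mask_pos] * (span_len + 1)
        block_position_ids += list(range(1, span_len + 2))
    return position_ids, block_position_ids

# "A [MASK1] B [MASK2] C": Part A has 5 tokens, [MASK1] at index 1, [MASK2] at index 3.
# The first span generates x1 x2 (length 2), the second generates x3 (length 1).
pos1, pos2 = build_2d_positions(5, mask_positions=[1, 3], span_lengths=[2, 1])
print(pos1)  # [0, 1, 2, 3, 4, 1, 1, 1, 3, 3]
print(pos2)  # [0, 0, 0, 0, 0, 1, 2, 3, 1, 2]
```

Because x1 and x2 carry Position 1 = 1 (the index of [MASK1]) while x3 carries Position 1 = 3 (the index of [MASK2]), filling back as [A][x1][x2][B][x3][C] is unambiguous.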
I successfully ran the pretraining code; just look at my recently closed issue, maybe it will help you.
Hello, which issue? Can you link it? I encounter "assert cdb is not None and cdb.is_initialized()" when I run ds_pretrain_nvidia.sh.
The error is caused by the latest version of DeepSpeed, which requires the code to call deepspeed.init_distributed. We just fixed the problem by adding that call in initialize_distributed. Can you try ds_pretrain_nvidia.sh again?
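For reference, a minimal sketch of what such a fix looks like, assuming a Megatron-style initialize_distributed(args); the argument names (args.rank, args.local_rank, args.distributed_backend) are illustrative, not necessarily the repo's exact code:

```python
import torch
import deepspeed

def initialize_distributed(args):
    """Initialize torch.distributed and DeepSpeed's communication backend."""
    # Pick the GPU for this process (Megatron-style args assumed here).
    device = args.rank % torch.cuda.device_count()
    if args.local_rank is not None:
        device = args.local_rank
    torch.cuda.set_device(device)
    # Newer DeepSpeed releases require deepspeed.init_distributed() to be called;
    # otherwise the "assert cdb is not None and cdb.is_initialized()" check fails.
    # MASTER_ADDR / MASTER_PORT are read from the environment by default.
    deepspeed.init_distributed(dist_backend=args.distributed_backend)
```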