How to pre-train? Has anyone started pre-training successfully?
- The repo contains pretrain_glm.py and scripts/ds_pretrain_nvidia.sh, and the script takes an argument $1. What is this $1? Is it config_tasks/seq_blank.sh?
- I want to know whether the model architecture is GPT-2 for all tasks, simply using the attention mask matrix to cover both bidirectional and unidirectional attention. If so, how does the model know the order of the cloze spans? For example, with GPT-2 and two clozes as in "A [MASK1] B [MASK2] C", if the model predicts "[s] x1 x2 [/s] [s] x3 [/s]", how do we know when filling back that the result is [A][x1][x2][B][x3][C] and not [A][x3][B][x1][x2][C]?
Thanks a lot to anyone who answers.
I guess that by adding the Position 1 embedding we can indicate which [MASK] we are generating now. You can look again at Figure 2(c), Position 1, for a more intuitive explanation.
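To make the ordering concrete, here is a rough sketch of how the 2-D position ids could be built for "A [MASK1] B [MASK2] C" (my own helper, not the repo's code; the token layout is simplified):

```python
# Hedged sketch of GLM-style 2-D positions: Position 1 is the index of the
# corresponding [MASK] in the corrupted text (Part A), Position 2 counts
# tokens within each generated span. This is what ties each span back to
# the right [MASK], regardless of the order the spans are generated in.

def build_2d_positions(num_part_a_tokens, mask_positions, span_lengths):
    """Return (position_ids, block_position_ids) for Part A followed by the spans."""
    # Part A: absolute positions, intra-span position is 0.
    position_ids = list(range(num_part_a_tokens))
    block_position_ids = [0] * num_part_a_tokens
    # Part B: every token of a span reuses its [MASK]'s position (Position 1)
    # and counts 1, 2, ... within the span (Position 2); +1 accounts for [S].
    for mask_pos, span_len in zip(mask_positions, span_lengths):
        position_ids += [mask_pos] * (span_len + 1)
        block_position_ids += list(range(1, span_len + 2))
    return position_ids, block_position_ids

# "A [MASK1] B [MASK2] C": Part A has 5 tokens, [MASK1] at index 1, [MASK2] at index 3.
# The first span generates x1 x2 (length 2), the second generates x3 (length 1).
pos1, pos2 = build_2d_positions(5, mask_positions=[1, 3], span_lengths=[2, 1])
print(pos1)  # [0, 1, 2, 3, 4, 1, 1, 1, 3, 3]
print(pos2)  # [0, 0, 0, 0, 0, 1, 2, 3, 1, 2]
```

Because x1 and x2 carry Position 1 = 1 (the index of [MASK1]) while x3 carries Position 1 = 3 (the index of [MASK2]), filling back as [A][x1][x2][B][x3][C] is unambiguous.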
I successfully ran the pretraining code; just look at my recently closed issue, maybe it will help you.
Hello, which issue? Can you link it? I encounter "assert cdb is not None and cdb.is_initialized()" when I run ds_pretrain_nvidia.sh.
The error is caused by the latest version of DeepSpeed, which requires the code to call deepspeed.init_distributed. We just fixed the problem by adding that call in initialize_distributed. Can you try ds_pretrain_nvidia.sh again?
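For reference, a minimal sketch of what such a fix looks like, assuming a Megatron-style initialize_distributed(args); the argument names (args.rank, args.local_rank, args.distributed_backend) are illustrative, not necessarily the repo's exact code:

```python
import torch
import deepspeed

def initialize_distributed(args):
    """Initialize torch.distributed and DeepSpeed's communication backend."""
    # Pick the GPU for this process (Megatron-style args assumed here).
    device = args.rank % torch.cuda.device_count()
    if args.local_rank is not None:
        device = args.local_rank
    torch.cuda.set_device(device)
    # Newer DeepSpeed releases require deepspeed.init_distributed() to be called;
    # otherwise the "assert cdb is not None and cdb.is_initialized()" check fails.
    # MASTER_ADDR / MASTER_PORT are read from the environment by default.
    deepspeed.init_distributed(dist_backend=args.distributed_backend)
```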