Jiang Jiwen
> ## 1. Convert llama-2 from HuggingFace to Megatron-LM:
> ```
> PYTHONPATH=$(pwd) tools/checkpoint/util.py --model-type=GPT --loader=llama2_hf --load-dir= --save-dir= --tokenizer-model=
> ```
>
> ## 2. Convert llama-2 from Megatron-LM to...
I tried using the flag `--use-distributed-optimizer`, and the error no longer appears. So is `--use-distributed-optimizer` required for multi-node training in fp32?
Hello, a quick question: if down_proj and o_proj are initialized to zero, do o_proj and down_proj still receive gradients?
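
This is not an answer from the original thread, just a minimal PyTorch sketch (with hypothetical random tensors standing in for the attention/MLP activations) illustrating why a zero-initialized projection such as o_proj or down_proj can still receive a gradient when its output feeds a residual connection: the weight gradient depends on the layer's input and the upstream gradient, not on the current (zero) weight values.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in for a zero-initialized projection (e.g. o_proj)
# sitting on a residual branch, as in a transformer block.
proj = nn.Linear(8, 8, bias=False)
nn.init.zeros_(proj.weight)

x = torch.randn(4, 8)      # residual-stream input (stand-in)
h = torch.randn(4, 8)      # branch activation feeding the projection
out = x + proj(h)          # residual add: the loss still depends on x
loss = out.pow(2).mean()
loss.backward()

# dL/dW = grad_output^T @ h, which does not involve the current weight,
# and grad_output is nonzero thanks to the residual path, so the
# zero-initialized projection still gets a nonzero gradient.
print(proj.weight.grad.abs().sum())  # > 0
```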
I do not know whether commenting out this function will cause any issues during training.