Jiang Jiwen
> ## 1. Convert llama-2 from HuggingFace to Megatron-LM:
> ```
> PYTHONPATH=$(pwd) tools/checkpoint/util.py --model-type=GPT --loader=llama2_hf --load-dir= --save-dir= --tokenizer-model=
> ```
>
> ## 2. Convert llama-2 from Megatron-LM to...
I tried using the flag `--use-distributed-optimizer`, and the error no longer appears. So is `--use-distributed-optimizer` required for multi-node training in fp32?
Hello, a quick question: if down_proj and o_proj are initialized to zero, do o_proj and down_proj still receive gradients?
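
This is not an answer from the original thread, just a minimal PyTorch sketch (with hypothetical random tensors standing in for the attention/MLP activations) illustrating why a zero-initialized projection such as o_proj or down_proj can still receive a gradient when its output feeds a residual connection: the weight gradient depends on the layer's input and the upstream gradient, not on the current (zero) weight values.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in for a zero-initialized projection (e.g. o_proj)
# sitting on a residual branch, as in a transformer block.
proj = nn.Linear(8, 8, bias=False)
nn.init.zeros_(proj.weight)

x = torch.randn(4, 8)      # residual-stream input (stand-in)
h = torch.randn(4, 8)      # branch activation feeding the projection
out = x + proj(h)          # residual add: the loss still depends on x
loss = out.pow(2).mean()
loss.backward()

# dL/dW = grad_output^T @ h, which does not involve the current weight,
# and grad_output is nonzero thanks to the residual path, so the
# zero-initialized projection still gets a nonzero gradient.
print(proj.weight.grad.abs().sum())  # > 0
```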
I do not know whether commenting out this function will cause any issues during training.