Megatron-LM
[QUESTION] Is standalone_embedding_stage supported in core yet?
I ran into an issue: I want to split the embedding layer out of the transformer block so that it occupies its own pipeline-parallel (PP) stage, but it seems this has not been supported in core yet. Am I right? https://github.com/NVIDIA/Megatron-LM/blob/e33c8f78a35765d5aa37475a144da60e8a2349d1/megatron/core/transformer/transformer_block.py#L164
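To illustrate what a standalone embedding stage changes, here is a minimal sketch (not Megatron-LM code; `split_layers` is a hypothetical helper) of how transformer layers would be distributed across PP stages when stage 0 holds only the embedding:

```python
def split_layers(num_layers, pipeline_size, standalone_embedding_stage=False):
    """Hypothetical helper: distribute transformer layers across PP stages.

    With a standalone embedding stage, stage 0 holds only the embedding
    and zero transformer layers, so the layers are split across the
    remaining pipeline_size - 1 stages.
    """
    transformer_stages = (
        pipeline_size - 1 if standalone_embedding_stage else pipeline_size
    )
    if num_layers % transformer_stages != 0:
        raise ValueError("num_layers must divide evenly across transformer stages")
    per_stage = num_layers // transformer_stages
    layout = [per_stage] * transformer_stages
    if standalone_embedding_stage:
        layout = [0] + layout  # stage 0: embedding only, no transformer layers
    return layout

print(split_layers(24, 4))                                   # [6, 6, 6, 6]
print(split_layers(24, 4, standalone_embedding_stage=True))  # [0, 8, 8, 8]
```

Note the trade-off: with a standalone embedding stage, each remaining stage carries more transformer layers (8 instead of 6 here), so the feature mainly helps when the embedding itself dominates the first stage's memory.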
Same question here; following this. I also have a related question: since the language-model head has the same shape as the embedding layer, does Megatron-LM support making the language-model head a standalone PP stage as well? That would save memory and improve training efficiency.