Jiang Jiwen
### Required prerequisites - [X] I have read the documentation. - [X] I have searched the [Issue Tracker](https://github.com/PKU-Alignment/omnisafe/issues) and [Discussions](https://github.com/PKU-Alignment/omnisafe/discussions) that this hasn't already been reported. (+1 or comment...
We plan to finetune an 11B model in MegatronLM; the model is sharded with tp=4, pp=16. We want to finetune the model in fp32 rather than fp16 or bf16. The...
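To frame the question, here is a minimal sketch, assuming Megatron-LM's usual launch flags (`--tensor-model-parallel-size`, `--pipeline-model-parallel-size`, `--fp16`, `--bf16`), of how fp32 training is normally expressed there, namely by passing neither precision flag; this is an illustration, not a verified recipe:

```python
# Sketch only: assumes Megatron-LM defaults to fp32 parameters/gradients when
# neither --fp16 nor --bf16 is supplied, with the parallel layout from the issue.
megatron_args = [
    "--tensor-model-parallel-size", "4",    # tp=4 as described above
    "--pipeline-model-parallel-size", "16",  # pp=16 as described above
    # intentionally no "--fp16" and no "--bf16" -> training stays in fp32
]
print(" ".join(megatron_args))
```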
I ran into an issue and want to split the embedding layer out of the transformer block so that it sits alone in a single pp stage, but I found that it has not...
Hi, I am wondering whether the hf adapter supports transformers versions > 4.47.0, because the signature of _flash_attention_forward has changed, so the parameter-list length differs from the one checked in https://github.com/zhuzilin/ring-flash-attention/blob/be3b01f5706f45245f9b6d78d6df231954b2ee64/ring_flash_attn/adapters/hf_adapter.py#L23
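For reference, a minimal sketch of how the installed signature could be probed before relying on a hard-coded parameter count; it assumes `_flash_attention_forward` is still importable from `transformers.modeling_flash_attention_utils` (adjust the import path if your transformers version differs):

```python
import inspect

import transformers
from transformers.modeling_flash_attention_utils import _flash_attention_forward

# Inspect the signature at runtime instead of assuming a fixed parameter count,
# so an adapter can tell whether the installed transformers version matches the
# layout it was written against.
param_names = list(inspect.signature(_flash_attention_forward).parameters)
print(f"transformers {transformers.__version__}: "
      f"_flash_attention_forward takes {len(param_names)} parameters")
print(param_names)
```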