
[ENHANCEMENT] Is Megatron planning to use Flux technology? Integrating communication and GEMM into one operator to improve the overlap rate

Open · taozhiwei opened this issue 1 year ago

https://arxiv.org/abs/2406.06858v1

https://github.com/bytedance/flux

Is Megatron planning to use Flux? It integrates communication and GEMM into a single operator to improve the overlap rate between communication and computation.
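For context, the core idea Flux implements is overlapping the tensor-parallel all-gather with the GEMM that consumes it. Below is a minimal PyTorch sketch of that ring-style overlap; the function name and buffer handling are illustrative assumptions of mine, not Flux's API, and Flux performs this tiling inside a single fused CUDA kernel rather than a Python loop.

```python
import torch
import torch.distributed as dist

def overlapped_allgather_gemm(x_local, weight):
    """Compute all_gather(x_local) @ weight, overlapping the ring exchange
    of input shards with the per-shard matmuls. Assumes an initialized
    process group (e.g. NCCL) and CUDA tensors, so the collectives run on
    a separate stream from the GEMMs."""
    world = dist.get_world_size()
    rank = dist.get_rank()
    out = [None] * world
    cur = x_local.clone()               # clone so the caller's shard is never overwritten
    cur_idx = rank                      # which shard `cur` currently holds
    recv_buf = torch.empty_like(x_local)
    for step in range(world):
        reqs = []
        if step < world - 1:
            # Post the ring exchange asynchronously so the GEMM below
            # overlaps with the transfer of the next shard.
            ops = [dist.P2POp(dist.isend, cur, (rank + 1) % world),
                   dist.P2POp(dist.irecv, recv_buf, (rank - 1) % world)]
            reqs = dist.batch_isend_irecv(ops)
        out[cur_idx] = cur @ weight     # GEMM on the shard we already hold
        for r in reqs:
            r.wait()
        if step < world - 1:
            cur, recv_buf = recv_buf, cur    # rotate buffers
            cur_idx = (cur_idx - 1) % world  # we now hold our neighbor's shard
    return torch.cat(out, dim=0)
```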

taozhiwei avatar Sep 13 '24 06:09 taozhiwei

We are looking into an approach that fuses the GEMM and its dependent communication into a single kernel. Support for such an optimization will take some time, since we need to ensure reliability.

erhoo82 avatar Sep 20 '24 00:09 erhoo82

Marking as stale. No activity in 60 days.

github-actions[bot] avatar Nov 19 '24 18:11 github-actions[bot]

@erhoo82 Hi, is there any progress on this? Flux is really hard to train and we are really looking forward to support in Megatron...

tingxueronghua avatar Jan 16 '25 12:01 tingxueronghua

Marking as stale. No activity in 60 days.

github-actions[bot] avatar Mar 17 '25 18:03 github-actions[bot]

> @erhoo82 Hi, is there any progress on this? Flux is really hard to train and we are really looking forward to support in Megatron...

Are you working on this? I'm also interested.

huhuiqi7 avatar Apr 15 '25 08:04 huhuiqi7

Marking as stale. No activity in 60 days.

github-actions[bot] avatar Jun 14 '25 18:06 github-actions[bot]

Sharing updates.

Currently, Megatron-LM supports overlapping tensor-parallel communication with computation using the split GEMM and communication kernels from Transformer Engine. Transformer Engine plans to migrate the overlap implementation from a custom in-package build (userbuffers) to a cuBLASMp backend that uses NVSHMEM. This new backend will still use split kernels for GEMM and communication.
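To make the split-kernel scheme concrete, here is an illustrative PyTorch sketch of a row-parallel GEMM whose output reduce-scatter is pipelined in chunks, so the communication of one chunk overlaps the GEMM of the next. This is my own simplification under stated assumptions, not Transformer Engine's userbuffers or cuBLASMp implementation, which use dedicated kernels rather than a Python loop.

```python
import torch
import torch.distributed as dist

def gemm_reduce_scatter_overlapped(x_shard, w_local, num_chunks=4):
    """y = reduce_scatter(x_shard @ w_local) for a row-parallel linear,
    chunked so the reduce-scatter of chunk i overlaps the GEMM of
    chunk i+1. Assumes x_shard.shape[0] is divisible by
    num_chunks * world_size, CUDA tensors, and an NCCL process group."""
    world = dist.get_world_size()
    out_chunks, handles = [], []
    for xc in x_shard.chunk(num_chunks, dim=0):
        partial = xc @ w_local                  # split GEMM kernel
        out = torch.empty(partial.shape[0] // world, partial.shape[1],
                          device=partial.device, dtype=partial.dtype)
        # Async reduce-scatter runs on NCCL's stream; the next iteration's
        # GEMM launches immediately and overlaps with this transfer.
        handles.append(dist.reduce_scatter_tensor(out, partial, async_op=True))
        out_chunks.append(out)
    for h in handles:
        h.wait()
    return torch.cat(out_chunks, dim=0)
```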

We are still discussing a single-kernel implementation, but no detailed plan has been established yet.

erhoo82 avatar Jul 15 '25 17:07 erhoo82