[ENHANCEMENT] Is Megatron planning to use Flux technology? Integrating communication and GEMM into one operator to improve the overlap rate
https://arxiv.org/abs/2406.06858v1
https://github.com/bytedance/flux
We are looking into an approach that fuses the GEMM and its dependent communication into a single kernel. Support for such an optimization will take some time to ensure reliability.
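For anyone unfamiliar with the motivation: in a tensor-parallel layer the communication depends on the GEMM output, so when the two run as separate steps the communication time is fully exposed. A minimal sketch of that unfused baseline in plain PyTorch (illustrative only, not Megatron's actual code):

```python
import torch
import torch.distributed as dist

# Illustrative sketch of the unfused baseline that kernel fusion targets
# (not Megatron's actual code). In a row-parallel linear layer with
# sequence parallelism, each rank computes a partial output and the
# partial outputs are then reduce-scattered. Run as two separate steps,
# the collective cannot start until the entire GEMM has finished, so the
# communication time is fully exposed.

def row_parallel_linear_unfused(x: torch.Tensor, w: torch.Tensor) -> torch.Tensor:
    # x: [seq, hidden_in / tp_size] local shard; w: [hidden_in / tp_size, hidden_out]
    partial = x @ w  # full GEMM over the local shard
    out = torch.empty(
        (x.shape[0] // dist.get_world_size(), w.shape[1]),
        device=x.device,
        dtype=x.dtype,
    )
    # Blocking collective: starts only after the whole GEMM completes.
    dist.reduce_scatter_tensor(out, partial)
    return out
```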
Marking as stale. No activity in 60 days.
@erhoo82 Hi, is there any progress on this? Flux is quite hard to use for training, and we are really looking forward to support in Megatron...
Are you working on this? I'm also interested.
Marking as stale. No activity in 60 days.
Sharing updates.
Currently, Megatron-LM supports overlapping tensor-parallel communication with computation using the split GEMM and communication kernels from Transformer Engine. Transformer Engine plans to migrate the overlap implementation from a custom in-package build (userbuffers) to a cublasMp backend that uses NVSHMEM. The new backend will still use split kernels for GEMM and communication.
We are still discussing a single-kernel implementation, but no detailed plan has been established yet.
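For readers following along, here is a rough sketch of what split-kernel overlap looks like conceptually: chunk the GEMM so that the collective for chunk i runs while chunk i+1 is still being computed. This uses plain torch.distributed calls for illustration; TE's userbuffers and the planned cublasMp backend do this far more efficiently with dedicated kernels.

```python
import torch
import torch.distributed as dist

# Conceptual sketch of split-kernel overlap (illustrative only, not the
# Transformer Engine implementation). The GEMM is split into chunks so
# that the reduce-scatter of chunk i, launched with async_op=True, runs
# on NCCL's internal stream while chunk i+1's GEMM runs on the compute
# stream.

def row_parallel_linear_overlapped(
    x: torch.Tensor, w: torch.Tensor, num_chunks: int = 4
) -> torch.Tensor:
    # Assumes x.shape[0] is divisible by num_chunks * world_size.
    ws = dist.get_world_size()
    outs, handles = [], []
    for xc in x.chunk(num_chunks, dim=0):
        partial = xc @ w  # GEMM for this chunk on the compute stream
        out = torch.empty(
            (xc.shape[0] // ws, w.shape[1]), device=x.device, dtype=x.dtype
        )
        # async_op=True returns a handle immediately, so the next chunk's
        # GEMM is launched while this reduce-scatter is still in flight.
        handles.append(dist.reduce_scatter_tensor(out, partial, async_op=True))
        outs.append(out)
    for h in handles:
        h.wait()
    return torch.cat(outs, dim=0)
```

In Megatron-LM, this overlap is enabled via the `--tp-comm-overlap` flag (together with `--sequence-parallel`), if I remember the argument names correctly.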