mollon650
What's the difference in pipeline parallelism between DeepSpeed and Megatron?
To make iree-turbine support training, I modified some code. The modified code is distributed across iree/torch-mlir/iree-turbine. Ref: https://github.com/iree-org/iree-turbine/pull/13 https://github.com/shark-infra/torch-mlir/pull/1
Does DeepSpeed-Chat support pipeline parallelism?
@cuichenx Just like TENorm, add the following to the WrappedTorchLayerNorm class:

```python
# Set flag for sequence parallelism (custom Megatron-LM integration)
if getattr(self, "sequence_parallel", None) is not None:
    self.weight.sequence_parallel = self.sequence_parallel
    self.bias.sequence_parallel = self.sequence_parallel
```
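For context, here is a minimal sketch of how that suggestion could look inside a LayerNorm wrapper. The class name `WrappedTorchLayerNorm` comes from the comment above, but its constructor signature and the surrounding integration are assumptions, not the actual Megatron-LM/NeMo code; Megatron-style sequence parallelism tags each parameter with a `sequence_parallel` attribute so the gradient all-reduce logic can treat those parameters specially.

```python
import torch
import torch.nn as nn

class WrappedTorchLayerNorm(nn.LayerNorm):
    """Hypothetical sketch: torch.nn.LayerNorm whose parameters carry a
    sequence_parallel flag, mirroring what TENorm does in Megatron-LM."""

    def __init__(self, hidden_size: int, eps: float = 1e-5,
                 sequence_parallel: bool = False):
        super().__init__(hidden_size, eps=eps)
        self.sequence_parallel = sequence_parallel
        # Set flag for sequence parallelism (custom Megatron-LM integration):
        # plain Python attributes can be attached to torch tensors/parameters.
        if getattr(self, "sequence_parallel", None) is not None:
            self.weight.sequence_parallel = self.sequence_parallel
            if self.bias is not None:
                self.bias.sequence_parallel = self.sequence_parallel

# Usage: downstream code can inspect the flag on each parameter.
ln = WrappedTorchLayerNorm(16, sequence_parallel=True)
print(ln.weight.sequence_parallel)  # True
print(ln.bias.sequence_parallel)    # True
```

The point of tagging the parameters (rather than only the module) is that optimizer and gradient-reduction hooks in Megatron-style code iterate over parameters, not modules, so the flag must live on `weight` and `bias` themselves.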