DeepSpeed
Add chatglm2 & chatglm3 autotp
This PR enables AutoTP for chatglm2 and chatglm3. Like phi3, these models use a chunked MLP layer, so we adjust the weight order with the 'shard_mlp_chunk' function before sharding. Please kindly review. Thanks!
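To illustrate why the weight order matters, here is a minimal sketch. It is not the actual 'shard_mlp_chunk' implementation. It assumes the fused MLP weight stacks two equally sized chunks (labelled gate and up here) along the output dimension, and that the model later splits the activation in half. The helper name `shard_fused_rows` and the row labels are hypothetical and use plain Python lists in place of tensors.

```python
def shard_fused_rows(rows, rank, world_size, num_chunks=2):
    """Return the rows of a fused weight that belong to `rank`.

    Hypothetical helper: `rows` stands in for the output rows of a fused
    MLP weight laid out as [chunk0 rows..., chunk1 rows...].
    """
    if len(rows) % num_chunks != 0:
        raise ValueError("row count must be divisible by num_chunks")
    chunk_len = len(rows) // num_chunks
    if chunk_len % world_size != 0:
        raise ValueError("chunk size must be divisible by world_size")
    shard_len = chunk_len // world_size

    shard = []
    for c in range(num_chunks):
        # Take this rank's slice out of every chunk, then concatenate,
        # so each rank still holds a [gate; up] layout locally.
        start = c * chunk_len + rank * shard_len
        shard.extend(rows[start:start + shard_len])
    return shard


# Fused weight with 4 gate rows followed by 4 up rows, split over 2 ranks.
rows = ["g0", "g1", "g2", "g3", "u0", "u1", "u2", "u3"]

# Naive contiguous split: rank 0 would receive only gate rows.
naive_rank0 = rows[: len(rows) // 2]

# Chunk-aware split: every rank receives matching gate and up rows.
chunk_rank0 = shard_fused_rows(rows, rank=0, world_size=2)
chunk_rank1 = shard_fused_rows(rows, rank=1, world_size=2)
```

With the naive split, the local half-and-half split of the activation on rank 0 would pair gate rows with other gate rows. The chunk-aware split keeps each rank's shard in the same layout as the full weight.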