Request for Mixtral 8X7B inference with DP+EP+TP

Open · haoranlll opened this issue on Mar 28 '24

I want to use the Mixtral 8x7B model for inference, but currently only AutoTP is supported. How can I add support so that it can use more forms of parallelism (e.g., EP, DP)?
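For reference, the AutoTP-only path mentioned here looks roughly like the sketch below. It covers tensor parallelism only (no EP or DP, which is what this issue asks for); the model name, `tp_size`, and launch command are placeholders, and the exact `init_inference` arguments may differ between DeepSpeed versions.

```python
# Minimal AutoTP-only sketch (assumptions: HF Transformers + DeepSpeed installed;
# "mistralai/Mixtral-8x7B-v0.1" and tp_size=2 are placeholders).
# Launch with something like: deepspeed --num_gpus 2 run_autotp.py
import os
import torch
import deepspeed
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "mistralai/Mixtral-8x7B-v0.1"  # placeholder checkpoint
local_rank = int(os.getenv("LOCAL_RANK", "0"))

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)

# AutoTP: shard the linear layers across GPUs; no expert or data parallelism here.
engine = deepspeed.init_inference(
    model,
    tensor_parallel={"tp_size": 2},    # TP only; EP/DP are what this issue requests
    dtype=torch.float16,
    replace_with_kernel_inject=False,  # AutoTP path (no custom kernel injection)
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(f"cuda:{local_rank}")
output = engine.module.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(output[0]))
```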

haoranlll · Mar 28 '24

I am experiencing the same issue with enabling TP+EP for inference on the Mixtral model (or other MoE models). Have you managed to find a solution to this problem?

leachee99 · Jul 29 '24

I have only run inference with TP+EP using GPT-MoE64, not Mixtral, and the script has to use Megatron-DeepSpeed.

haoranlll · Aug 08 '24

I'm facing some difficulties getting EP to work correctly. Could you share how you managed to implement TP+EP on GPT-MoE64? Any relevant scripts, code, or suggestions would be greatly appreciated. If it's convenient, we can also discuss this further via email. My email address is [email protected].

leachee99 · Aug 08 '24

All the scripts can be found in Megatron-DeepSpeed/examples_deepspeed. The training scripts are in MoE/*.sh, e.g., ds_pretrain_gpt_1.3B_MoE128.sh, and the inference script is generate_text.sh. For inference, though, there are many bugs in the relevant code; usually you can fix them by deleting the unnecessary params.
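Separately from those scripts, it may help to see what TP+EP means at the process-group level. The toy sketch below partitions 4 ranks into tensor-parallel and expert-parallel groups with plain `torch.distributed`; the group sizes are made-up, and Megatron-DeepSpeed builds its groups internally (via its mpu/groups utilities) rather than like this, so treat it only as an illustration of the rank layout.

```python
# Toy illustration of TP + EP rank partitioning with plain torch.distributed.
# Assumptions: 4 processes, tp_size=2, ep_size=2, launched with
#   torchrun --nproc_per_node=4 tp_ep_groups.py
# Megatron-DeepSpeed creates equivalent groups internally; this only shows the layout.
import torch.distributed as dist

def build_groups(tp_size: int, ep_size: int):
    dist.init_process_group(backend="gloo")  # use "nccl" on GPUs
    world, rank = dist.get_world_size(), dist.get_rank()

    # TP groups: consecutive ranks share the shards of each weight matrix.
    # world=4, tp_size=2 -> groups [0, 1] and [2, 3]
    tp_ranks = None
    for start in range(0, world, tp_size):
        ranks = list(range(start, start + tp_size))
        dist.new_group(ranks)  # every rank must call new_group for every group
        if rank in ranks:
            tp_ranks = ranks

    # EP groups: ranks in the same TP position hold different experts.
    # world=4, tp_size=2, ep_size=2 -> groups [0, 2] and [1, 3]
    ep_ranks = None
    for offset in range(tp_size):
        ranks = list(range(offset, world, tp_size))[:ep_size]
        dist.new_group(ranks)
        if rank in ranks:
            ep_ranks = ranks

    return tp_ranks, ep_ranks

if __name__ == "__main__":
    tp_ranks, ep_ranks = build_groups(tp_size=2, ep_size=2)
    print(f"rank {dist.get_rank()}: TP group {tp_ranks}, EP group {ep_ranks}")
```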

haoranlll · Aug 09 '24

Thank you so much for your response. I followed your approach for training and inference, but I encountered an issue where the trained weights seem to be inconsistent with what the generate_text.sh script expects. The script cannot find the XX/iter_0000200/mp_rank_00/model_optim_rng.pt file, while the weights produced by the ds_pretrain_gpt_125M_MoE64.sh script are saved as XX/global_step200/* (screenshot of the checkpoint directory omitted).

I suspect the weights need to be converted from the Megatron-DeepSpeed format to the Megatron format. I attempted to use tools/convert_checkpoint/deepspeed_to_megatron.py for this conversion, but that script does not seem to be compatible with MoE models. Have you encountered a similar issue? If so, could you please share how you resolved it? Thank you very much for your assistance.
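In case it is useful while debugging this: one quick way to see what a DeepSpeed-format save directory actually contains is to walk it and `torch.load` the shards, for example (the `XX/global_step200` path is the placeholder pattern from this thread; adjust to your actual save directory):

```python
# Inspect a DeepSpeed-format checkpoint directory (e.g. XX/global_step200/)
# to see which shards exist and what top-level keys they hold.
import glob
import os
import torch

ckpt_dir = "XX/global_step200"  # placeholder path from the thread

for path in sorted(glob.glob(os.path.join(ckpt_dir, "*"))):
    print(path)
    if path.endswith((".pt", ".bin")):
        # weights_only=False because these shards contain non-tensor objects;
        # only load checkpoints you trust.
        state = torch.load(path, map_location="cpu", weights_only=False)
        keys = list(state.keys()) if isinstance(state, dict) else []
        print(f"  top-level keys: {keys[:10]}{' ...' if len(keys) > 10 else ''}")
```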

leachee99 · Aug 14 '24

Have you solved the issue? For inference, the model_optim_rng.pt file is not required. Maybe you can try the function load_model_weights_only in megatron/training.py. Moreover, it's better to load the model weights and reshard them to the target layout yourself; the original scripts can only run inference with the same parallelism that was used for training.
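To make the "reshard the weights yourself" suggestion a bit more concrete: the core operation is just splitting (or concatenating) each parallel-partitioned tensor along its sharding dimension. Below is a minimal, illustrative sketch with plain PyTorch; the tensor names, shapes, and split dimensions are examples, not the actual Megatron-DeepSpeed checkpoint layout.

```python
# Illustrative resharding of a single weight tensor from TP=1 to TP=2.
# In the Megatron convention, column-parallel layers split along dim 0 and
# row-parallel layers along dim 1; which layers are which depends on the model code.
import torch

def shard_tensor(full: torch.Tensor, tp_size: int, dim: int):
    """Split a full (unsharded) weight into tp_size equal shards along `dim`."""
    assert full.shape[dim] % tp_size == 0, "weight not divisible by tp_size"
    return list(torch.chunk(full, tp_size, dim=dim))

# Example: a column-parallel MLP weight of shape [4*h, h] with h=8.
h = 8
full_weight = torch.randn(4 * h, h)
shards = shard_tensor(full_weight, tp_size=2, dim=0)
print([s.shape for s in shards])  # [torch.Size([16, 8]), torch.Size([16, 8])]

# Going the other way (merging TP shards back into a full tensor):
merged = torch.cat(shards, dim=0)
assert torch.equal(merged, full_weight)
```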

haoranlll · Sep 01 '24

Hi, I’m sorry for the late reply. I haven’t solved the issue, but I was able to achieve the desired outcome using another method. I will definitely try out the approach you mentioned as well. Thank you so much for your help!

leachee99 · Oct 21 '24

That's great.

haoranlll · Oct 21 '24