Request for Mixtral 8X7B inference with DP+EP+TP

Open · haoranlll opened this issue on Mar 28 '24

I want to use the Mixtral 8x7B model for inference, but currently only AutoTP is supported. How can I add support so that it can use more forms of parallelism (e.g., EP, DP)?
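For reference, the AutoTP-only path mentioned here looks roughly like the sketch below. It covers tensor parallelism only (no EP or DP, which is what this issue asks for); the model name, `tp_size`, and launch command are placeholders, and the exact `init_inference` arguments may differ between DeepSpeed versions.

```python
# Minimal AutoTP-only sketch (assumptions: HF Transformers + DeepSpeed installed;
# "mistralai/Mixtral-8x7B-v0.1" and tp_size=2 are placeholders).
# Launch with something like: deepspeed --num_gpus 2 run_autotp.py
import os
import torch
import deepspeed
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "mistralai/Mixtral-8x7B-v0.1"  # placeholder checkpoint
local_rank = int(os.getenv("LOCAL_RANK", "0"))

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)

# AutoTP: shard the linear layers across GPUs; no expert or data parallelism here.
engine = deepspeed.init_inference(
    model,
    tensor_parallel={"tp_size": 2},    # TP only; EP/DP are what this issue requests
    dtype=torch.float16,
    replace_with_kernel_inject=False,  # AutoTP path (no custom kernel injection)
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(f"cuda:{local_rank}")
output = engine.module.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(output[0]))
```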

haoranlll · Mar 28 '24

I am experiencing the same issue with enabling TP+EP for inference on the Mixtral model (or other MoE models). Have you managed to find a solution to this problem?

leachee99 · Jul 29 '24

I have only run inference with TP+EP using GPT-MoE64, not Mixtral, and the script has to use Megatron-DeepSpeed.

haoranlll · Aug 08 '24

I'm facing some difficulties getting EP to work correctly. Could you share how you managed to implement TP+EP on GPT-MoE64? Any relevant scripts, code, or suggestions would be greatly appreciated. If it's convenient, we can also discuss this further via email. My email address is [email protected].

leachee99 · Aug 08 '24

All the scripts can be found in Megatron-DeepSpeed/examples_deepspeed. The training scripts are in MoE/*.sh, e.g., ds_pretrain_gpt_1.3B_MoE128.sh, and the inference script is generate_text.sh. For inference, though, there are many bugs in the relevant code; usually you can fix them by deleting the unnecessary params.
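Separately from those scripts, it may help to see what TP+EP means at the process-group level. The toy sketch below partitions 4 ranks into tensor-parallel and expert-parallel groups with plain `torch.distributed`; the group sizes are made-up, and Megatron-DeepSpeed builds its groups internally (via its mpu/groups utilities) rather than like this, so treat it only as an illustration of the rank layout.

```python
# Toy illustration of TP + EP rank partitioning with plain torch.distributed.
# Assumptions: 4 processes, tp_size=2, ep_size=2, launched with
#   torchrun --nproc_per_node=4 tp_ep_groups.py
# Megatron-DeepSpeed creates equivalent groups internally; this only shows the layout.
import torch.distributed as dist

def build_groups(tp_size: int, ep_size: int):
    dist.init_process_group(backend="gloo")  # use "nccl" on GPUs
    world, rank = dist.get_world_size(), dist.get_rank()

    # TP groups: consecutive ranks share the shards of each weight matrix.
    # world=4, tp_size=2 -> groups [0, 1] and [2, 3]
    tp_ranks = None
    for start in range(0, world, tp_size):
        ranks = list(range(start, start + tp_size))
        dist.new_group(ranks)  # every rank must call new_group for every group
        if rank in ranks:
            tp_ranks = ranks

    # EP groups: ranks in the same TP position hold different experts.
    # world=4, tp_size=2, ep_size=2 -> groups [0, 2] and [1, 3]
    ep_ranks = None
    for offset in range(tp_size):
        ranks = list(range(offset, world, tp_size))[:ep_size]
        dist.new_group(ranks)
        if rank in ranks:
            ep_ranks = ranks

    return tp_ranks, ep_ranks

if __name__ == "__main__":
    tp_ranks, ep_ranks = build_groups(tp_size=2, ep_size=2)
    print(f"rank {dist.get_rank()}: TP group {tp_ranks}, EP group {ep_ranks}")
```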

haoranlll · Aug 09 '24

Thank you so much for your response. I followed your approach for training and inference, but I encountered an issue where the trained weights seem to be inconsistent with what the generate_text.sh script expects. The script cannot find the XX/iter_0000200/mp_rank_00/model_optim_rng.pt file, while the weights produced by the ds_pretrain_gpt_125M_MoE64.sh script are saved as XX/global_step200/* (screenshot of the checkpoint directory omitted).

I suspect the weights need to be converted from the Megatron-DeepSpeed format to the Megatron format. I attempted to use tools/convert_checkpoint/deepspeed_to_megatron.py for this conversion, but that script does not seem to be compatible with MoE models. Have you encountered a similar issue? If so, could you please share how you resolved it? Thank you very much for your assistance.
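In case it is useful while debugging this: one quick way to see what a DeepSpeed-format save directory actually contains is to walk it and `torch.load` the shards, for example (the `XX/global_step200` path is the placeholder pattern from this thread; adjust to your actual save directory):

```python
# Inspect a DeepSpeed-format checkpoint directory (e.g. XX/global_step200/)
# to see which shards exist and what top-level keys they hold.
import glob
import os
import torch

ckpt_dir = "XX/global_step200"  # placeholder path from the thread

for path in sorted(glob.glob(os.path.join(ckpt_dir, "*"))):
    print(path)
    if path.endswith((".pt", ".bin")):
        # weights_only=False because these shards contain non-tensor objects;
        # only load checkpoints you trust.
        state = torch.load(path, map_location="cpu", weights_only=False)
        keys = list(state.keys()) if isinstance(state, dict) else []
        print(f"  top-level keys: {keys[:10]}{' ...' if len(keys) > 10 else ''}")
```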

leachee99 · Aug 14 '24

Have you solved the issue? For inference, the model_optim_rng.pt file is not required. Maybe you can try the function load_model_weights_only in megatron/training.py. Moreover, it's better to load the model weights and reshard them to the target layout yourself; the original scripts can only run inference with the same parallelism that was used for training.
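To make the "reshard the weights yourself" suggestion a bit more concrete: the core operation is just splitting (or concatenating) each parallel-partitioned tensor along its sharding dimension. Below is a minimal, illustrative sketch with plain PyTorch; the tensor names, shapes, and split dimensions are examples, not the actual Megatron-DeepSpeed checkpoint layout.

```python
# Illustrative resharding of a single weight tensor from TP=1 to TP=2.
# In the Megatron convention, column-parallel layers split along dim 0 and
# row-parallel layers along dim 1; which layers are which depends on the model code.
import torch

def shard_tensor(full: torch.Tensor, tp_size: int, dim: int):
    """Split a full (unsharded) weight into tp_size equal shards along `dim`."""
    assert full.shape[dim] % tp_size == 0, "weight not divisible by tp_size"
    return list(torch.chunk(full, tp_size, dim=dim))

# Example: a column-parallel MLP weight of shape [4*h, h] with h=8.
h = 8
full_weight = torch.randn(4 * h, h)
shards = shard_tensor(full_weight, tp_size=2, dim=0)
print([s.shape for s in shards])  # [torch.Size([16, 8]), torch.Size([16, 8])]

# Going the other way (merging TP shards back into a full tensor):
merged = torch.cat(shards, dim=0)
assert torch.equal(merged, full_weight)
```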

haoranlll · Sep 01 '24

Hi, I’m sorry for the late reply. I haven’t solved the issue, but I was able to achieve the desired outcome using another method. I will definitely try out the approach you mentioned as well. Thank you so much for your help!

leachee99 · Oct 21 '24

That's great.

haoranlll · Oct 21 '24