Yejing Lai
Hi RezaYazdaniAminabadi, I ran the text generation task from the Megatron-DeepSpeed example: https://github.com/microsoft/Megatron-DeepSpeed/blob/main/examples/generate_text.sh I attempted to use AutoTP for this task, and I set replace_with_kernel_inject=False at this line: https://github.com/microsoft/Megatron-DeepSpeed/blob/main/tools/generate_samples_gpt.py#L160 But...
Hi @RezaYazdaniAminabadi @delock. Could you please help review this PR? Thanks~
> Hi @Yejing-Lai can you give some explanation on the need to have a granularity of 64 elements? https://github.com/microsoft/DeepSpeed/pull/4697/files#diff-214e32993d5440123080193836e988f024771aa4f6931c614ef9ad42a493f398R31 DNN libraries favor tensor sizes with a granularity that is a power of two; we...
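A minimal sketch of the idea behind a 64-element granularity: when computing a per-rank shard size, round it up to a multiple of the granularity so each shard's tensor dimension stays friendly to the DNN library. The function name and signature here are illustrative, not the actual DeepSpeed API.

```python
# Hypothetical helper: align a per-rank shard size to a granularity of
# 64 elements (a power of two), as discussed in the comment above.

def align_shard_size(total_size: int, num_ranks: int, granularity: int = 64) -> int:
    """Per-rank shard size, rounded up to a multiple of `granularity` elements."""
    base = (total_size + num_ranks - 1) // num_ranks          # ceil division across ranks
    return ((base + granularity - 1) // granularity) * granularity  # round up to granularity

# Example: splitting 11008 elements across 3 ranks gives an aligned
# per-rank size instead of the raw uneven 3670/3669/3669 split.
print(align_shard_size(11008, 3))
```

With granularity 1 this degenerates to plain ceil division, which is why a power-of-two granularity is a config knob rather than hard-wired behavior.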
> @Yejing-Lai, please help resolve conflict. Hi @tjruwase. I resolved the conflict. Can you approve the workflows? Thanks~
Hi @tjruwase. The conflict has been resolved. Could you please help review this PR? Thanks~
> Hi @delock - FYI could you resolve the merge conflicts on this PR so it can be reviewed/tests run? Hi @loadams. The conflicts have been resolved. Please review~
Hi @oelayan7. Can you share your allocation failure log or a screenshot? The logic of this change is for balanced sharding. For uneven attention sharding, we need to depend on num_kv_heads. However,...
Could you give me an example? For the attention layer, we need to rely on kv_heads to split the qkv weight; during uneven sharding, there will always be more blocks in...
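A small sketch of what "rely on kv_heads to split the qkv weight" means: under uneven sharding, the split must fall on kv-head boundaries, so the number of heads assigned to each rank (not a raw element count) drives the shard sizes. The function names and the even-first distribution policy below are illustrative assumptions, not the actual AutoTP implementation.

```python
# Hypothetical sketch: distribute kv heads across ranks as evenly as
# possible, with earlier ranks taking the leftover heads, then convert
# head counts into element sizes along the sharded dimension.

def split_kv_heads(num_kv_heads: int, num_ranks: int) -> list:
    """Heads per rank; the first `num_kv_heads % num_ranks` ranks get one extra."""
    base, rem = divmod(num_kv_heads, num_ranks)
    return [base + (1 if r < rem else 0) for r in range(num_ranks)]

def shard_sizes(num_kv_heads: int, head_dim: int, num_ranks: int) -> list:
    """Element counts per rank along the kv projection dimension."""
    return [h * head_dim for h in split_kv_heads(num_kv_heads, num_ranks)]

# Example: 10 kv heads on 3 ranks -> [4, 3, 3] heads per rank.
print(split_kv_heads(10, 3))
```

Splitting by raw element counts instead would cut through a head, which is why the uneven path has to consult num_kv_heads.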
Which uneven sharding method do you expect? Can you give an example? Or you could submit a PR adding another correct path to support uneven sharding~ Maybe we...
Hi, we can add an option to configure the block size; the default block size is 1. It will change the MLP sharding. For example, the llama-7b MLP first layer...
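The block-size option above can be sketched as follows: shard sizes are computed in whole blocks of `block_size` elements, so `block_size=1` reproduces the default element-wise uneven split, while a larger block size keeps each shard block-aligned. Parameter names are illustrative assumptions, not the actual DeepSpeed config keys.

```python
# Hypothetical sketch: split an MLP output dimension across ranks in
# units of `block_size` elements; block_size=1 is the default behavior.

def mlp_shard_sizes(out_features: int, num_ranks: int, block_size: int = 1) -> list:
    """Element counts per rank, each a whole multiple of `block_size`."""
    assert out_features % block_size == 0, "dimension must divide into whole blocks"
    base, rem = divmod(out_features // block_size, num_ranks)
    blocks = [base + (1 if r < rem else 0) for r in range(num_ranks)]
    return [b * block_size for b in blocks]

# llama-7b MLP first layer (intermediate size 11008) on 3 ranks:
print(mlp_shard_sizes(11008, 3, block_size=1))   # default element-wise uneven split
print(mlp_shard_sizes(11008, 3, block_size=64))  # 64-element block-aligned split
```

Both splits cover all 11008 elements; the block-aligned variant just moves the imbalance to block boundaries.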