
Results 8 comments of Siddharth Singh

Correctness check on 125M.yml with `use_axonn_model_parallelism: true`, `column_model_parallel_size: 1`, `row_model_parallel_size: 1`, `depth_model_parallel_size: 2`, `model_parallel_size: 2` on 2 GPUs. Dataset: enwik8 (the loss curve is smoothed over 100 iterations). ![image](https://github.com/EleutherAI/gpt-neox/assets/16764680/31241adc-38aa-48ed-8b9b-f69407d47b7b)
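For reference, a sketch of how those overrides might sit in 125M.yml — the key names are the ones from this comment, but the surrounding layout is assumed from gpt-neox's YAML config style, not copied from the actual file:

```yaml
# Hypothetical excerpt of configs/125M.yml with the AxoNN settings above.
"use_axonn_model_parallelism": true,
"column_model_parallel_size": 1,
"row_model_parallel_size": 1,
"depth_model_parallel_size": 2,   # 2-way depth parallelism across the 2 GPUs
"model_parallel_size": 2,
```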

@Quentin-Anthony I have updated the install instructions to install axonn from a fixed commit - 3ebc34c

@Quentin-Anthony Pushed some communication optimizations and also updated the instructions to install axonn from a newer commit - 45647ea. To enable these optimizations, you just need to set `optimize_axonn_communication: true`...
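Concretely, the flag would be added alongside the existing AxoNN settings — a sketch only, with the file layout assumed:

```yaml
# Assumed config fragment: turns on the AxoNN communication optimizations
# on top of the parallelism settings already in the config.
"use_axonn_model_parallelism": true,
"optimize_axonn_communication": true,
```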

Hi @loadams, sorry I didn't have the bandwidth to investigate this issue further. I just chugged along with creating hostfiles named "hostfiles" and running one job at a time.
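As a sketch of that workaround (node names, slot counts, and the launch line are placeholders; only the `hostname slots=N` hostfile format is DeepSpeed's):

```shell
# Keep a single hostfile and run one job at a time against it.
cat > hostfile <<'EOF'
node-01 slots=8
node-02 slots=8
EOF

# Placeholder launch; adjust the script and config paths to your setup.
# deepspeed --hostfile hostfile train.py --conf configs/125M.yml
```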

They are mostly identical. The Megatron implementation is tightly coupled to Megatron-LM, so you cannot easily use it elsewhere. DeepSpeed's implementation is modular, so you could parallelize other workloads outside...

@jwendlan can you update this in the same format as the current develop branch?

Thanks for the clarification. Is this behavior present for both the local and TE implementations, or just for TE? MCore inference solely uses the local implementation, hence my question.