Siddharth Singh

Results 25 issues of Siddharth Singh

WIP PR for pipeline parallelism. Has convergence issues.

This PR enables token dropping for full tensor parallelism. Also corrects timers. (Still WIP)

https://github.com/jettify/pytorch-optimizer/blob/910b414565427f0a66e20040475e7e4385e066a5/torch_optimizer/shampoo.py#L130 Shouldn't the second argument be `-0.5/order`? For example, with order 2, the authors raise the preconditioner matrices to the power -1/4.
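The exponent the issue proposes can be checked with a small calculation. This is a minimal sketch, assuming diagonal preconditioners so that a matrix power reduces to an elementwise power; `shampoo_exponent` and `diag_matrix_power` are hypothetical helper names for illustration and are not part of pytorch-optimizer.

```python
def shampoo_exponent(order: int) -> float:
    """Exponent proposed in the issue: -0.5 / order, i.e. -1 / (2 * order)."""
    if order < 1:
        raise ValueError("order must be a positive integer")
    return -0.5 / order


def diag_matrix_power(diag, power):
    """Raise a diagonal matrix, given as its diagonal entries, to `power`.

    For a diagonal matrix the matrix power is the elementwise power.
    """
    return [d ** power for d in diag]


# Order-2 tensor (a matrix): the exponent should be -1/4.
exponent = shampoo_exponent(2)
# Hypothetical diagonal preconditioner with entries 16 and 81.
scaled = diag_matrix_power([16.0, 81.0], exponent)
```

With order 2 the exponent is -0.25, so the entries 16 and 81 map to roughly 1/2 and 1/3, which is the -1/4 power mentioned above.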

Steps to run
- Install AxoNN (dependencies: PyTorch and mpi4py)
  - git clone git@github.com:axonn-ai/axonn.git
  - cd axonn
  - git checkout 45647ea
  - pip install -e .

Preparing a...

feature request

**Describe the bug** I am trying to launch multiple Megatron-DeepSpeed jobs on a Slurm-based cluster. For each job, I want to create a separate hostfile called hostfile_${SLURM_JOBID}. However, when...

bug
training
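The per-job hostfile described in the report above could be generated along these lines. This is only a sketch under stated assumptions: the `hostname slots=N` line format is the one DeepSpeed documents for hostfiles, while `hostfile_path`, `hostfile_text`, and `write_hostfile` are hypothetical helpers, and the node names are placeholders. It does not address whatever failure the truncated report goes on to describe.

```python
import os


def hostfile_path(job_id: str, directory: str = ".") -> str:
    """Return the per-job hostfile path, e.g. ./hostfile_12345."""
    return os.path.join(directory, f"hostfile_{job_id}")


def hostfile_text(hostnames, slots_per_node: int) -> str:
    """Format one 'hostname slots=N' line per node."""
    return "".join(f"{host} slots={slots_per_node}\n" for host in hostnames)


def write_hostfile(hostnames, slots_per_node: int, job_id=None, directory="."):
    """Write the hostfile for this job and return its path.

    Falls back to the SLURM_JOBID environment variable, then to the
    placeholder 'local' when run outside Slurm (an assumption of this sketch).
    """
    job_id = job_id or os.environ.get("SLURM_JOBID", "local")
    path = hostfile_path(job_id, directory)
    with open(path, "w") as handle:
        handle.write(hostfile_text(hostnames, slots_per_node))
    return path
```

On a Slurm cluster the node names would typically come from `scontrol show hostnames "$SLURM_JOB_NODELIST"`, and the resulting file would be passed to the launcher through its `--hostfile` option.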

https://github.com/microsoft/DeepSpeedExamples/blob/737c6740bec38b77a24a59135b6481a53d566b38/applications/DeepSpeed-Chat/training/step1_supervised_finetuning/training_log_output/opt-1.3b-globalBatchSize128.log#L4 Why is the PPL here 4k when we are starting with a pretrained model?
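To put the number in context, perplexity can be converted back to a loss value. The sketch below assumes the logged PPL is the exponential of the mean per-token cross-entropy in nats, which is the usual definition but is not confirmed by the log itself; `perplexity` and `loss_from_perplexity` are illustrative helper names.

```python
import math


def perplexity(mean_loss_nats: float) -> float:
    """Perplexity as exp of the mean per-token cross-entropy (in nats)."""
    return math.exp(mean_loss_nats)


def loss_from_perplexity(ppl: float) -> float:
    """Inverse mapping: the mean loss that would produce a given perplexity."""
    return math.log(ppl)


# A perplexity of about 4000 corresponds to a mean loss of about 8.29 nats.
implied_loss = loss_from_perplexity(4000.0)
```

Under that assumption, a PPL of 4k implies a mean loss of roughly 8.29 nats per token, which is the figure the question is asking about for a pretrained starting point.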

bfloat16 is the go-to data type for mixed-precision training of large neural networks. This PR aims to add bf16 support to AxoNN.

ready-for-review

Why? 1. Reduce-scatters: these happen on weight gradients, and researchers increasingly want to do them in fp32. 2. All-gathers: with torch.autocast, these were happening in fp32 by...

might be helpful for end users?