Sumanth R Hegde

Results 3 issues of Sumanth R Hegde

Hi @jalammar, looks like I missed a commit on #110. All updates are on 68d893150d4fe81fcb34ebb39bf154d7aa437d6d. Changes since last time: - change import of `top_k_top_p_filtering` - Use raw string...

## Why are these changes needed? This is a massive PR to fix two outdated documentation links. ## Related issue number ## Checks - [x] I've signed off every commit (by using...

P2
docs
data
go

**Describe the bug** For ZeRO-3, I'm noticing an increase in training times on g5.48xlarge nodes with torch >= 2.3.1 and CUDA 12.1. I can reproduce this with small and large...

bug
training