LLM-Training-Puzzles
What would you do with 1000 H100s...
Results: 3
LLM-Training-Puzzles issues
Fixes #6.
For some of the puzzles, the target memory (and time) are impossible to satisfy. E.g. for puzzle 0, I spent some time trying to get from `2621440` to under the...
Hi, this might be a minor thing, but I'm wondering in distributed data parallel, when we aggregate `grad_weights` from all machines using `model.allgather`, since `allgather` performs `sum` operation, shouldn't we...
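A minimal sketch of the general point behind this question, assuming each worker computes the gradient of a mean loss over an equally sized shard: summing the per-worker gradients then gives the full-batch gradient scaled by the number of workers, while averaging recovers it. The helper `mean_grad` and the toy data are hypothetical and are not part of the puzzle library's API.

```python
import math

# Hypothetical toy setup, not the puzzle library's API:
# loss per example is 0.5 * (w * x - y) ** 2, so d(loss)/dw = (w * x - y) * x.
def mean_grad(xs, ys, w):
    """Gradient of the mean loss over one batch with respect to w."""
    return sum((w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
w = 0.5

# Split the batch evenly across two simulated workers.
shards = [(xs[:2], ys[:2]), (xs[2:], ys[2:])]
per_worker = [mean_grad(sx, sy, w) for sx, sy in shards]

summed = sum(per_worker)          # what a sum-style reduction would yield
averaged = summed / len(shards)   # sum divided by the number of workers
full = mean_grad(xs, ys, w)       # single-machine gradient on the full batch

print(summed, averaged, full)
```

Under these assumptions the averaged value matches the single-machine gradient, and the summed value is larger by a factor equal to the number of workers. Whether that scaling matters in practice depends on how the loss is normalized and on the learning rate.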