Xirid
Results: 1 issue of Xirid
**Describe the bug** Training an HF model (Llama 3.1 with PEFT) on long context with sequence_parallel_size > 1 works only up to ZeRO stage 2. If I set "stage" to...
bug
training