Xirid

**Describe the bug** Training a Hugging Face model (Llama 3.1 with PEFT) on long contexts with sequence_parallel_size > 1 works only up to ZeRO stage 2. If I set "stage" to...

bug
training