Results 6 comments of Charlie

> @Arcmoon-Hu good to know that. I am aware of the transformer issue and will fix it ASAP

Is it fixed?

> In my case, using liger kernel for Qwen 2.5 VL with sequence parallelism, the grad norm quickly goes to NaN

Hi! What framework supports liger kernel and sequence...

> Have you solved this? I'm also very confused... When I adjust accumulative_counts, it has no effect on the total number of steps... Very strange

+1

Same problem here: accumulative_counts does not seem to take effect, and the actual number of steps is the same as with accumulative_counts=1.
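For reference, here is a rough sketch of the arithmetic I would expect from gradient accumulation in a plain training loop. The function name and its parameters (`expected_steps`, `num_samples`, `batch_size`, `epochs`) are hypothetical and not taken from any framework. It assumes one optimizer update every `accumulative_counts` iterations, with a final partial group still triggering an update.

```python
import math


def expected_steps(num_samples, batch_size, accumulative_counts, epochs=1):
    """Return (iterations, optimizer_updates) for a plain training loop.

    Hypothetical helper for illustration only; it does not call any framework.
    """
    # One iteration = one forward/backward pass on a single batch.
    iters_per_epoch = math.ceil(num_samples / batch_size)
    # Assumption: the optimizer steps once per `accumulative_counts` iterations,
    # and a trailing partial group still produces one update.
    updates_per_epoch = math.ceil(iters_per_epoch / accumulative_counts)
    return iters_per_epoch * epochs, updates_per_epoch * epochs


if __name__ == "__main__":
    for acc in (1, 4):
        iters, updates = expected_steps(1000, 10, acc)
        print(f"accumulative_counts={acc}: iterations={iters}, updates={updates}")
```

Under these assumptions the iteration count does not depend on accumulative_counts; only the number of optimizer updates does. So if the logged step number counts iterations rather than updates, it would look the same as with accumulative_counts=1 even when accumulation is active. I have not confirmed which of the two the log reports here.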