Charlie
> @Arcmoon-Hu good to know that. I am aware of the transformer issue and will fix it ASAP.

Is it fixed?
> In my case, using liger kernel for Qwen 2.5 VL with sequence parallelism, the grad norm quickly goes to NaN.

Hi! What framework supports liger kernel and sequence...
> Did you solve it? I'm also very confused... About this step count: adjusting accumulative_counts has no real effect on the total number of steps... very strange.

+1
Same problem here: accumulative_counts does not seem to take effect, and the actual number of steps is what you would get with accumulative_counts=1.
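
For reference, here is a minimal sketch of the arithmetic involved, assuming a generic gradient-accumulation loop. The function `step_counts` and its parameters are hypothetical and are not taken from any framework's API. It separates the number of iterations (micro-batches processed) from the number of optimizer updates. If a progress counter tracks iterations, changing accumulative_counts would leave the displayed total unchanged. Whether that explains the behavior reported here is not confirmed in this thread.

```python
import math


def step_counts(num_samples, micro_batch_size, accumulative_counts, epochs=1):
    """Return (iterations, optimizer_updates) for a hypothetical training run.

    All names here are illustrative assumptions, not a real trainer's API.
    """
    # Iterations: one forward/backward pass per micro-batch.
    iters_per_epoch = math.ceil(num_samples / micro_batch_size)
    iterations = iters_per_epoch * epochs

    # Optimizer updates: one per `accumulative_counts` iterations.
    # Leftover iterations at the end are assumed to trigger a final update.
    optimizer_updates = math.ceil(iterations / accumulative_counts)
    return iterations, optimizer_updates


if __name__ == "__main__":
    for acc in (1, 4, 8):
        iters, updates = step_counts(1000, 10, acc)
        print(f"accumulative_counts={acc}: iterations={iters}, optimizer updates={updates}")
```

In this sketch the iteration count stays the same for every value of accumulative_counts, and only the number of optimizer updates shrinks, so it is worth checking which of the two quantities the logged step number refers to.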