Yu-Hsiang Wang
I've added a paper-form option to the current Liger Kernel RoPE implementation.
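For context, here is a minimal sketch of what "paper form" is assumed to mean here: the RoFormer paper rotates adjacent pairs `(x[2i], x[2i+1])`, while the common rotate-half layout pairs `x[i]` with `x[i + d/2]`. The function names below are hypothetical and this is plain Python for illustration, not the Liger Kernel Triton code.

```python
import math


def rope_interleaved(x, pos, base=10000.0):
    """Assumed 'paper form': rotate adjacent pairs (x[2i], x[2i+1])."""
    d = len(x)
    out = [0.0] * d
    for i in range(d // 2):
        theta = pos * base ** (-2.0 * i / d)  # angle for pair i at position pos
        c, s = math.cos(theta), math.sin(theta)
        a, b = x[2 * i], x[2 * i + 1]
        out[2 * i] = a * c - b * s
        out[2 * i + 1] = a * s + b * c
    return out


def rope_half_split(x, pos, base=10000.0):
    """Rotate-half layout: pair x[i] with x[i + d/2]."""
    d = len(x)
    half = d // 2
    out = [0.0] * d
    for i in range(half):
        theta = pos * base ** (-2.0 * i / d)  # same angles, different pairing
        c, s = math.cos(theta), math.sin(theta)
        a, b = x[i], x[i + half]
        out[i] = a * c - b * s
        out[i + half] = a * s + b * c
    return out
```

Under these assumptions the two layouts apply the same rotations and differ only by a fixed permutation of the head dimension, so a checkpoint trained with one layout needs its projection weights permuted before it can be used with the other.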
#take I opened a [PR](https://github.com/linkedin/Liger-Kernel/pull/465). Please take a look. Thanks, @ByronHsu.
I would like to work on this issue.
@Tcc0403
@PKUWZP I'll add the benchmark results as soon as the SwiGLU implementation is complete.
@shimizust During the convergence test, the loss values for the two models running in bf16 diverged significantly at certain steps. This is likely related to the issue discussed here: https://github.com/linkedin/Liger-Kernel/issues/742.