Yu-Hsiang Wang

6 comments by Yu-Hsiang Wang

I've added a paper-form option to the current Liger Kernel RoPE implementation.

#take

I made a [PR](https://github.com/linkedin/Liger-Kernel/pull/465). Please take a look. Thanks, @ByronHsu.

I would like to work on this issue.

@PKUWZP I'll add the benchmark results as soon as the SwiGLU implementation is complete.

@shimizust During the convergence test, the loss values for the two models running in bf16 diverged significantly at certain steps. This is likely related to [issue #742](https://github.com/linkedin/Liger-Kernel/issues/742).