
jamba liger fused linear+xentropy

Open winglian opened this issue 1 year ago • 6 comments

Summary

Testing Done

  • Hardware Type: <BLANK>
  • [ ] run make test to ensure correctness
  • [ ] run make checkstyle to ensure code style
  • [ ] run make test-convergence to ensure convergence

winglian avatar Aug 26 '24 17:08 winglian

Awesome! Please make sure you add both convergence tests (with logits and without logits) and unit tests. We are very focused on testing.

ByronHsu avatar Aug 26 '24 17:08 ByronHsu

https://github.com/linkedin/Liger-Kernel/issues/63

ByronHsu avatar Aug 26 '24 17:08 ByronHsu

I added the following additional monkey patch for Jamba.

    from transformers.models.jamba import modeling_jamba
    if rms_norm:
        # https://github.com/huggingface/transformers/blob/v4.44.2/src/transformers/models/gemma/modeling_gemma.py#L109
        modeling_jamba.JambaRMSNorm = LigerRMSNorm
    if cross_entropy:
        modeling_jamba.CrossEntropyLoss = LigerCrossEntropyLoss
    if swiglu:
        modeling_jamba.JambaMLP = LigerSwiGLUMLP

However, the convergence test seems to be failing for some values in the tensor:

E           Mismatch at index (0, 5): tensor1[(0, 5)] = 1.1513792276382446, tensor2[(0, 5)] = 1.1512681245803833
E           Mismatch at index (0, 27): tensor1[(0, 27)] = 0.6227690577507019, tensor2[(0, 27)] = 0.6227344870567322
E           Mismatch at index (0, 28): tensor1[(0, 28)] = 0.7790964841842651, tensor2[(0, 28)] = 0.7790292501449585
E           Mismatch at index (0, 29): tensor1[(0, 29)] = 0.524261474609375, tensor2[(0, 29)] = 0.5243569612503052
E           Mismatch at index (0, 30): tensor1[(0, 30)] = 0.8967938423156738, tensor2[(0, 30)] = 0.8968125581741333

I tracked this down to LigerRMSNorm, but I need more time to investigate why there is a difference.
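To put the mismatches above in perspective, here is a minimal sketch that computes the absolute and relative differences for the five reported pairs and checks them against a few tolerance settings. The comparison rule is the one documented for torch.allclose (|a - b| <= atol + rtol * |b|), reimplemented in plain Python so it runs without a GPU. The tolerance values are hypothetical and chosen only for illustration; the thresholds actually used by the convergence test are not shown in this thread.

```python
# Values copied from the mismatch report above: (index, tensor1, tensor2).
mismatches = [
    ((0, 5), 1.1513792276382446, 1.1512681245803833),
    ((0, 27), 0.6227690577507019, 0.6227344870567322),
    ((0, 28), 0.7790964841842651, 0.7790292501449585),
    ((0, 29), 0.524261474609375, 0.5243569612503052),
    ((0, 30), 0.8967938423156738, 0.8968125581741333),
]


def within_tolerance(a, b, atol, rtol):
    """Same criterion as torch.allclose: |a - b| <= atol + rtol * |b|."""
    return abs(a - b) <= atol + rtol * abs(b)


def summarize(pairs):
    """Return the largest absolute and relative difference over all pairs."""
    abs_diffs = [abs(a - b) for _, a, b in pairs]
    rel_diffs = [abs(a - b) / abs(b) for _, a, b in pairs]
    return max(abs_diffs), max(rel_diffs)


max_abs, max_rel = summarize(mismatches)
print(f"max abs diff: {max_abs:.3e}, max rel diff: {max_rel:.3e}")

# Hypothetical tolerance settings, NOT the ones used by the test suite.
for atol, rtol in [(1e-5, 1e-5), (1e-4, 1e-4), (1e-3, 1e-3)]:
    failing = [
        idx for idx, a, b in mismatches
        if not within_tolerance(a, b, atol, rtol)
    ]
    print(f"atol={atol:g}, rtol={rtol:g}: {len(failing)} of {len(mismatches)} fail")
```

A summary like this only shows how large the gap is. It does not explain why LigerRMSNorm differs from the reference implementation, which still needs investigation.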

yubofredwang avatar Sep 04 '24 18:09 yubofredwang

Hi @winglian, I created a PR against the main branch of your fork. Do you want to merge it first and then update this PR to be based on that? https://github.com/winglian/Liger-Kernel/pull/1

Or I can create a separate PR against linkedin:main: https://github.com/linkedin/Liger-Kernel/pull/214

@ByronHsu thoughts?

yubofredwang avatar Sep 05 '24 07:09 yubofredwang

@yubofredwang if your PR captures all the changes, I'm happy to have your PR supersede mine. thanks!

winglian avatar Sep 05 '24 13:09 winglian

@yubofredwang there are a few conflicts

ByronHsu avatar Sep 06 '24 21:09 ByronHsu

No activity for a long time, so closing this PR. Feel free to create a new PR if there is progress. Thanks!

helloworld1 avatar Mar 04 '25 19:03 helloworld1