
[Retraining] Use Liger Kernel to avoid multi-head logits materialization and scale the context length by N times

Open ByronHsu opened this issue 1 year ago • 1 comments

https://github.com/linkedin/Liger-Kernel/tree/main/examples/medusa

With FusedLinearCrossEntropy and the other kernels in Liger-Kernel, we can reduce memory usage while increasing throughput. We would be happy to collaborate on integrating our kernels!
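To illustrate what "avoiding logits materialization" means, here is a minimal pure-Python sketch of the general idea: compute the projection and the cross-entropy loss one chunk of tokens at a time, so the full tokens-by-vocab logits matrix never exists at once. This is not Liger-Kernel's actual implementation (which is a fused GPU kernel that also handles gradients); the function names and `chunk_size` parameter are hypothetical and chosen only for illustration. With several Medusa heads, the same loop would presumably run once per head.

```python
import math

def linear(hidden, weight):
    """Project hidden vectors onto the vocabulary: returns [tokens][vocab] logits.

    `weight` is laid out as [vocab][hidden_dim] (an assumption of this sketch).
    """
    return [[sum(h * w for h, w in zip(vec, row)) for row in weight]
            for vec in hidden]

def cross_entropy_sum(logits, targets):
    """Sum of -log softmax(logits)[target], using a stable log-sum-exp."""
    total = 0.0
    for row, target in zip(logits, targets):
        peak = max(row)
        lse = peak + math.log(sum(math.exp(x - peak) for x in row))
        total += lse - row[target]
    return total

def naive_loss(hidden, weight, targets):
    """Baseline: materializes the full [tokens][vocab] logits matrix."""
    logits = linear(hidden, weight)
    return cross_entropy_sum(logits, targets) / len(targets)

def chunked_loss(hidden, weight, targets, chunk_size=2):
    """Same mean loss, but only `chunk_size` rows of logits are alive at a time."""
    total = 0.0
    for start in range(0, len(hidden), chunk_size):
        end = start + chunk_size
        chunk_logits = linear(hidden[start:end], weight)
        total += cross_entropy_sum(chunk_logits, targets[start:end])
        # chunk_logits is dropped here before the next chunk is computed
    return total / len(targets)
```

Both functions return the same mean loss; the difference is peak memory, which in the chunked version scales with `chunk_size` times the vocabulary size rather than with the full sequence length.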


ByronHsu avatar Aug 26 '24 05:08 ByronHsu

cc @ctlllll @leeyeehoo @zhyncs

ByronHsu avatar Aug 26 '24 05:08 ByronHsu