Medusa
Fix sharing of resblock layers (from Liger-Kernel#269)
When a Medusa MLP head uses multiple residual blocks, their parameters are wrongly shared across the blocks.
The same bug was reported in Hydra (https://github.com/zankner/Hydra/issues/8) and has already been fixed in the Liger-Kernel repository (https://github.com/linkedin/Liger-Kernel/pull/269).
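
For illustration, here is a minimal sketch of how this kind of sharing can arise. It assumes the blocks are replicated by multiplying a one-element list, which repeats a reference to a single object instead of constructing new ones. `ResBlock` is a hypothetical pure-Python stand-in, not the actual Medusa module, and the helper names are made up for the example. The same aliasing applies to `torch.nn.Module` instances.

```python
class ResBlock:
    """Hypothetical stand-in for a residual block; holds one mutable weight list."""

    def __init__(self, hidden_size):
        self.weight = [0.0] * hidden_size


def build_shared(hidden_size, num_layers):
    # Buggy pattern: list multiplication repeats a reference to ONE block,
    # so every "layer" is the same object with the same parameters.
    return [ResBlock(hidden_size)] * num_layers


def build_independent(hidden_size, num_layers):
    # Fixed pattern: the comprehension constructs a new block per layer.
    return [ResBlock(hidden_size) for _ in range(num_layers)]


def count_distinct(blocks):
    # Number of distinct objects in the list, compared by identity.
    return len({id(block) for block in blocks})


shared = build_shared(4, 3)
independent = build_independent(4, 3)

print(count_distinct(shared))       # 1
print(count_distinct(independent))  # 3

# Updating the "first" shared block changes every layer.
shared[0].weight[0] = 1.0
print(shared[2].weight[0])          # 1.0
```

With the comprehension, each layer owns its parameters, so an update to one block no longer affects the others.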