Liger-Kernel

Z Loss in CE

Open Fr0do opened this issue 1 year ago • 2 comments

🚀 The feature, motivation and pitch

Z loss is often used to stabilize LM pretraining, e.g. in PaLM and the recent Chameleon.
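For reference, a minimal sketch of what is being requested, assuming the PaLM-style formulation in which an auxiliary term `z_loss_scale * logsumexp(logits)**2` is added to the usual cross entropy. The function name, the `z_loss_scale` parameter, and its 1e-4 default are illustrative assumptions here, not Liger-Kernel's API, and the code is plain Python for a single token rather than a fused kernel.

```python
import math

def cross_entropy_with_z_loss(logits, target, z_loss_scale=1e-4):
    """Cross entropy plus z loss for one token (hypothetical helper).

    Returns (total, ce, z), where z = z_loss_scale * logsumexp(logits)**2.
    """
    # Numerically stable log-sum-exp: subtract the max before exponentiating.
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))

    # Standard cross entropy: -log softmax(logits)[target].
    ce = lse - logits[target]

    # Z loss penalizes a log-partition value that drifts away from zero.
    z = z_loss_scale * lse ** 2
    return ce + z, ce, z

if __name__ == "__main__":
    total, ce, z = cross_entropy_with_z_loss([2.0, 1.0, 0.1], target=0)
    print(f"ce={ce:.6f} z={z:.8f} total={total:.6f}")
```

Cross entropy is unchanged when a constant is added to every logit, whereas the z loss term is not, which is why it discourages the logits from drifting. Since both terms reuse the same log-sum-exp value, the extra cost in a fused kernel should be small.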

Alternatives

flash-attn implements the features mentioned above, but it does not support fusing them with the linear head.

Additional context

No response

Fr0do, Sep 03 '24 16:09

Legit ask! Label smoothing is already tracked at https://github.com/linkedin/Liger-Kernel/issues/81, so I've changed the title to cover only Z loss to avoid duplication.

ByronHsu, Sep 03 '24 20:09

@ByronHsu #take To support z loss, I just need a small addition to #198. I'll work on it after the label_smoothing PR is merged.

Tcc0403, Sep 06 '24 11:09