Tcc0403
I believe I've implemented softcap in the cross entropy function correctly, along with the FLCE support for gemma2. But since gemma2 currently can't pass the test even without FLCE, do I need...
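For reference, a minimal sketch of what softcapping means here, assuming Gemma 2-style tanh capping applied to the logits before the loss (the 30.0 cap value and function names are illustrative, not the kernel code):

```python
import torch
import torch.nn.functional as F

def softcap_logits(logits: torch.Tensor, softcap: float) -> torch.Tensor:
    # Tanh softcapping: squashes logits into (-softcap, softcap).
    return softcap * torch.tanh(logits / softcap)

def softcapped_cross_entropy(logits: torch.Tensor, target: torch.Tensor, softcap: float = 30.0):
    # Eager reference for cross entropy on softcapped logits, useful as a
    # ground truth to test the fused kernel against.
    return F.cross_entropy(softcap_logits(logits, softcap), target)
```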
[bench](https://gist.github.com/Tcc0403/d1e64df7f604a9ed3878016caae64cf2) I did some benchmarks on H100: adding any torch in-place op increases the time cost by roughly 50% (original -> with_hint = 23 -> 32 ms for 128k vocab size)....
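(For context, a minimal sketch of how such a comparison can be timed with CUDA events; the shapes and the `mul_(1.0)` stand-in for the extra in-place op are illustrative, not the gist's script.)

```python
import torch
import torch.nn.functional as F

def bench(fn, warmup=10, iters=50):
    # Time a GPU callable with CUDA events, returning ms per iteration.
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    for _ in range(warmup):
        fn()
    torch.cuda.synchronize()
    start.record()
    for _ in range(iters):
        fn()
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters

V = 128 * 1024  # 128k vocab size
logits = torch.randn(8192, V, device="cuda")
target = torch.randint(0, V, (8192,), device="cuda")

print("baseline    :", bench(lambda: F.cross_entropy(logits, target)))
# mul_(1.0) leaves the values unchanged; it only adds one extra in-place kernel pass.
print("with inplace:", bench(lambda: F.cross_entropy(logits.mul_(1.0), target)))
```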
@ByronHsu
> I am wondering why the error does not happen for normal case?

I left an [explanation](https://github.com/linkedin/Liger-Kernel/issues/272#issuecomment-2390882798) in the issue.
Following @mgrabban's suggestion in #343, I made another implementation with [mark_dirty()](https://pytorch.org/docs/stable/generated/torch.autograd.function.FunctionCtx.mark_dirty.html#torch-autograd-function-functionctx-mark-dirty). **Note**: I haven't benchmarked this new approach against the current liger_ce yet; I will do it in a few days. Draft [gist](https://gist.github.com/Tcc0403/73dfbbaab7cc6208598eebb0aa10f5cd) for...
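A rough sketch of the idea (not the gist's code; the eager gradient computation and mean reduction are assumptions): the forward overwrites the logits buffer with d(loss)/d(logits) to avoid an extra allocation, and declares the in-place modification via `mark_dirty()`.

```python
import torch
import torch.nn.functional as F

class CrossEntropyInplaceGrad(torch.autograd.Function):
    @staticmethod
    def forward(ctx, logits, target):
        # Assumes logits is a non-leaf tensor (e.g. the lm_head output); autograd
        # rejects the mark_dirty path for leaf tensors that require grad.
        loss = F.cross_entropy(logits, target)
        with torch.no_grad():
            grad = torch.softmax(logits.float(), dim=-1)
            grad[torch.arange(target.numel(), device=logits.device), target] -= 1.0
            grad /= target.numel()          # reduction="mean"
            logits.copy_(grad)              # reuse the logits storage for the gradient
        ctx.mark_dirty(logits)              # tell autograd the input was modified in place
        ctx.save_for_backward(logits)
        # A tensor passed to mark_dirty() must also be returned as an output.
        return loss, logits

    @staticmethod
    def backward(ctx, grad_loss, _grad_logits_out):
        (grad_logits,) = ctx.saved_tensors
        return grad_logits * grad_loss, None
```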
Update: Here's the benchmark against liger's CE: speed is slower by 17% (TODO: investigate where the overhead occurs) and memory is doubled (I guess it's because the tensors passed to `mark_dirty()`...
not required for now
@ByronHsu #take To support z loss, I just need a few small add-ons to #198. I'll work on it after merging the label_smoothing PR.
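For reference, z loss here means the PaLM-style auxiliary term added on top of cross entropy (a minimal sketch; the `lse_square_scale` name and the mean reduction are illustrative choices):

```python
import torch
import torch.nn.functional as F

def cross_entropy_with_z_loss(logits, target, lse_square_scale=1e-4):
    # z loss penalizes large log-partition values: lse_square_scale * logsumexp(logits)**2
    # per token, averaged and added to the standard cross entropy.
    lse = torch.logsumexp(logits.float(), dim=-1)    # (N,)
    ce = F.cross_entropy(logits.float(), target)     # mean over tokens
    return ce + lse_square_scale * (lse ** 2).mean()
```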
@shimizust Thanks for the help. I only checked the VLM-related issues; I haven't checked what caused the `_init_weights` errors.
Passed all tests. Ready for review!
Ignore the OOM errors. The current custom CrossEntropyWithZLoss (a torch.nn.Module), used as the ground-truth implementation, has precision issues in gradient calculations with bfloat16 and reduction="sum". The LigerCrossEntropyLoss in this PR has no issue...
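(A quick way to see the kind of gap involved, as an illustrative check rather than the test code: compare bfloat16 gradients of a reduction="sum" cross entropy against an fp32 reference.)

```python
import torch
import torch.nn.functional as F

V, N = 32000, 4096  # illustrative shapes
logits = torch.randn(N, V, device="cuda", dtype=torch.bfloat16, requires_grad=True)
target = torch.randint(0, V, (N,), device="cuda")

loss_bf16 = F.cross_entropy(logits, target, reduction="sum")
(grad_bf16,) = torch.autograd.grad(loss_bf16, logits)

logits_fp32 = logits.detach().float().requires_grad_(True)
loss_fp32 = F.cross_entropy(logits_fp32, target, reduction="sum")
(grad_fp32,) = torch.autograd.grad(loss_fp32, logits_fp32)

print((grad_bf16.float() - grad_fp32).abs().max())  # inspect the bf16 gradient error
```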