Tcc0403

Results: 19 issues by Tcc0403

## Summary Fix #272. This is a showcase of how to trigger errors properly. I only applied it to cross_entropy for demonstration; it can be applied to others if we want. ##...

## Summary Resolve #277. ## Testing Done - Hardware Type: gpu-ci - [x] run `make test` to ensure correctness - [x] run `make checkstyle` to ensure code style - [x]...

### πŸ› Describe the bug #254 [#262 (comments)](https://github.com/linkedin/Liger-Kernel/pull/262#issuecomment-2374260041) PyTorch’s autograd system records operations on tensors to construct a computational graph, which is used for computing gradients. When an in-place operation...

bug
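The in-place hazard described above can be modeled without PyTorch. PyTorch keeps a version counter on each tensor and, during backward, checks that tensors saved for the gradient were not modified in place after being saved. The `Tensor` and `SavedTensor` classes below are hypothetical toy stand-ins (not PyTorch's API), and the error text only paraphrases PyTorch's message; this is a sketch of the mechanism, not its implementation.

```python
class Tensor:
    """Toy tensor with a version counter, mimicking PyTorch's `_version` bookkeeping."""

    def __init__(self, data):
        self.data = list(data)
        self.version = 0

    def mul_(self, k):
        # In-place op: mutate the storage and bump the version counter.
        self.data = [x * k for x in self.data]
        self.version += 1
        return self


class SavedTensor:
    """What a backward node stores: a reference plus the version at save time."""

    def __init__(self, tensor):
        self.tensor = tensor
        self.saved_version = tensor.version

    def unpack(self):
        # Refuse to use a tensor that was modified in place after it was saved.
        if self.tensor.version != self.saved_version:
            raise RuntimeError(
                "a variable needed for gradient computation has been modified "
                f"by an inplace operation: is at version {self.tensor.version}; "
                f"expected version {self.saved_version} instead"
            )
        return self.tensor


def square_forward(x):
    """y = x**2; backward needs x, so x is saved."""
    y = Tensor(v * v for v in x.data)
    return y, SavedTensor(x)


def square_backward(saved, grad_out):
    """dy/dx = 2x, using the saved (and version-checked) input."""
    x = saved.unpack()
    return [2 * v * g for v, g in zip(x.data, grad_out)]


x = Tensor([1.0, 2.0])
y, saved = square_forward(x)
print(square_backward(saved, [1.0, 1.0]))  # [2.0, 4.0]
x.mul_(3.0)  # in-place edit after the save bumps x.version to 1
try:
    square_backward(saved, [1.0, 1.0])
except RuntimeError as err:
    print(err)
```

The check is what turns a silent wrong gradient into a loud error: backward would otherwise compute `2 * x` from the already-tripled values.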

## Summary This PR aims to resolve #197. Implemented z loss in LigerCrossEntropy. Note: `lse_square_scale` is not exposed in flce yet because of issues passing the tests. ## Details ### For loss:...

reviewing
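Z loss adds a penalty on the squared log-sum-exp of the logits to the cross-entropy loss. A minimal per-row sketch, assuming the PaLM-style formulation `loss = ce + lse_square_scale * lse**2` with `lse = log(sum(exp(logits)))`; the function `cross_entropy_with_z_loss` is hypothetical and is not Liger's Triton kernel. It also returns the gradient, using the fact that the derivative of `lse` with respect to each logit is the softmax probability.

```python
import math


def cross_entropy_with_z_loss(logits, target, lse_square_scale=0.0):
    """Return (loss, grad) for one row of logits.

    loss = (lse - logits[target]) + lse_square_scale * lse**2
    where lse = log(sum(exp(logits))).
    """
    m = max(logits)  # subtract the max for numerical stability
    sum_exp = sum(math.exp(x - m) for x in logits)
    lse = m + math.log(sum_exp)
    ce = lse - logits[target]
    z_loss = lse_square_scale * lse * lse
    softmax = [math.exp(x - lse) for x in logits]
    # d(lse)/d(x_i) = softmax_i, so the z-loss term adds 2 * scale * lse * softmax_i
    grad = [
        p - (1.0 if i == target else 0.0) + 2.0 * lse_square_scale * lse * p
        for i, p in enumerate(softmax)
    ]
    return ce + z_loss, grad
```

With `lse_square_scale=0.0` this reduces to plain cross entropy, so the z-loss term can be switched on per call without changing the rest of the computation.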

### πŸ› Describe the bug Most failures are related to transformers VLM changes ## unit test qwen2vl_mrope - [x] test_qwen2vl_mrope https://github.com/linkedin/Liger-Kernel/pull/728 monkey patch - [x] test_monkey_patch::test_apply_liger_kernel_to_instance_for_mllama_for_conditional_generation https://github.com/linkedin/Liger-Kernel/pull/737 - [x] test_monkey_patch::test_apply_liger_kernel_to_instance_for_gemma3...

### πŸ› Describe the bug CI details: [Qwen2VLConfig](https://github.com/linkedin/Liger-Kernel/actions/runs/15175089532/job/42680679210?pr=689#step:6:6549), [monkey patch impl related](https://github.com/linkedin/Liger-Kernel/actions/runs/15214784023/job/42797584570#step:5:1903) Text config is seperated out from the general config in transformers>=4.52.0. [Qwen2VLRotaryEmbedding](https://github.com/huggingface/transformers/pull/37268/files#diff-09bc594f9680f1d042fd485106c68022d77b59831697a00b3b38f12a3e40f395L103-R104) takes `Qwen2VLTextConfig` intead of `Qwen2VLConfig` now....

bug
huggingface
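Code that builds a rotary embedding for both older and newer transformers releases has to pass the text config when one exists. A minimal sketch, assuming the newer composite config exposes its text sub-config as a `text_config` attribute; the `Fake*` classes and `rotary_embedding_config` are hypothetical stand-ins, not transformers classes.

```python
class FakeQwen2VLTextConfig:
    """Hypothetical stand-in for the text-only config (transformers>=4.52.0 layout)."""

    def __init__(self, hidden_size=64):
        self.hidden_size = hidden_size


class FakeQwen2VLConfig:
    """Hypothetical stand-in for the composite config, in either layout."""

    def __init__(self, split_text_config):
        text = FakeQwen2VLTextConfig()
        if split_text_config:
            self.text_config = text  # newer layout: text fields live in a sub-config
        else:
            self.hidden_size = text.hidden_size  # older layout: one flat config


def rotary_embedding_config(config):
    """Return the config a rotary embedding should be built from.

    Assumes the newer layout exposes `text_config`; flat configs are returned unchanged.
    """
    return getattr(config, "text_config", config)
```

Falling back to the config itself keeps the call site identical across versions instead of branching on the installed transformers version.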

### 🚀 The feature, motivation and pitch Upcoming refactor in transformers VLM models: https://github.com/huggingface/transformers/pull/37033 `XXXForConditionalGeneration` no longer has `language_model` attribute for `ForCausalLM`. It will be changed to `model` attribute to...
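Patching code that must work before and after such a rename can look up the submodule defensively. A minimal sketch, assuming only what the snippet states (older models expose `language_model`, newer ones expose `model`); the helper `get_patch_target` and the dummy classes are hypothetical and say nothing about the full layout introduced by the refactor.

```python
class OldStyleForConditionalGeneration:
    """Hypothetical pre-refactor layout: text model under `language_model`."""

    def __init__(self, text_model):
        self.language_model = text_model


class NewStyleForConditionalGeneration:
    """Hypothetical post-refactor layout: submodule under `model`."""

    def __init__(self, inner):
        self.model = inner


def get_patch_target(model):
    """Return the submodule to patch, preferring the pre-refactor attribute name."""
    if hasattr(model, "language_model"):
        return model.language_model
    if hasattr(model, "model"):
        return model.model
    raise AttributeError(
        f"{type(model).__name__} has neither `language_model` nor `model`"
    )
```

Raising instead of silently skipping makes an unexpected third layout fail loudly in tests rather than leaving the model unpatched.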

## Summary Rerun all benchmark scripts to get the latest data, so we can have a reliable baseline for future optimization. Note: orpo failing with `compile=True` (plotting with old data...

### πŸ› Describe the bug Many discussions show that the current revert functions have several limitations, including: - incomplete revert: https://github.com/linkedin/Liger-Kernel/pull/627#issuecomment-2757281103 #542 - not automatically updating old reference: #385 -...

### πŸ› Describe the bug #369 found that CrossEntropyLoss wasn't applied in post-grad-acc-fix versions of transformers. Despite the fact that #375 fixed the issue, it didn't consider the revert functions...

bug