Emin Orhan
It also seems odd that the fix had no visible effect on `train_loss`. Allegedly, 35% (or perhaps even 60%) of the training data was not being used previously....
@ByronHsu Thanks a lot for the pointer. I really appreciate the help. So, when I manually change the `num_warps` arguments to 64 in [`fused_linear_cross_entropy.py`](https://github.com/linkedin/Liger-Kernel/blob/main/src/liger_kernel/ops/fused_linear_cross_entropy.py), it seems to fix this particular...
Ha! I had set `num_warps=64`, but I think you're right that it should have been 16 instead (I mixed up `num_warps` with `warp_size`).
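For context, here is a minimal sketch of the arithmetic behind the mix-up, assuming an NVIDIA GPU. In Triton, `num_warps` is the number of warps per program instance, and each warp is 32 threads. So `num_warps=64` would request 2048 threads per program, above CUDA's 1024-threads-per-block limit, while 16 warps is 512 threads. The helper names below are hypothetical and are not part of Liger-Kernel or Triton.

```python
# Hypothetical helpers illustrating how Triton's `num_warps` maps to threads.
# Assumption: NVIDIA GPU, where a warp is 32 threads and a block (one Triton
# program instance) may hold at most 1024 threads. AMD GPUs use 64-thread
# wavefronts instead, so the numbers differ there.
WARP_SIZE = 32
MAX_THREADS_PER_BLOCK = 1024


def threads_per_program(num_warps: int, warp_size: int = WARP_SIZE) -> int:
    """Number of threads launched for one program instance."""
    return num_warps * warp_size


def max_num_warps(warp_size: int = WARP_SIZE,
                  max_threads: int = MAX_THREADS_PER_BLOCK) -> int:
    """Largest num_warps that stays within the per-block thread limit."""
    return max_threads // warp_size


for nw in (16, 32, 64):
    t = threads_per_program(nw)
    print(f"num_warps={nw}: {t} threads, within limit: {t <= MAX_THREADS_PER_BLOCK}")
```

Under these assumptions, 32 warps is the ceiling, and 16 leaves headroom for register and shared-memory pressure.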