lightning-thunder
Allocate dQ, dK, and dV as one concatenated tensor to save a downstream cat in nvFuser.
The description on the added compile option explains what this optimization does.
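
For readers without the option description at hand, here is a rough sketch of the general idea in plain PyTorch. It is not the code in this PR: the helper name `alloc_grads_catted`, the choice of `dim=1` as the concatenation dimension, and the example shapes are all assumptions made for illustration. The point is only that if the three gradients are views into one buffer, the buffer already holds what a later `cat` would have produced.

```python
import torch


def alloc_grads_catted(q, k, v, dim=1):
    """Hypothetical helper: allocate one buffer and return it with three views.

    Assumes q, k, and v share dtype and device, and differ at most along `dim`.
    """
    sizes = [q.shape[dim], k.shape[dim], v.shape[dim]]
    shape = list(q.shape)
    shape[dim] = sum(sizes)
    dqkv = torch.empty(shape, dtype=q.dtype, device=q.device)
    # split() returns views into dqkv, so no extra memory is allocated here.
    dq, dk, dv = dqkv.split(sizes, dim=dim)
    return dqkv, dq, dk, dv


# Illustrative shapes only: (batch, heads, seq_len, head_dim).
q = torch.randn(2, 4, 8, 16)
k = torch.randn(2, 4, 8, 16)
v = torch.randn(2, 4, 8, 16)

dqkv, dq, dk, dv = alloc_grads_catted(q, k, v)

# Stand-in for a backward kernel that writes each gradient in place.
dq.copy_(q * 2)
dk.copy_(k * 3)
dv.copy_(v * 4)

# The buffer now equals what an explicit cat of the three gradients would build.
expected = torch.cat([q * 2, k * 3, v * 4], dim=1)
buffer_matches_cat = torch.equal(dqkv, expected)
```

In this sketch a consumer that needs the concatenated gradient can use `dqkv` directly, which is the copy the optimization is meant to avoid. Whether the real option concatenates along this dimension is not stated here.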
This optimization is disabled by default for now. After #35 is merged and bookend is disabled by default, I'll try to enable it by default, or even make it unconditional.