Protonu
This commit begins adding support for strided tensors. I made changes to percolate a vector in TensorInfo down to emitCudaKernel so that codegen can cast strided tensors. This required...
This PR adds custom decompositions for Cross-Entropy Loss for the nvFuser executor. Adding these custom decompositions improves performance and allows further optimization in nvFuser. For cross-entropy loss forward: 1. We...
*Note*: If you have a model or program that is not supported yet but should be, please use the program coverage template. ## 🐛 Bug With the torch executor, Thunder...