This one should be good to go
@pytorchbot merge
It's probably both TS and nvfuser here. It looks like even the fallback path causes failures in the layer here. I'll take a closer look.
Am I supposed to see the exception thrown within the try block? @DavidSlayback

```
root@a0befa8bea8a:/jiej/playground# python repro_82899.py
Assert works!
No preactivate works
```

Doesn't seem to give me any error....
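(To make sure we're talking about the same thing: I'm assuming the repro wraps each variant roughly like the hypothetical sketch below, so a thrown exception would only surface as a printed message rather than a raw traceback. The names here are made up, not from repro_82899.py.)

```
# Hypothetical harness sketch -- NOT the actual repro_82899.py.
import torch

def run_variant(model, x, label):
    # If the failing call is wrapped like this, an exception is caught and
    # printed instead of propagating, so no traceback would be visible.
    try:
        out = model(x)           # scripted forward that may hit the fusion bug
        out.sum().backward()     # backward is where the in-place/view error would show up
        print(f"{label} works!")
    except RuntimeError as e:
        print(f"{label} failed: {e}")
```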
Got it. I have the repro locally. Something went wrong in our fusion pass. I'm working on it.
I think there's something funny with ConstantChunk here.... We are definitely abusing that and should not have pushed it for non-pointwise operations....
> @jjsjann123 so does the issue go away when we turn off nvfuser?

It does if I run with `PYTORCH_JIT_USE_NNC_NOT_NVFUSER=1 python repro_82899.py --cuda True`. Does NNC not fuse normalizations by default?...
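For anyone else bisecting this, here's roughly how I flip between the two fuser backends. The Python-level toggles are internal APIs and may change between releases, so treat this as a sketch rather than a supported interface:

```
# Toggling the TorchScript fuser backend -- a sketch, using internal switches.
import torch

# Option 1 (what I used above): environment variable, set before launching Python
#   PYTORCH_JIT_USE_NNC_NOT_NVFUSER=1 python repro_82899.py --cuda True

# Option 2: in-process toggles (internal API, subject to change)
torch._C._jit_set_nvfuser_enabled(False)      # disable nvfuser
torch._C._jit_override_can_fuse_on_gpu(True)  # allow NNC fusion on GPU
```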
I was referring to this: https://github.com/pytorch/pytorch/blob/aad3b8e4d3b00f9bd95bc52fd5242dd0f1c43557/torch/csrc/jit/passes/tensorexpr_fuser.cpp#L485-L503

nvfuser doesn't fuse ConstantChunk; instead we rely on this old optimization pass that we stole from the legacy fuser: https://github.com/pytorch/pytorch/blob/master/torch/csrc/jit/passes/graph_fuser.cpp#L875

I patched it in #83083,...
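If you want to see where `prim::ConstantChunk` ends up relative to the fusion group, something like the snippet below is how I usually inspect it. Whether and where `prim::ConstantChunk` shows up depends on the executor and fuser settings (and it needs a GPU for the fusion path), so this is only illustrative:

```
# Illustrative only: inspect the optimized graph for prim::ConstantChunk.
import torch

@torch.jit.script
def gates(x):
    # chunk with a constant number of chunks is what gets canonicalized
    # into prim::ConstantChunk before fusion
    a, b, c = torch.chunk(x, 3, dim=-1)
    return torch.sigmoid(a) * torch.tanh(b) + c

x = torch.randn(2, 384 * 3, device="cuda")
gates(x); gates(x)          # warm up so the profiling executor optimizes
print(gates.graph_for(x))   # look for prim::ConstantChunk feeding the fusion group
```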
FYI, a simpler repro:

```
import torch
import torch.nn as nn
import torch.nn.functional as F


class Cell(nn.Module):
    """Layer-normalized GRU as in https://arxiv.org/pdf/1607.06450.pdf
    https://github.com/pytorch/pytorch/issues/12482#issuecomment-440485163"""

    def __init__(self):
        super().__init__()
        self.layer0 = nn.LayerNorm(384)
        self.layer1...
```
> BUT with NVF as the torchscript backend, it appears to be optimizing such that the inplace ops violate grad rules

`a view of a leaf Variable that requires grad...
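For reference, the eager-mode condition behind that message is just an in-place op applied to a view of a leaf tensor that requires grad. A minimal standalone example (not taken from the repro):

```
# Minimal standalone example of the error message quoted above.
import torch

w = torch.randn(4, requires_grad=True)  # leaf that requires grad
v = w.view(2, 2)                        # a view of that leaf
v.add_(1.0)  # RuntimeError: a view of a leaf Variable that requires grad
             # is being used in an in-place operation
```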