Mario Lezcano Casado
There's still at least one xfail that needs to be removed (there's an "unexpected success" in a test) but otherwise this is ready to go!
@pytorchbot merge
That code you found is from caffe. I don't think that code is tested in CI. So, it seems that it was passing on CUDA (and MPS I guess as...
It also passes on CUDA. See https://github.com/pytorch/pytorch/actions/runs/4004960425/jobs/6876076243 (or see how there are no failing CUDA jobs when you removed the xfail).
@pytorchbot merge
These are benchmarks on different shapes of a softmax. This PR:
```
(1, 67108864) inductor: 857.4533462524414 us
(2, 33554432) inductor: 858.1042289733887 us
(4, 16777216) inductor: 850.0027656555176 us
(8, 8388608) inductor: ...
```
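The benchmark sweep above (same element count, different aspect ratios) could be sketched with a plain NumPy stand-in like the one below. This is only an illustration of the measurement loop: the real numbers in the comment come from inductor (`torch.compile`), which is not used here, and the shapes are scaled down so the sketch runs quickly.

```python
import time
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax: subtract the row max before exponentiating.
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)

def bench(shape, iters=5):
    """Time softmax over `shape`, returning mean microseconds per call."""
    x = np.random.rand(*shape).astype(np.float32)
    softmax(x)  # warm-up so allocation/caching doesn't skew the first timing
    start = time.perf_counter()
    for _ in range(iters):
        softmax(x)
    return (time.perf_counter() - start) / iters * 1e6

# Same total element count, different aspect ratios (smaller than in the PR).
for shape in [(1, 1 << 16), (2, 1 << 15), (4, 1 << 14)]:
    print(f"{shape}: {bench(shape):.1f} us")
```

A flat curve across aspect ratios (as in the inductor numbers above) indicates the kernel's throughput does not degrade as rows get shorter and more numerous.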
Running the command in https://github.com/pytorch/pytorch/pull/91316#issuecomment-1363421509, it seems that this patch performs terribly. It needs further investigation into how to make it scale properly.
Closing this one, as this will probably be superseded by the `autotune-max` option. The code in this PR that gives a loose bound on the number of registers needed to...
I think it would be good to wrap this into one umbrella issue that either has a long list or points to one place with a list of these, to avoid...
@thomasjpfan did some experiments with this PR and the compat layer, and it seems like there are still quite a few things that should be sorted. See [this notebook](https://gist.github.com/thomasjpfan/513115f8c6265b83c9fe69ec9f02f11a).