Christian Sarofeen
Good point on the whitespace. I'll start working on an example.
You're correct @ngimel. We can end up generating many kernels; if we want to limit the number of kernels we generate, we would need to implement coarser-grained heuristics (definitely possible...
The highest level that Eddie was working on, or the heuristics? Recompilations triggered by heuristics depend entirely on heuristic changes and are subject to change from one release to the next. Practically...
FYI I intend to review (can't set myself as a reviewer)
> cc @csarofeen for regressions in backward, my understanding was that (at least for not-channels-last) aot is a win.

I haven't seen significant regressions in backwards except in channels last...
We explicitly tested on the 1.12 release, CC @ptrblck and @kevinstephano in case we were testing something slightly different. Definitely keep us posted, we're highly motivated to get our codegen in...
I did a sweep of LayerNorm FWD and BWD on the sizes I generally use for my "TIMM micro benchmarks": Product of: N [8, 16, 32, 64, 128, 256] C...
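A minimal sketch of what such a sweep could look like, assuming 2D (N, C) inputs on CUDA; the C values and iteration counts below are placeholders, since the original size list is truncated:

```python
import itertools
import torch

# N values from the comment above; C values are illustrative placeholders.
N_sizes = [8, 16, 32, 64, 128, 256]
C_sizes = [256, 512, 1024, 2048]

def bench_layernorm(N, C, iters=100):
    x = torch.randn(N, C, device="cuda", requires_grad=True)
    ln = torch.nn.LayerNorm(C).cuda()
    grad = torch.randn_like(x)

    # Warm-up so timing excludes compilation / autotuning overhead.
    for _ in range(10):
        ln(x).backward(grad)
        x.grad = None
    torch.cuda.synchronize()

    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        y = ln(x)          # FWD
        y.backward(grad)   # BWD
        x.grad = None
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters  # ms per FWD+BWD

for N, C in itertools.product(N_sizes, C_sizes):
    print(f"N={N:4d} C={C:5d}: {bench_layernorm(N, C):.3f} ms")
```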
Benchmark changes for what it's worth: https://github.com/csarofeen/pytorch/pull/1833
I think it would yield marginal gains, but yeah, I'm trying to figure out what XLA is doing that's so amazing, or what we're doing that's so bad, that nvFuser...
If Apex LN is working for you, go for it. It's disappointing because the big perf difference is highly unlikely to come from the generated code itself, but rather from the integration in...
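For reference, a minimal sketch of dropping in Apex's fused LayerNorm where it's available; the fallback path and the `1024` hidden size are illustrative assumptions, not part of the original comment:

```python
try:
    # Use Apex's fused LayerNorm kernel when Apex is installed.
    from apex.normalization import FusedLayerNorm as LayerNorm
except ImportError:
    # Fall back to the stock PyTorch implementation otherwise.
    from torch.nn import LayerNorm

norm = LayerNorm(1024).cuda()
```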