Mike Ruberry
Today in PyTorch there are a variety of logging and warning types, facilities, and feature requests. A partial list:

- determinism warnings, controlled with [set_deterministic_debug_mode](https://pytorch.org/docs/master/generated/torch.set_deterministic_debug_mode.html#torch.set_deterministic_debug_mode)
- deprecation warnings, which rely...
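For reference, a minimal sketch of the determinism-warning control mentioned above, using the linked API; the integer form in the assertion corresponds to the `"warn"` mode:

```python
import torch

# "warn" makes ops without a deterministic implementation emit a warning
# instead of raising; "error" raises, and "default" disables the checks.
torch.set_deterministic_debug_mode("warn")
assert torch.get_deterministic_debug_mode() == 1  # integer form of "warn"

# Restore the default, silent behavior.
torch.set_deterministic_debug_mode("default")
```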
```
a = torch.tensor((0.0011-1.5705j,), device='cuda', dtype=torch.complex64)

fs = Fusion()
with FusionDefinition(fs) as fd:
    nv_a = fd.define_tensor(sizes=a.shape, strides=a.stride(), dtype=DataType.ComplexFloat)
    result = fd.ops.tanh(nv_a)
    fd.add_output(result)

nv_result = fs.execute((a,))[0]
torch_result = torch.tanh(a)
assert_close(nv_result, torch_result)
...
```
```
a = torch.tensor([[ 6.0674-5.0972j,  8.6904+2.0785j,  7.2375+8.7725j,  7.1124-8.8085j],
                  [-4.8545+7.9547j,  2.3822+0.6237j,  0.7494+1.1833j, -5.3386-8.9542j],
                  [-1.3619-6.0172j, -6.9431+0.0722j,  6.5147+8.0001j,  3.9272+5.2276j],
                  [-6.6476-3.5998j,  2.2368+6.9990j, -6.5893+2.6003j, -5.6468+7.0181j]], device='cuda:0')
b = torch.tensor([[ 7.9473-5.3537j],
                  [ 2.7077-6.3395j],
                  [-4.2864-5.1915j],
                  [-0.4386+1.1773j]], device='cuda:0')
...
```
```
a = make_tensor((4, 4), device="cuda", dtype=torch.float32)

fs = Fusion()
with FusionDefinition(fs) as fd:
    nv_a = fd.define_tensor(sizes=a.shape, strides=a.stride())
    nv_correction = fd.define_scalar(DataType.Int)
    result = fd.ops.var(nv_a, [1,], correction=nv_correction)
    fd.add_output(result)

nv_result = fs.execute([a,...
```
Functions like `full_like` and `expand_as` are symbols that accept tensors, only to dump the tensor's metadata and call another operation. We should consider making these non-symbols to simplify transform logic,...
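For illustration, here's a hypothetical sketch of that pattern, where a `full_like`-style function just reads the input tensor's metadata and defers to `full` (the helper name is made up for this example):

```python
import torch

def full_like_via_full(t: torch.Tensor, fill_value) -> torch.Tensor:
    # "Dumps" the tensor's metadata and calls another operation: the tensor
    # input only contributes its shape, dtype, and device.
    return torch.full(t.shape, fill_value, dtype=t.dtype, device=t.device)

t = torch.empty(2, 3, dtype=torch.float16)
assert torch.equal(full_like_via_full(t, 1.0), torch.full_like(t, 1.0))
```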
The notebook should

- show how to understand what the benchmark's options are
- show how to programmatically run the benchmark (see the sketch after this list)
- show how to create the callable that the...
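As a starting point, a rough sketch of the programmatic-run piece; the `make_batch`/`fn` interface here is an assumption about the benchmark object, not a confirmed API:

```python
import time

def run_benchmark(benchmark, iters: int = 10) -> float:
    """Times the benchmark's callable; returns mean seconds per iteration."""
    args, kwargs = benchmark.make_batch()  # assumed input-construction hook
    fn = benchmark.fn()                    # assumed callable-construction hook
    fn(*args, **kwargs)                    # warmup
    start = time.perf_counter()
    for _ in range(iters):
        fn(*args, **kwargs)
    return (time.perf_counter() - start) / iters
```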
Update: A closely related test also failed with the int8 dtype:

```
2024-09-11T14:23:20.0323791Z =================================== FAILURES ===================================
2024-09-11T14:23:20.0324877Z ______ test_core_vs_torch_consistency_pow_torch_cuda_thunder.dtypes.int8 _______
2024-09-11T14:23:20.0325756Z [gw5] linux -- Python 3.10.12 /usr/bin/python3.10
2024-09-11T14:23:20.0326075Z
2024-09-11T14:23:20.0326629Z ...
```
```
torch.sign(torch.tensor(float('nan'))) : tensor(0.)
np.sign(float('nan'))                  : nan
```

cc @mruberry @rgommers
fyi @kiya00

Currently our profile-guided optimization attempts to pick a backend for each FX graph by looking at speed or memory use. We should consider letting practitioners specify a function...
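One possible shape for that, sketched below; the callback signature is an assumption for illustration, not an existing interface:

```python
from typing import Callable, Sequence

def pick_backend(fx_graph, candidates: Sequence, cost: Callable):
    """Selects the candidate backend minimizing a practitioner-supplied cost.

    cost maps (graph, backend) -> float; it could measure wall-clock time,
    peak memory, or any custom trade-off between the two.
    """
    return min(candidates, key=lambda backend: cost(fx_graph, backend))

# Example: prefer the backend with the lowest measured runtime.
# (time_graph_on is a hypothetical measurement helper.)
# best = pick_backend(graph, [torch_backend, nvfuser_backend],
#                     cost=lambda g, b: time_graph_on(g, b))
```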