Rui Wang issues

Results 7 issues of Rui Wang

Fatal Python error: Aborted

Hi, I was trying to test `https://github.com/NVIDIA/apex/tree/master/apex/contrib/openfold_triton` with triton but encountered this error and cannot find the solution anywhere. It'd be great if I could get some pointers to check...

Installation failed with cmake error

Hi, we are testing our new Hopper machines (H800/H100) and trying to use FP8 for training for the first time, but are having trouble installing `TransformerEngine`. It reports ` RuntimeError:...

Massively reduce LayerNorm/RMSNorm training memory usage by sharing saved tensor with other parts of the networks

Saving the output of the normalization instead of the input reduces the memory cost in modern networks, where the output is going to be saved anyway by the next layer (e.g., a Linear layer)...
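To make the idea concrete, here is a minimal sketch of why the output is enough for the backward pass of RMSNorm. It is written in plain Python for a single row rather than in PyTorch, and the function names (`rmsnorm_forward`, `rmsnorm_backward_from_output`, `rmsnorm_backward_from_input`) are hypothetical, not taken from the issue. It assumes every weight is nonzero, so that the normalized input can be recovered as `y / w`, and that the per-row scalar `rstd` is kept alongside the output.

```python
import math


def rmsnorm_forward(x, w, eps=1e-6):
    """RMSNorm for one row: y_i = x_i * rstd * w_i. Returns (y, rstd)."""
    n = len(x)
    rstd = 1.0 / math.sqrt(sum(v * v for v in x) / n + eps)
    y = [xi * rstd * wi for xi, wi in zip(x, w)]
    return y, rstd


def _input_grad(grad_y, x_hat, w, rstd):
    """Shared formula: dx = rstd * (g_hat - x_hat * mean(g_hat * x_hat))."""
    n = len(x_hat)
    g_hat = [gi * wi for gi, wi in zip(grad_y, w)]
    mean_dot = sum(g * h for g, h in zip(g_hat, x_hat)) / n
    return [rstd * (g - h * mean_dot) for g, h in zip(g_hat, x_hat)]


def rmsnorm_backward_from_output(grad_y, y, w, rstd):
    """Input gradient using the saved OUTPUT y (assumes all w_i != 0)."""
    x_hat = [yi / wi for yi, wi in zip(y, w)]  # recover normalized input
    return _input_grad(grad_y, x_hat, w, rstd)


def rmsnorm_backward_from_input(grad_y, x, w, eps=1e-6):
    """Reference input gradient using the saved INPUT x."""
    n = len(x)
    rstd = 1.0 / math.sqrt(sum(v * v for v in x) / n + eps)
    x_hat = [xi * rstd for xi in x]
    return _input_grad(grad_y, x_hat, w, rstd)


if __name__ == "__main__":
    x = [0.5, -1.25, 2.0, 0.75]
    w = [1.1, 0.9, -0.7, 1.3]
    g = [0.2, -0.4, 0.1, 0.3]
    y, rstd = rmsnorm_forward(x, w)
    print(rmsnorm_backward_from_output(g, y, w, rstd))
    print(rmsnorm_backward_from_input(g, x, w))
```

Both backward functions should agree up to floating-point error. The memory saving comes from the fact that `y` is typically already held by the following layer for its own backward pass, so only the small `rstd` value per row is extra.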

scaled_matmul.cu produces wrong results by loading the scaling factors wrong

When I tested with the following snippet:

```python
M = 128
N = 128
K = 128
slice_ = slice(47, 54)

def to_float8_e4m3fn(x: torch.Tensor):
    scales = x.abs().amax(dim=-1, keepdim=True).float().div(FP8_e4m3_MAX)
    x =...
```

Is it possible to be used with `tensordict`?

I find `jaxtyping` a blast to use, as is `tensordict` (https://github.com/pytorch/tensordict). If I could have it both ways, that would be even more amazing! Thanks for the great work!

question

Ablation tests with the same headdim as v with differential transformers

As I understand it, headdim is more important than the number of heads, and the diff transformer chooses to halve the number of heads and double the vdim compared to...

`[cudnn_frontend] Error: No execution plans support the graph. ` with dBias in attention

Hi, I'm trying to build the backward of a bias-enabled attention with cudnn and got

```
[cudnn_frontend] Error: No execution plans support the graph.
```

The snippet is as follows...