Rui Wang
Hi, I was trying to test `https://github.com/NVIDIA/apex/tree/master/apex/contrib/openfold_triton` with Triton but encountered this error and cannot find a solution anywhere. It'd be great if I could get some pointers to check...
Hi, we are testing our new Hopper machines (H800/H100) and trying to use fp8 for training for the first time, but we are having trouble installing `TransformerEngine`. It reports ` RuntimeError:...
Saving the output of the normalization instead of the input reduces the memory cost in modern networks, where the output is going to be saved anyway (e.g., a Linear layer)...
When I tested with the following snippet: ```python M = 128 N = 128 K = 128 slice_ = slice(47, 54) def to_float8_e4m3fn(x: torch.Tensor): scales = x.abs().amax(dim=-1, keepdim=True).float().div(FP8_e4m3_MAX) x =...
I find `jaxtyping` a blast to use, as is `tensordict` (https://github.com/pytorch/tensordict). If I could have it both ways, that would be even more amazing! Thanks for the great work!
As I understand it, headdim is more important than the number of heads, and the diff transformer chooses to halve the number of heads and double the vdim compared to...
Hi, I'm trying to build the backward of a bias-enabled attention with cudnn and got ``` [cudnn_frontend] Error: No execution plans support the graph. ``` The snippet is as follows...