Leo Fang

1,110 comments by Leo Fang

> and PyTorch evolving the existing one

Isn't it because PyTorch has a tradition of breaking compatibility in *every single* release, so they can be as adventurous as they want?...

Just hit this issue yesterday and was wondering about it 🙂

Thanks for the quick reply, Wesley! So I took a quick look at libcu++'s namespace convention. Do you mean that support for half complex will likely live in `cuda::device::`...

I encountered the same issue when working on the Python support (cupy/cupy#4228). In Python, we could in principle intercept Jitify's output by hijacking `stdout`'s file descriptor (I actually made this...
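For reference, the general file-descriptor trick looks roughly like the sketch below. It is a minimal illustration of the technique, not CuPy's actual code, and `capture_fd` is a hypothetical helper name. Output printed from C/C++ (e.g. by Jitify) goes straight to fd 1 and bypasses `sys.stdout`, so the descriptor itself has to be redirected.

```python
import os
import sys
import tempfile
from contextlib import contextmanager


@contextmanager
def capture_fd(fd=1):
    """Hypothetical helper: temporarily redirect OS-level descriptor `fd` into a temp file.

    Yields a dict whose "text" entry holds the captured output once the block exits.
    """
    captured = {"text": ""}
    # Flush Python-level buffers so earlier output isn't swallowed by the capture.
    sys.stdout.flush()
    sys.stderr.flush()
    saved_fd = os.dup(fd)  # keep a duplicate of the original descriptor
    with tempfile.TemporaryFile(mode="w+b") as tmp:
        os.dup2(tmp.fileno(), fd)  # from now on, writes to `fd` land in tmp
        try:
            yield captured
        finally:
            sys.stdout.flush()
            sys.stderr.flush()
            os.dup2(saved_fd, fd)  # restore the original descriptor
            os.close(saved_fd)
            tmp.seek(0)
            captured["text"] = tmp.read().decode(errors="replace")


if __name__ == "__main__":
    with capture_fd(1) as out:
        os.write(1, b"hello from fd 1\n")  # stands in for output from native code
    print(repr(out["text"]))
```

One caveat: if the native side writes through C stdio (`printf`), its own buffer may need to be flushed before the descriptor is restored. Otherwise part of the log can leak past the capture. That fragility is one reason a non-I/O channel for the log is attractive.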

Even better: append the log to the raised error messages, so that when we capture it in Cython/Python, we can access the log without any file or stream I/O.
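A rough sketch of what this would look like from the Python side, assuming the log is folded into the exception message. `CompileError` and `compile_source` are hypothetical names, not Jitify's or CuPy's actual API, and the log text is made up.

```python
class CompileError(RuntimeError):
    """Hypothetical error type that carries the JIT compiler log with it."""

    def __init__(self, message: str, log: str):
        # Append the log to the message itself, so str(err) alone shows it...
        super().__init__(f"{message}\n--- compiler log ---\n{log}")
        # ...and also keep it as an attribute for programmatic access.
        self.log = log


def compile_source(source: str) -> str:
    """Hypothetical stand-in for a JIT compile call that may fail."""
    if "undefined_symbol" in source:
        # Made-up log text, only for illustration.
        log = 'error: identifier "undefined_symbol" is undefined'
        raise CompileError("JIT compilation failed", log)
    return "<compiled module>"


if __name__ == "__main__":
    try:
        compile_source("__global__ void k() { undefined_symbol; }")
    except CompileError as err:
        # No file or stream I/O needed: the log travels with the exception.
        print(err)
```

On the Cython side, a C++ `std::runtime_error` raised through an `except +` declaration is translated into a Python `RuntimeError` carrying its `what()` string. A log appended to the message on the C++ side would therefore surface in much the same way.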

Hi @Robadob, I also identified this performance issue in CuPy a while ago. If you're still on Jitify v1, and if the slowdown you're seeing happens every time a new...

Hi @cliffwoolley, what if on the conda-forge side we compile NCCL using CUDA 11.3 but still allow dynamic cudart? Would that work? If so, that would solve both this...

(But I agree, the conda-forge NCCL is deviating from the NV libraries' standard practice. In our defense, though, linking to cudart_static is not the proper conda-forge practice as per [CFEP-18](https://github.com/conda-forge/cfep/blob/main/cfep-18.md)...)

@njzjz just curious, which CUDA version did you use to build your NCCL?