Leo Fang comments

Results: 1110 comments of Leo Fang

Array API implementation

> and PyTorch evolving the existing one

Isn't it because PyTorch has a tradition of breaking compatibility in *every single* release, so they can be as adventurous as they want?...

CUDA's `par_nosync` execution policy missing from documentation

Just hit this issue yesterday and was wondering about it 🙂

Support for `std::complex<__half>`?

Thanks for the quick reply, Wesley! So I took a quick look at the namespace convention of libcu++. Do you mean the support for half complex will likely live in `cuda::device::`...

Jitify should not print warnings to stdout by default

I encountered the same issue while working on the Python support (cupy/cupy#4228). In Python, in principle we could intercept Jitify's output by hijacking `stdout`'s file descriptor (I actually made this...
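
For readers unfamiliar with the file-descriptor trick mentioned above, here is a minimal sketch of how output written directly to fd 1 by native code could be captured from Python. It is not the code from the CuPy PR; `capture_fd_stdout` and `noisy_native_call` are hypothetical names, and the native library is simulated with `os.write`.

```python
import os
import sys
import tempfile
from contextlib import contextmanager


@contextmanager
def capture_fd_stdout():
    """Temporarily point file descriptor 1 at a temp file and collect what was written."""
    captured = []
    sys.stdout.flush()          # push out anything Python has buffered first
    saved_fd = os.dup(1)        # keep a copy of the real stdout descriptor
    with tempfile.TemporaryFile(mode="w+b") as tmp:
        os.dup2(tmp.fileno(), 1)  # fd 1 now refers to the temp file
        try:
            yield captured
        finally:
            sys.stdout.flush()
            os.dup2(saved_fd, 1)  # restore the original stdout
            os.close(saved_fd)
            tmp.seek(0)
            captured.append(tmp.read().decode(errors="replace"))


def noisy_native_call():
    # Hypothetical stand-in for a C/C++ library that writes straight to fd 1,
    # bypassing Python's sys.stdout object entirely.
    os.write(1, b"warning: something from native code\n")


with capture_fd_stdout() as out:
    noisy_native_call()
log = out[0]
```

Replacing `sys.stdout` alone would not help here, because native code never goes through the Python-level stream object; only a descriptor-level redirect sees its output.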

Jitify should not print warnings to stdout by default

Even better: append the log to the raised error messages, so that when we capture it in Cython/Python, we can access the log without any file or stream I/O.
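
A rough sketch of what this suggestion could look like from the Python side, assuming the compile step collects its log in memory and attaches it to the error. `CompileError` and `compile_source` are hypothetical names for illustration, not Jitify's or CuPy's API, and the log text is a placeholder.

```python
class CompileError(RuntimeError):
    """Hypothetical error type that carries the compiler log along with the message."""

    def __init__(self, message, log=""):
        self.log = log
        full = f"{message}\n--- compiler log ---\n{log}" if log else message
        super().__init__(full)


def compile_source(source):
    # Hypothetical stand-in for a JIT compile step that keeps its log in memory
    # instead of printing it to stdout.
    log = "warning: example diagnostic"
    if "error" in source:
        raise CompileError("compilation failed", log=log)
    return "ok"


try:
    compile_source("this has an error")
except CompileError as exc:
    caught_log = exc.log          # structured access, no stream or file I/O
    caught_message = str(exc)     # the log is also visible in the message
```

With this shape, the caller decides whether to show, store, or discard the log, and nothing is written to stdout unless the caller asks for it.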

load_program() performance (with large include hierarchies)

Hi @Robadob, I also identified this performance issue in CuPy a while ago. If you're still on Jitify v1, and if the slowdown you're seeing happens every time a new...

[WIP] Implement some CUDA intrinsics with `@overload`, `@overload_attribute`, and `@intrinsic`

This is so much nicer!

ncclCommInitRank failed: unhandled cuda error

Hi @cliffwoolley, what if on the conda-forge side we compile NCCL using CUDA 11.3, but still allow dynamic cudart? Would it work? Because if so that would solve both this...

ncclCommInitRank failed: unhandled cuda error

(But I agree, the conda-forge NCCL is deviating from the NV libraries' standard practice. In our defense, though, linking to cudart_static is not the proper conda-forge practice as per [CFEP-18](https://github.com/conda-forge/cfep/blob/main/cfep-18.md)...)

ncclCommInitRank failed: unhandled cuda error

@njzjz just curious, which CUDA version did you use to build your NCCL?