Federico Busato comments

Results: 65 comments by Federico Busato

SpMV with different matrix and vector types

cuSPARSE SpMV is limited to real and complex data types and doesn't support custom operators. We plan to add JIT LTO to SpMV (similar to SpMMOp) in the future to...

jax.scipy.sparse.linalg.cg inconsistent results between runs

Hi @tlu7, > Shall I assume that all these algorithms are available of cuda 11.2 and onwards. Is there any document that I can find this information? I need the...

jax.scipy.sparse.linalg.cg inconsistent results between runs

There is a small trick you can use to check the documentation of older toolkits 😀 [https://developer.nvidia.com/cuda-toolkit-archive](https://developer.nvidia.com/cuda-toolkit-archive). `CUSPARSE_SPMM_CSR_ALG3` and the SpMV algorithms were introduced in CUDA 11.2 Update 1: [https://docs.nvidia.com/cuda/archive/11.2.1/cusparse/index.html#cusparse-generic-function-spmm](https://docs.nvidia.com/cuda/archive/11.2.1/cusparse/index.html#cusparse-generic-function-spmm)

cusparse<t>gemvi() examples

Adding this example has low priority at the moment. However, if you think it could be useful for other users, your contribution would be greatly appreciated.

Implement `<cuda/std/bitset>`

Please consider using `uint32_t` for the storage type if the C++ specification allows it: https://github.com/NVIDIA/cccl/blob/main/libcudacxx/include/cuda/std/detail/libcxx/include/bitset#L151. 64-bit operations are less efficient on GPU architectures.
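To make the storage-type suggestion concrete, here is a minimal host-side sketch of a bit set backed by 32-bit words. The class name `bitset32` and its members are hypothetical and are not the libcu++ implementation; it only illustrates that word indexing and masking work the same way with `uint32_t` storage.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical fixed-size bit set storing its bits in 32-bit words.
// This is an illustration only, not the <cuda/std/bitset> implementation.
template <std::size_t N>
class bitset32 {
    static constexpr std::size_t bits_per_word = 32;
    static constexpr std::size_t num_words =
        (N + bits_per_word - 1) / bits_per_word;

    // Keep at least one word so that N == 0 is still a valid array.
    std::uint32_t words_[num_words == 0 ? 1 : num_words] = {};

public:
    // Set the bit at position pos.
    void set(std::size_t pos) {
        assert(pos < N);
        words_[pos / bits_per_word] |= std::uint32_t{1} << (pos % bits_per_word);
    }

    // Return true if the bit at position pos is set.
    bool test(std::size_t pos) const {
        assert(pos < N);
        return (words_[pos / bits_per_word] >> (pos % bits_per_word)) & 1u;
    }

    // Count the set bits by clearing the lowest set bit of each word
    // until the word becomes zero.
    std::size_t count() const {
        std::size_t total = 0;
        for (std::size_t i = 0; i < num_words; ++i) {
            for (std::uint32_t w = words_[i]; w != 0; w &= w - 1) {
                ++total;
            }
        }
        return total;
    }
};
```

With 32-bit words, a 70-bit set occupies three words (12 bytes) instead of the two 64-bit words (16 bytes) a 64-bit storage type would need.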

[EPIC] Roadmap for cuda/memory_resource

Our RFE: - `deallocate/deallocate_async` functions should accept `const void*` to skip `const_cast()` on the user side - Allow `cuda::mr::*` functions in device code - Clarify (or fix) the expected behavior...
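The first request can be illustrated with a small host-only sketch. Everything here is hypothetical: `malloc_resource` and `deallocate_const` are stand-ins invented for illustration and do not reflect the actual `cuda::mr` signatures. The point is only that a `const void*` entry point moves the `const_cast` out of user code.

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical resource with a deallocate() that takes a non-const pointer.
// This is NOT the cuda::mr API; it only mimics the general shape.
struct malloc_resource {
    std::size_t live = 0;  // number of outstanding allocations

    void* allocate(std::size_t bytes) {
        ++live;
        return std::malloc(bytes);
    }

    void deallocate(void* ptr, std::size_t /*bytes*/) {
        std::free(ptr);
        --live;
    }
};

// Hypothetical helper showing the requested behavior: the const_cast
// happens once here, so callers holding a pointer-to-const do not
// have to write it themselves.
inline void deallocate_const(malloc_resource& mr, const void* ptr,
                             std::size_t bytes) {
    mr.deallocate(const_cast<void*>(ptr), bytes);
}
```

A caller that only holds a `const int*` view of a buffer can then release it with `deallocate_const(mr, data, bytes)` without any cast at the call site.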

[EPIC] Roadmap for cuda/memory_resource

> Can you elaborate on what you mean? allocate() and deallocate() are expected to always be synchronous. Yes, but what is their purpose if the code uses an `async_resource` with...

[EPIC] Roadmap for cuda/memory_resource

OK, I didn't interpret `async_resource` as a superset of the `resource` concept. In that case, can we please just clarify this point in the documentation?

[FEA]: Optimize Complex FMA by exploiting lazy evaluation

I perfectly understand this constraint. It would be nice to add a `cuda::complex` type if it is not too much effort.

[FEA]: Run tests through `compute-sanitizer` in CI

@alliepiper FYI I opened another issue for host-side sanitizers https://github.com/NVIDIA/cccl/issues/2241