marsaev

6 issues by marsaev

Where possible. Probably better to wait for the CUDA 9.2 release for updated cuSolver functionality.

good first issue

Currently, if the JAX backend has already been initialized implicitly, calling `jax.distributed.initialize` does not fail, but the distributed environment is not formed correctly. For example, on process one: ``` >>> import jax...

enhancement
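A minimal sketch of the call ordering that the JAX issue above is about, assuming the usual multi-process setup with one coordinator. The helper name `init_distributed_first` and its arguments are hypothetical, chosen only for illustration; the point is that `jax.distributed.initialize` runs before any call that would implicitly initialize the backend (for example `jax.devices()` or creating an array).

```python
def init_distributed_first(coordinator_address, num_processes, process_id):
    """Hypothetical helper: set up the distributed runtime before any backend use."""
    # Importing jax by itself does not initialize a backend.
    import jax

    # Call this before jax.devices(), array creation, or any computation;
    # those calls initialize the backend implicitly.
    jax.distributed.initialize(
        coordinator_address=coordinator_address,
        num_processes=num_processes,
        process_id=process_id,
    )

    # Only query the runtime after initialization. If the environment formed
    # correctly, process_count() is expected to equal num_processes.
    return jax.process_index(), jax.process_count()
```

Each process would call it with its own `process_id`, for example `init_distributed_first("10.0.0.1:1234", 2, 0)` on process zero (the address here is a placeholder), and compare the returned process count against the expected value as a sanity check.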

After the lib handle is created, I want to confirm that the thread support level is the one I requested. However, the code below: ``` ucc_lib_params_t lib_params = {0}; lib_params.mask = UCC_LIB_PARAM_FIELD_THREAD_MODE;...

Currently we use the active_set field to "emulate" p2p communication: ``` ucc_coll_args_t coll = {0}; coll.mask = UCC_COLL_ARGS_FIELD_FLAGS | UCC_COLL_ARGS_FIELD_ACTIVE_SET; coll.flags = UCC_COLL_ARGS_FLAG_COUNT_64BIT; coll.coll_type = UCC_COLL_TYPE_BCAST; coll.root = root_rank;...

Currently, triggered_post for `EE CUDA STREAM` launches a persistent kernel that orchestrates the collective. There are some use cases which I think could avoid launching a kernel; a popular use case is...

Since CUDA 11.7, a new cudaStreamOps_v2 has been introduced which doesn't require a kernel module parameter to be set (https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEMOP.html#group__CUDA__MEMOP). This lowers the requirements for using these APIs. As far as I understand, this will...