[EPIC] Roadmap for cuda/memory_resource
cuda::mr is intended to be the future of heterogeneous memory allocation in CUDA C++. It is heavily inspired by lessons learned in RMM and our experience with `device_memory_resource*` and friends. cuda::mr does not seek to replace RMM, but rather to distill and standardize the best parts of RMM in a more central location. RMM is already in the process of rebasing on top of the cuda::mr interface.
What we have today is the `cuda/memory_resource` header, which provides:
- `[async_]resource` concepts and the property system
- `[async_]resource_ref` polymorphic type
In essence, this just provides the top-level interface for memory allocation and defining properties of the allocated memory.
### Implementation plan
- [ ] https://github.com/NVIDIA/cccl/issues/2128
- [ ] https://github.com/NVIDIA/cccl/issues/2129
- [ ] https://github.com/NVIDIA/cccl/issues/2130
- [ ] https://github.com/NVIDIA/cccl/issues/2131
- [ ] https://github.com/NVIDIA/cccl/issues/2143
- [ ] https://github.com/NVIDIA/cccl/issues/2132
### Misc
- [ ] Add NVTX annotations to all memory resources
- [ ] https://github.com/NVIDIA/cccl/issues/2313
### Concrete types that satisfy the C++ allocator requirements
- [ ] A `cuda::mr::allocator<T, Properties...>` capable of preserving concrete type of the resource (no type-erasure)
- [ ] A `cuda::mr::polymorphic_allocator<T, Properties...>` constructible from a `resource_ref<Properties...>`
Questions we'll need to answer along the way:
- What lifetime semantics do we want to use for resources + allocators + data structures?
- RMM took a very relaxed approach of using non-owning references everywhere, but this is worth reconsidering (see https://github.com/rapidsai/rmm/issues/1492)
- Do all data structures only take Allocators? Or just resources? Both?
- In RMM, we took an approach of only constructing from resource_refs directly, but this was mostly for expediency and convenience, so it is worth reconsidering.
@harrism You might be interested in this
Is there a long-term plan to pull more of the concrete implementations from RMM into CCCL? That seems like the best way to broaden adoption and usage of these allocators, and would satisfy some of the new features mentioned above IIRC.
@vyasr yes, I believe we want to pull some of the foundational features into CCCL. Definitely not all, but some.
> Is there a long-term plan to pull more of the concrete implementations from rmm into CCCL? That seems like the best way to broaden adoption and usage of these allocators and would satisfy some of the new features mentioned above IIRC.
Yes, that is what we mean by "Concrete types that satisfy the resource and async_resource concepts"
our RFE:
- `deallocate`/`deallocate_async` functions should accept `const void*` to skip `const_cast<T*>()` on the user side
- Allow `cuda::mr::*` functions in device code
- Clarify (or fix) the expected behavior of `allocate()`/`deallocate()` for `async_resource`
  - Personal thought: remove the `_async()` version of the API and add the stream to `allocate`/`deallocate`
> Clarify (or fix) the expected behavior of `allocate()`/`deallocate()` for `async_resource`
Can you elaborate on what you mean? `allocate()` and `deallocate()` are expected to always be synchronous.
> Can you elaborate on what you mean? `allocate()` and `deallocate()` are expected to always be synchronous.
Yes, but what is their purpose if the code uses an `async_resource` with the `_async()` API? They look redundant and confusing in this case.
> Yes, but what is their purpose if the code uses an `async_resource` with the `_async()` API? They look redundant and confusing in this case.
The thinking is that the `async_resource` concept is a strict superset of the `resource` concept. This way, if you have an `async_resource` object, you can still conveniently pass it to a function that expects a `resource`.
OK, I didn't interpret `async_resource` as a superset of the `resource` concept. In that case, can we please clarify this point in the docs?