cutile-python
[FEA]: NVSHMEM Support
Is this a new feature, an improvement, or a change to existing functionality?
New Feature
How would you describe the priority of this feature request?
Critical (currently preventing usage)
Please provide a clear description of problem this feature solves
Is NVSHMEM integration planned for cuTile? Lack of NVSHMEM support prevents kernels from performing fine-grained, in-kernel communication, limiting compute–communication overlap.
Feature Description
Support for NVSHMEM device APIs (e.g. nvshmemx_putmem_block)
Describe your ideal solution
import cuda.tile as ct

@ct.kernel
def vector_add(a, b, remote_c, tile_size: ct.Constant[int], pe: ct.Constant[int]):
    # Get the 1D pid
    pid = ct.bid(0)

    # Load input tiles
    a_tile = ct.load(a, index=(pid,), shape=(tile_size,))
    b_tile = ct.load(b, index=(pid,), shape=(tile_size,))

    # Perform elementwise addition
    result = a_tile + b_tile

    # Store result using NVSHMEM (proposed call; not available in cuTile today)
    ct.nvshmemx_putmem_block(remote_c, result, index=(pid,), pe=pe)
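To make the intended semantics of the proposed call concrete, here is a minimal host-side emulation in plain Python. It is only a sketch: it does not use cuTile or NVSHMEM. The helpers `make_symmetric_buffer`, `put_tile`, and `vector_add_put` are hypothetical names introduced for illustration. It assumes that `index=(pid,)` selects a tile-sized slot in the destination buffer, and that `pe` selects which processing element's copy of the buffer receives the data.

```python
# Host-side emulation only. None of these helpers exist in cuTile or NVSHMEM;
# they model the semantics the feature request asks for.

def make_symmetric_buffer(num_pes, length):
    """One same-sized buffer per PE, standing in for a symmetric allocation."""
    return [[0.0] * length for _ in range(num_pes)]


def put_tile(symmetric, tile, index, pe):
    """Copy `tile` into tile slot `index` of the buffer owned by `pe`."""
    start = index * len(tile)
    if start + len(tile) > len(symmetric[pe]):
        raise IndexError("tile does not fit in the destination buffer")
    symmetric[pe][start:start + len(tile)] = tile


def vector_add_put(a, b, symmetric, tile_size, pe):
    """Emulate the kernel grid: one loop iteration per block id (pid)."""
    for pid in range(len(a) // tile_size):
        lo = pid * tile_size
        a_tile = a[lo:lo + tile_size]
        b_tile = b[lo:lo + tile_size]
        result = [x + y for x, y in zip(a_tile, b_tile)]
        # Stands in for the proposed ct.nvshmemx_putmem_block(...) call.
        put_tile(symmetric, result, pid, pe)


a = [1.0, 2.0, 3.0, 4.0]
b = [10.0, 20.0, 30.0, 40.0]
remote_c = make_symmetric_buffer(num_pes=2, length=len(a))
vector_add_put(a, b, remote_c, tile_size=2, pe=1)
```

In this emulation only PE 1's copy of `remote_c` is written and PE 0's copy stays untouched. That matches the one-sided put behavior the request has in mind. On a GPU, each block would issue its put as soon as its tile is ready, which is where the compute-communication overlap would come from.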
Describe any alternatives you have considered
No response
Additional context
No response
Contributing Guidelines
- [x] I agree to follow cuTile Python's contributing guidelines
- [x] I have searched the open feature requests and have found no duplicates for this feature request
@mksit Yes, we do plan to support NVSHMEM at some point in the future. We don't have a firm date yet. We can keep this issue open, and I'll post here when it is announced.