Lowering `StridedMemoryView` attributes to typed, efficient C/C++/Cython-accessible values
Currently the attributes of StridedMemoryView are as follows:
https://github.com/NVIDIA/cuda-python/blob/8c841cdb24f64e65138cd2658d30fdeabd18769b/cuda_core/cuda/core/experimental/_memoryview.pyx#L24-L32
There is a TODO in the code noting that this is worth converting to Cython types. I would also support this recommendation.
When accessing things like shape or strides in Cython, one ideally wants a C array type of some form (pointer, typed memoryview, dynamic array, etc.) that can easily be iterated over in C for-loops. As these attributes are currently Python objects, accessing them requires calling the CPython API to get the length, fetch each value, coerce each one to a C-friendly type, etc.
This is especially important for things like ptr, which gets accessed regularly, so having a fast-access C type really helps.
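To make the loop in question concrete, here is a minimal Python sketch of the offset computation a consumer typically performs with shape and strides. It uses a stdlib memoryview as a stand-in exporter, so strides here are in bytes as in the Python Buffer Protocol. `element_offset` is a hypothetical helper for illustration, not part of cuda.core. With shape and strides typed as C arrays in Cython, a loop like this should compile down to a plain C for-loop instead of going through the CPython API on every step.

```python
import struct


def element_offset(index, strides):
    """Byte offset of the element at `index`, given per-dimension byte strides."""
    offset = 0
    for i, stride in zip(index, strides):
        offset += i * stride
    return offset


# Six native ints laid out as a 2x3 row-major grid.
raw = struct.pack("6i", 10, 11, 12, 20, 21, 22)
grid = memoryview(raw).cast("i", shape=[2, 3])

# Locate element (1, 2) by hand and read it straight from the raw bytes.
offset = element_offset((1, 2), grid.strides)
(value,) = struct.unpack_from("i", raw, offset)
```

The hand-computed lookup agrees with `grid[1, 2]`, which is the same arithmetic done inside the memoryview.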
If we look at the Python Buffer Protocol (PEP 3118), they have the following definition for Py_buffer (their equivalent type):
typedef struct {
    void *buf;
    PyObject *obj;        /* owned reference */
    Py_ssize_t len;
    Py_ssize_t itemsize;  /* This is Py_ssize_t so it can be
                             pointed to by strides in simple case.*/
    int readonly;
    int ndim;
    char *format;
    Py_ssize_t *shape;
    Py_ssize_t *strides;
    Py_ssize_t *suboffsets;
    void *internal;
} Py_buffer;
They also require users to call PyObject_GetBuffer to produce a buffer object and PyBuffer_Release to release it. This handles any memory allocation/deallocation for shape, strides, etc. It also handles refcounting for obj. This functionality is wonderful to use.
The lack of these semantics has made working with DLPack a chore
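For anyone less familiar with these semantics, they can be observed from pure Python, since `memoryview` acquires a buffer on construction and releases it on `release()` or on leaving a `with` block. The snippet below is only an illustration using a stdlib `bytearray` as the exporter. It says nothing about how StridedMemoryView should implement this.

```python
# Creating a memoryview acquires a buffer from the exporter; leaving the
# `with` block releases it again.
data = bytearray(b"abcdefgh")

with memoryview(data) as view:
    # The Py_buffer metadata is exposed as plain attributes.
    info = (view.ndim, view.shape, view.strides, view.itemsize, view.readonly)

    # While a buffer is exported, the exporter refuses to resize itself,
    # so the pointer handed out stays valid.
    try:
        data.extend(b"ij")
        resized_while_exported = True
    except BufferError:
        resized_while_exported = False

# Once the buffer is released, the exporter can be resized again.
data.extend(b"ij")
```

The exporter stays in control of its memory for exactly as long as a consumer holds the buffer, which is the guarantee the paragraph above is praising.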
Thinking about how to map the C-like struct above to Cython/Python, a few things stick out:
Some of these types Cython can translate between Python, Cython, and C directly, like Py_ssize_t (effectively ssize_t in C, see PEP 353), which becomes a Python int when needed.
Others can be coerced like char* to bytes (though usually one will want to decode/encode to/from str)
Still others could be translated well by Cython as long as they are typed appropriately. For example void* doesn't translate well to Python. However uintptr_t does behave like a pointer in C (sometimes with a cast) and like a Python int. Cython will handle the translation for us. Similarly bint for readonly works better when capturing Python's bool semantics while still working in C.
With a typed memoryview, it is possible to wrangle Py_ssize_t* into something better behaved like Py_ssize_t[::1], which can then move more easily between Python & Cython.
Also note that format above is a string specifying the format type according to the Python Buffer Protocol. NumPy is also able to consume and produce such format strings
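As a small illustration of the format string, the sketch below reads it off a stdlib buffer, checks it against `struct`, and reuses it to reinterpret the same memory as 2-D. It assumes a plain `array.array` of C ints as the exporter and only shows how the format, itemsize, shape, and strides relate to each other.

```python
import array
import struct

values = array.array("i", [1, 2, 3, 4, 5, 6])
flat = memoryview(values)

# The format string uses struct-module syntax; "i" is a native C int.
fmt = flat.format
item_bytes = struct.calcsize(fmt)

# Reinterpret the same memory as a 2x3 grid. memoryview.cast requires one
# side of each cast to be a byte format, hence the detour through "B".
grid = flat.cast("B").cast(fmt, shape=[2, 3])
```

No data is copied here. `grid` carries the same pointer with a different shape and strides, which is the kind of metadata-only view the typed attributes are meant to describe.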
This led RAPIDS to this approach:
cdef class Array:
    cdef readonly uintptr_t ptr
    cdef readonly bint readonly
    cdef readonly object obj
    cdef readonly Py_ssize_t itemsize
    cdef readonly Py_ssize_t ndim
    cdef Py_ssize_t[::1] shape_mv
    cdef Py_ssize_t[::1] strides_mv
    cdef readonly bint cuda
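To show what these fields hold for a host object, here is a rough pure-Python stand-in that fills the same set of fields from any writable buffer-protocol exporter. `HostArrayInfo` and `describe` are hypothetical names made up for this sketch, it is not the RAPIDS implementation, and the `cuda` flag is omitted since only host buffers are handled here.

```python
import array
import ctypes
from dataclasses import dataclass
from typing import Any, Tuple


@dataclass(frozen=True)
class HostArrayInfo:  # hypothetical stand-in for the typed fields above
    ptr: int                  # address as a plain int (uintptr_t in Cython)
    readonly: bool
    obj: Any                  # keeps the exporter alive
    itemsize: int
    ndim: int
    shape: Tuple[int, ...]
    strides: Tuple[int, ...]  # in bytes, as in the buffer protocol


def describe(obj) -> HostArrayInfo:
    """Collect buffer metadata from a writable buffer-protocol object."""
    with memoryview(obj) as view:
        # ctypes only hands out addresses for writable buffers, so
        # read-only exporters such as bytes raise TypeError here.
        ptr = ctypes.addressof(ctypes.c_char.from_buffer(obj))
        return HostArrayInfo(
            ptr=ptr,
            readonly=view.readonly,
            obj=obj,
            itemsize=view.itemsize,
            ndim=view.ndim,
            shape=view.shape,
            strides=view.strides,
        )


values = array.array("d", [1.0, 2.0, 3.0])
info = describe(values)
```

In Cython the tuples would be replaced by the `Py_ssize_t[::1]` memoryviews and `ptr` by a `uintptr_t`, so C code can read them without touching Python objects.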
If StridedMemoryView is used with C/C++ directly, it may make sense to actually type a public C struct in Cython. Then this could be leveraged in C/C++ code that can hand such objects to or receive them from CUDA-Python. For example
# filename: strided_memory_view.pxd
from libc.stdint cimport uintptr_t

# Plain C struct that C/C++ code can fill in or read directly.
cdef public struct CStridedMemoryView:
    uintptr_t ptr
    ssize_t* shape
    ssize_t* strides
    char* format
    int device_id
    bint device_accessible
    bint readonly
    void* obj

# The Python-facing class wraps the C struct.
cdef class StridedMemoryView:
    cdef CStridedMemoryView data
// filename: my_program.c
#include <stdbool.h>
#include <stdint.h>
#include <sys/types.h>
#include "strided_memory_view.h"

static const char* s = "Hello World!";
// Not const: `shape` is declared as a non-const ssize_t* in the struct.
static ssize_t s_len = 13;

int main(void) {
    CStridedMemoryView st;
    st.ptr = (uintptr_t)s;
    st.shape = &s_len;
    st.readonly = true;
    // ...
    return 0;
}
Though there can be other valid ways to go
At the Oct 23 meeting we discussed and agreed that this is an important feature to support. The information provided by StridedMemoryView should be host- and device-accessible. Temporarily slating this for the beta 3 release.
Unfortunately, due to the approaching code freeze and other commitments, I need to push this to the next release.