[ABI break] Add new structs with version info and readonly flag
Closes #34. Supersedes #72.
This PR adds version info to the DLTensor struct. This is under discussion in #34 (see this comment). Since this is an ABI break, the ABI version has been incremented to 2 and the DLPack version has been bumped to v0.7.0. This also updates the Python spec with a detailed upgrade policy and a plan for future ABI compatibility. The upgrade policy has been designed so that libraries that want to keep supporting (exporting and importing) the old ABI, or to support multiple DLPack and ABI versions, can do so without breaking compatibility with the old Python API.
An ABI break may also be an opportunity to reexamine the signed/unsignedness of all DLTensor members (PR #102).
Thanks for the help so far, everyone! I think the discussion on #34 has settled on adding the `__dlpack_info__` method, which returns the max version supported by the producer. I have updated the PR with the new spec (which includes `__dlpack_info__`). I haven't changed capsule renaming yet. @leofang let us know if renaming the capsule makes sense to you when the new ABI is exported (as mentioned in #34, it provides some extra protection without breaking backward compatibility).
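To make the negotiation concrete, here is a minimal sketch of what I have in mind (this assumes `__dlpack_info__` returns the maximum DLPack and ABI versions as a pair of ints, and `_wrap_capsule` is just a placeholder for the consumer's own import logic; the exact signatures are still up for discussion):

```python
CONSUMER_MAX_ABI = 2  # highest DLPack ABI this (hypothetical) consumer understands

def from_dlpack(x):
    if hasattr(x, "__dlpack_info__"):
        # Assumed return shape: (max_dlpack_version, max_abi_version).
        max_version, max_abi = x.__dlpack_info__()
        abi = min(max_abi, CONSUMER_MAX_ABI)
    else:
        # Producer predates __dlpack_info__: fall back to the old ABI.
        abi = 1
    capsule = x.__dlpack__()  # how the requested version is passed is still open
    return _wrap_capsule(capsule, abi)  # hypothetical helper on the consumer side
```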
I originally was in favor of capsule renaming, but I think it is not needed given the consensus around the protocol. While it does provide another layer of safety, I think it is redundant.
There are still a number of things I find confusing about the Python specification. Some of them are related to this specific PR. Some of them are things that are unclear in general.
- The `stream` argument of `__dlpack__()` can presumably be used to pass a CUDA stream object. Is the interface specific to CUDA? If so, the documentation should probably say so. CUDA streams do not have a natural Python representation. Even in C++, they are simply a pointer to an opaque NVIDIA data structure. Are they passed as capsules, `uintptr_t` cast to a Python `int`, etc.? The documented interface lacks type annotations to make this clear.
- The document mentions a new `__dlpack_info__` function but does not provide a clear type-checkable signature of this new function. This is a recipe for inconsistencies. I recommend providing an example raw Python version with MyPy-style type signatures as an implementation guide.
- What are the requirements on the `version` attribute of `__dlpack__`? Is it a Python `int`? If so, this could be stated in the type signature.
> The document mentions a new `__dlpack_info__` function but does not provide a clear type-checkable signature of this new function. This is a recipe for inconsistencies. I recommend providing an example raw Python version with MyPy-style type signatures as an implementation guide.
>
> What are the requirements on the `version` attribute of `__dlpack__`? Is it a Python `int`? If so, this could be stated in the type signature.
Yeah, making the signatures explicit should make it easier to understand the spec, thanks.
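Something along these lines is what I have in mind; treat it as a sketch rather than the final spec, since the exact parameter names, how the version is requested in `__dlpack__`, and the return type of `__dlpack_info__` are exactly what is still being settled here:

```python
from typing import Any, Optional, Tuple

class DLPackProducer:
    """Illustrative producer-side protocol; not normative."""

    def __dlpack_info__(self) -> Tuple[int, int]:
        """Return (max_dlpack_version, max_abi_version) supported by the producer."""
        ...

    def __dlpack__(self, *, stream: Optional[int] = None,
                   version: Optional[int] = None) -> Any:
        """Return a PyCapsule wrapping the tensor.

        Whether `version` is passed here (and what type `stream` should be)
        is part of the open discussion above.
        """
        ...

    def __dlpack_device__(self) -> Tuple[int, int]:
        """Return (device_type, device_id)."""
        ...
```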
> The `stream` argument of `__dlpack__()` can presumably be used to pass a CUDA stream object. Is the interface specific to CUDA? If so, the documentation should probably say so.
`stream` is an integer: the ID of the GPU device where the data is present. It is specific to CUDA and ROCm (and devices that use a stream mechanism). This is explicitly stated in the semantics section of the Python specification. Is it not clear enough?
> I originally was in favor of capsule renaming, but I think it is not needed given the consensus around the protocol. While it does provide another layer of safety, I think it is redundant.
Okay, I have removed capsule renaming now.
Also, I have added release notes and updated the DLPack diagram with the new structs. This PR is ready for review again from my side.
@tirthasheshpatel just realized that we checked in the png image to the repo in a previous patch. Is it possible to check in the image to a different location/repo? On one hand we would like to keep the repo self-contained; on the other hand, it can be a bit annoying for a repo to contain a rich history of binaries through multiple revisions.
One approach is to send a PR to a separate repo instead (we used https://github.com/dmlc/web-data for some of those purposes) and then link to it via https://gitcontent
> One approach is to send a PR to a separate repo instead (we used https://github.com/dmlc/web-data for some of those purposes) and then link to it via https://gitcontent
Sounds good! I will remove the image here and propose a PR on dmlc/web-data.
https://github.com/dmlc/dlpack/issues/104
> `stream` is an integer: the ID of the GPU device where the data is present. It is specific to CUDA and ROCm (and devices that use a stream mechanism). This is explicitly stated in the semantics section of the Python specification. Is it not clear enough?
I actually find this part of the documentation quite confusing (that said, I am only familiar with the CUDA part and can't speak about ROCm).
There are three concepts in CUDA that play a role here:
- Each compatible graphics card/accelerator in a machine is given a CUDA device ID.
- An application can talk to such a CUDA device by creating a CUDA context (details). The CUDA runtime API creates a default context (the primary context), but this is not necessarily the only one. An application could in principle create multiple contexts to run work on the same GPU, each with its own virtual address space on the GPU (meaning that a memory allocation made in one context cannot be accessed from another context).
- In each context, the application can create multiple CUDA streams (details). A CUDA stream has the purpose of ordering kernel launches with respect to each other. It does not play a role with regard to the residency of memory allocations. (Although, to make this even more complicated, the latest CUDA versions now also have what are called stream-ordered memory allocations, but I digress...)
Of these three, only the CUDA device ID is representable as an integer. Both CUDA contexts and CUDA streams are represented by opaque handles to proprietary data structures, in both the CUDA runtime and driver APIs; essentially a `void*` pointer.
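Just to illustrate the distinction (CuPy is used here purely as an example): device IDs are small integers, while a stream handle is the address of an opaque driver object, which Python libraries typically expose as a pointer-sized int.

```python
import cupy as cp

n = cp.cuda.runtime.getDeviceCount()  # CUDA device IDs are simply 0 .. n-1
s = cp.cuda.Stream()                  # creates a cudaStream_t in the current context
print(n, hex(s.ptr))                  # s.ptr is the raw pointer value of the opaque handle
```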
I think there is a fundamental confusion between the word "stream" and "CUDA context" or "CUDA device ID" in the DLPack documentation. I suspect that you will want to call this "device ID" rather than "stream", and also establish the convention that CUDA memory allocations are assumed to be located in the primary context associated with that CUDA device ID.
There is one situation in which the notion of a stream would make sense to me: if the tensor is not yet computed and the __dlpack__ function needs to perform a kernel launch to actually compute the data that will then be wrapped. In that case, we might want this computation to be ordered with respect to other computation performed by the caller (which might be issued to the same context+stream pair). However, in this case, providing the stream as an int would not be a suitable interface.
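For completeness, a rough sketch of that scenario (again with CuPy, names illustrative): the producer records an event after its pending work and makes the consumer's stream wait on it. Note that if the consumer's stream arrives as a plain int, the producer first has to wrap that raw pointer back into a stream object, which is why the int interface feels awkward to me.

```python
import cupy as cp

def export_ordered(producer_stream: cp.cuda.Stream, consumer_stream_ptr: int) -> None:
    # Record an event after the producer's pending kernels ...
    event = cp.cuda.Event()
    event.record(producer_stream)
    # ... reinterpret the raw cudaStream_t passed as a Python int ...
    consumer_stream = cp.cuda.ExternalStream(consumer_stream_ptr)
    # ... and order the consumer's stream after the producer's work.
    consumer_stream.wait_event(event)
```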
closed in favor of #113
We would like to thank @tirthasheshpatel for your effort in bringing this change forward.