dlpack [DISCUSS][RFC] DLPack Versioning and ABI Update

This is a thread to bring awareness and discuss the recent set of proposed change of adding versioning information to DLPack.

Summary

Up until now the DLPack C struct itself does not carry ABI version information, and we always maintain backward compatibility. Up until now there is no ABI breaking changes.

The overall stability is one reason why frameworks adopt DLPack in the first place and we would like to continue to do moving forward. In the meantime, it is indeed helpful to clarify the DLPack ABI version in the data structure itself. contains a draft proposal for the change. We can still

In short, we are looking into the possibility of attaching version information to the DLTensor struct, to minimize the ABI change, the version information is appended in the end of the current DLTensor

Summary of key changes

C struct: append version struct in the end of current DLTensor, we can still discuss the specific impl of version struct. There is also a possibility of readonly flag/mask in light of the change
Python API: introduce __dlpack_info__ function that returns the current supported API/ABI version.

Compatibility implications

One thing that is worth clarifying is that whether this constitute a ABI breaking change. It depends on how people use it. Let us clarify the following scenarios.

S0: Frameworks that consume the old DLPack ABI should be able to continue consume the new DLTensor and DLManagedTensor by ignoring the fields
- Specifically, the memory layout of the existing fields remains the same, so old access code should work out of the box
- The deleter signature remains the same, which means call into the deleter of DLManagerTensor(where deleter is defined by producer that produces the new DLTensor) should safely delete the new structure.
S1: Framework that produces old DLPack ABI Tensor can be consumed by the new framework, if the new framework queries the existence of __dlpack_info__ then do the conversion according
- Note the catch is that if a new framework was handled an arbitrary (old) DLTensor, querying the ABI field will result in an undefined behavior. So the guard on the python side exporter is needed.
S2: Interestingly, the old framework can include thew new DLTensor/DLManagedTensor struct, not updating any of its exchange implementations. From consumer pov, they can receive and use DLTensor as if it is an old struct.
- The deleter will ensure proper deletion.
- It is fine as a result for framework to "soft update" to include the new header, but not updating __dlpack_info__ to indicate that it is already at a new version

Normally S0 means that the data structure ABI is still backward compatible (new data structure being used in old scenarios). S1 somewhat seats in the future-compatible regime(old data structure being used in new scenarios).

Related threads

previous discussion https://github.com/dmlc/dlpack/issues/34
https://github.com/dmlc/dlpack/pull/101 PR draft

What to Expect

This is a notice and discussion thread to let the community that this is happening. It would be great to get folks to also tag related framework communities, so the ecosystem is fully aware of(and ideally buy-in) the proposed change before we formally proceed.

Given this change does have compatibility implications (although not breaking depending on how we see it), we will have a longer period of NOTICE(expect 1-2 months) before we proceed. Please tag as many folks you think are related as possible.

In the future, we also suggest to open thread of this kind to discuss the compatibility implication of the changes, if any.

Apr 05 '22 13:04 tqchen

cc @rgommers @wookey @tirthasheshpatel @junrushao1994 @jermainewang @leofang @kkraus14 @szha @hawkinsp

Apr 05 '22 13:04 tqchen

what should downstream frameworks expect in the way of migration functions? should folks proposing an ABI-breaking change here also be on the hook to provide an implementation that produces a new DLTensor struct given an old one? it could be helpful to provide such implementation to clarify how to interpret such older structs.

Apr 05 '22 15:04 areusch

Thanks @areusch this is indeed something that worth covering. The compatibility implication section covers some of that

The frameworks are "safe" to simply use the new struct without changing the API due to S0(backward compatibility)
There should be new API update to introduce __dlpack__info__ at which point the new fields needs to be filled properly.

It is a bit ambiguous to call it ABI-breaking(due to S0) but would be good to list out all the implications

Apr 05 '22 15:04 tqchen

one concern with placing the version information at the end of the struct is that, in a V1 DLTensor, that field is invalid. therefore, the memory in that location could be set to any value. that begs the question whether we should require some magic, and how that magic should perhaps be computed.

i think adding the migration functions would help folks to consider these corner cases, which are not always obvious when writing these specs.

Apr 05 '22 16:04 areusch

I believe what you said is covered by S1 part, which I agree that some of the migration function(likely filling in the value) would be useful

Apr 05 '22 17:04 tqchen

i'm not sure i follow--how can a C impl trust this memory, since it's outside the boundaries of a DLTensor in v1 of this spec? even if __dlpack__info__ exists on the Python wrapper class, how could the impl know whether or not the DLTensor instance was produced by a V2+ framework (and therefore, trust the memory which holds the version info)?

Apr 05 '22 18:04 areusch

The protocol is covered in the PR to implement this, as documented here (summarized below). As I understand the proposal, if the producer has no __dlpack_info__, then by definition it is using V1: this is S0 above. If the consumer calls __dlpack__() with no version, then the producer should return a V1 C structure for backward compatibility, at least for the next few years: this is S1 above. Otherwise, as described in the protocol, the consumer requests a specific dlpack version, which then dictates the C structure returned by the producer.

PR protocol documentation

The array API will offer the following syntax for data interchange:

A from_dlpack(x) function, which accepts (array) objects with a __dlpack__ method and uses that method to construct a new array containing the data from x. __dlpack__(self, stream: int = None, version: int = None), __dlpack_info__(self), and __dlpack_device__(self) methods on the array object, which will be called from within from_dlpack, to access the data, to get the maximum supported DLPack version, and to query what device the array is on (may be needed to pass in the correct stream, e.g. in the case of multiple GPUs).

Semantics

DLPack describes the memory layout of strided, n-dimensional arrays. When a user calls y = from_dlpack(x), the library implementing x (the "producer") will provide access to the data from x to the library containing from_dlpack (the "consumer"). ...

A DLPack version can be requested by passing the version keyword. The consumer should call the __dlpack_info__ method to get the maximum DLPack version supported by the producer and request for a version both support e.g. min(producer_version, consumer_version). If the consumer doesn't support any version below the producer's maximum version, a BufferError should be raised. Similarly, If the producer doesn't support the requested version, it should raise a BufferError.

Apr 05 '22 19:04 mattip

@mattip thanks for the pointer. those protocol docs make sense when Python is in the loop.

I'm thinking of the following scenario--it could be something that doesn't apply to dlpack, so please correct me if I'm wrong:

A producer (C++) is compiled against dlpack created prior to this proposal landing
A consumer (C++) is compiled against dlpack created after this proposal lands
They are communicating by pointer which erases the type (e.g. PackedFunc in TVM-land passes DLTensor* as void*)

Now suppose the consumer is trying to work with the DLTensor created by the producer. Presumably this involves a flow in which a DLTensor* is cast into DLTensorVersioned* (e.g. via DLTensor* (from an old library not compiled with an up-to-date dlpack) -> void* (suppose this is done in order to pass via a PackedFunc in TVM-land) -> DLTensorVersioned*).

When the consumer (or the consumer's wrapping Python DLTensor class) tries to access the received DLTensorVersioned*, it would try to make a version check against version.dlpack, but the producer allocated a smaller struct (sizeof(DLTensor)), that memory is invalid.version.dlpack could be random.

A way around this could be to require that all DLTensor passed via C++ be allocated by the consumer, and the consumer could then zero-initialize the DLTensorVersioned and fill in the requested version. However, then that drags in some complexity around requiring that sizeof(DLTensorVersioned) only grows as version increases, so there are some drawbacks there too.

Apr 05 '22 20:04 areusch

Independent of the discussion here, my suggestion would be to let this RFC gather feedback for 1-2 months before making changes of the code. An ABI break for such an important data exchange protocol is serious. Having enough feedback will ensure that there are no regrets.

Apr 05 '22 20:04 wjakob

@wjakob thanks for the suggestion! That is indeed the goal of this RFC(to ensure a long enough period of time before we commit as we value stability), I made it clear in the post

Apr 05 '22 20:04 tqchen

They are communicating by pointer which erases the type (e.g. PackedFunc in TVM-land passes DLTensor* as void*)

Can you point to this code? At some point the producer must decide to allocate and fill in the C struct. That is the point where there needs to be a protocol for specifying a dlpack version. If the architecture of the producer disallows any such protocol, how can dlpack ever change?

Edit: I think the term used to describe such situations is "dependency hell".

Apr 06 '22 07:04 mattip

Some update after taking a closer look. Actually right now the proposed ABI did break S0 as well. The main reason was due to the fact that DLManagedTensor's deleter field seats right after a DLTensor. So any append of new fields would effectively change the location of deleter and break S0.

To further clarify the situation, let us think about the following possible changes to DLTensor fields:

C0: Append new fields to the DLTensor(e.g. readonly)
C1: Change some of the current field definition (that affects the layout of the current structure).

Due to our favoring of stability ideally C1 should be extremely rare. And C0 ideally should maintain backward compatibility(the new struct can be used safely in old settings -- ignoring the fields).

There are a few choices if we consider the need that "C0 should not break" ABI compatibility

A0: Place version in DLManagedTensor

struct DLManagedTensor {
   Version version;
   void * manager_ctx;
   void (*deleter)(struct DLManagedTensor * self);
   DLTensor dl_tensor;
};

Note that this is a ABI breaking change.

Importing and exporting won't be backward compatible
New importer needs to handle old exporting cases.
Note that deleter and manager context are placed before, so their location layout remains stable when field append to DLTensor happens

However, this allows future append to DLTensor fields without problem, so future C0 changes can be done in a backward compatible way.

A1: Place version and new flags in DLManagedTensor

struct DLManagedTensor {
   DLTensor dl_tensor;
   void * manager_ctx;
   void (*deleter)(struct DLManagedTensor * self);
   Version version;
   new_flags here
};

This is a ABI backward compatible change. It does bring restrictions

No changes to the current fields of DLTensor itself can be made
New flags are appended to DLManagedTensor, which also makes sense as properties such as read only can come with the managed exchange data structure, while the DLTensor core remains stable.
This shifts the burden of complexity to DLManagedTensor, with the assumption that core DLTensor should remain stable.

A2: A variant of A1 while allow upgrading DLTensor fields

struct DLManagedTensor {
   DLTensor dl_tensor;
   void * manager_ctx;
   void (*deleter)(struct DLManagedTensor * self);
   more flags
};

struct DLManagedTensorVersioned {
    Version version;
    DLManagedTensor managed_tensor;
}

In new version, allocate DLManagedTensorVersioned but still return the pointer to the managed_tensor.
During import, recover DLManagedTensorVersioned by subtracting the ptr offset to read the version information
Version info can be used to decide whether an ABI update is needed

Summary Discussions

Due to our favoring of stability. Ideally we would strongly favor S0 property (so frameworks do not need to bring extra support to keep up with the protocol), and the fact that DLPack rarely changes is an important property of the exchange protocol, of course, we do want to think about mechanisms to allow us to deal with breaking in case there is really need for it.

Apr 06 '22 15:04 tqchen

I am +1 for the A1 option. It is a minimalist option without severe backward-compatibility concerns. I have proposed this alternative in #105. @rgommers @mattip @seberg @leofang I'd appreciate your opinion on this!

Apr 15 '22 07:04 tirthasheshpatel

I would tend towards creating the new name (i.e. A2 or variations of it). Otherwise, appending the fields requires out-of-bound exchange of the information whether the field exists. We can do that in Python but it seems stranger in C? EDIT: OK, I guess it is not strange in C. I think my main feeling is that once the old version is not used anymore A2 seems simpler.

If a new name is created, new C code can indicate that by using DLManagedTensorVersioned in headers to give some type/ABI safety. How exactly the DLManagedTensorVersioned that should look like, I am not sure. Just a stub with the version or any eternally stable ABI fields (which is than cast to the specific versioned one). In the above DLManagedTensor is included, which also works. The downside is that at some point the "included" version may change, so I am wondering if it isn't clearer to force the user to cast. (The layout is still the same in both cases of course, the question is how to expose it in the headers.)

Apr 15 '22 09:04 seberg

If a new name is created, new C code can indicate that by using DLManagedTensorVersioned in headers to give some type/ABI safety.

How would one do that? I think the idea of A2 is to still export the managed_tensor field of DLManagedTensorVersioned struct and use offset of the version field to retrieve the version info.

Apr 15 '22 10:04 tirthasheshpatel

I don't know how current API looks like. I imagine exposing something like from_dlpack(DLPackManagedTensor *tensor) and DLPackManagedTensor *managed_tensor = to_dlpack(object).

And the only way to transition that, is probably to create a new pair of functions that use DLManagedTensorVersioned *? But if that is how the transition happens anyway, then I don't see much point in placing the version last. So yes, a current from_dlpack could remain compatible with newer versions for now. But for how long? If there is any ABI change in the future it will remain being compatible. so I am not sure that would actually give a very clear transition for when an ABI change happens. (It would assume that eventually no old users exist and then ABI break is possible?)

So, I feel that advantage of "ABI compatibility" with current from_dlpack is unclear. So I would preference doing what seems cleanest in the long term. And for me, that is placing the version first and renaming the struct.

Apr 15 '22 11:04 seberg

And the only way to transition that, is probably to create a new pair of functions that use DLManagedTensorVersioned *?

As far as C/C++ is concerned, I too think this is the only way to transition. Both A1 and A2 sound like good options then. I also see now how A2 can provide type safety. @tqchen The A2 option says "In new version, allocate DLManagedTensorVersioned but still return the pointer to the managed_tensor." I don't think we should do that since it takes the type safety advantage away (making A1 a simpler solution). Otherwise, I am happy to go with A2 too.

What do you think @mattip?

Let's wait for some others to chime in and then I can update the PR.

Apr 18 '22 04:04 tirthasheshpatel

I like A2 as well.

Apr 18 '22 08:04 mattip

One of the mentioned disadvantages of A1 is "No changes to the current fields of DLTensor itself can be made". How would this look for A2? (In general, it seems to me that adding any fields to DLTensor will be an ABI break.)

Apr 19 '22 19:04 wjakob

How would this look for A2?

We can add new fields onto DLTensor since the version is the first field in DLManagedTensorVersioned (as opposed to A1 where the version field comes after DLTensor).

Apr 20 '22 10:04 tirthasheshpatel

This would work for software that knows how to interpret a DLManagedTensorVersioned with that specific version, but it would still break older software that depends on a compatible layout, no? How is that not an ABI incompatibility?

Apr 20 '22 11:04 wjakob

The idea is to introduce new functions (to_dlpack(DLManagedTensorVersioned *) and from_dlpack(DLManagedTensorVersioned *)) that convert and export the tensor to a DLManagedTensorVersioned *. It's good to have for type safety but otherwise, A1 is simpler.

Apr 20 '22 11:04 tirthasheshpatel

The killer feature of DLPack is its simplicity: it's just a header file with a few data structures that are trivially ported to various languages. Frameworks that can exchange DLPack tensors don't need to link to a shared or static library to do so. They don't need even deal with C or C++. Introducing required versioning-related functions as part of this repository sounds like it has the potential to significantly complicate this success story. My 2cts...

(Edit: I am against versioning per se. I am just wondering what this means for ABI interoperability between software that may and may not implement this versioned extension.)

Apr 20 '22 11:04 wjakob

Without versioning, DLPack can't evolve. There are already requests for read-only data support, non-native endianness, etc. Adding those without a version to separate the new structure will lead to spurious breaks everywhere (since not everyone will update at the same time and newer functions won't work with older ones). I am not sure if there is a simpler way; after a lot of discussions, these options seem to be the simplest of all. In long term, we can remove the old structs and also update the spec to not care about the old ABI.

Introducing required versioning-related functions as part of this repository sounds like it has the potential to significantly complicate this success story.

The proposal is to add version-info in the struct to avoid this exactly. All the complications happen in the spec and not the DLPack header file...

Apr 20 '22 11:04 tirthasheshpatel

Hi everyone,

The issue has been up for almost a month. So far, I think @mattip, @seberg, and @leofang are in favor of versioning and @wjakob is against it. Let's wait until 16'th May for more feedback and, if there is none, it would be good to merge #105 and wait until 23'rd May before closing this to allow some time to revert the changes if needed. Please feel free to add your thoughts on how DLPack should proceed with versioning and ABI break. Thanks!

May 02 '22 09:05 tirthasheshpatel

this is a tricky situation which I suspect comes down to how everyone actually uses DLPack in practice. it'd be helpful to know more about everyone's usage of DLPack so we could evaluate whether we're all debating the same use case or if there are others.

my initial impressions of the proposals:

A0.

good: version is the first field
bad: because there is no way to signal a version incompatibility between v1/v2. A1.
good: is an organic evolution into a versioned world, avoids immediate backwards-compat problems
bad: splits fields across DLTensor and DLManagedTensor, making DLTensor append-only and making it difficult to change the semantics of DLTensor fields. it's a half-solution in my book. A2.
good: avoids API break, Version is the first field in a wrapper class to DLTensor to make interpretation easier once a pointer to DLManagedTensorVersioned is found.
bad: finding the pointer to DLManagedTensorVersioned requires assuming that it exists; would the API tell you this? DLManagedTensor cannot be used when the version check fails (e.g. you cannot deallocate it).

might I suggest A3, a variant of A2:

static constexpr const uint32_t kVersionedDLTensorMagic = 0xD1D1D1D1;

struct DLManagedTensor {
   DLTensor dl_tensor;
   void * manager_ctx;
   void (*deleter)(struct DLManagedTensor * self);    
};

// note: supersedes DLManagedTensor
struct DLManagedTensorVersioned {
   Version version;
   void * manager_ctx;
   void (*deleter)(struct DLManagedTensorVersioned* self);    
   uint32_t magic; // set to kVersionedDLTensorMagic ^ (((uintptr_t) &self->dl_tensor) - ((uintptr_t) self))
   DLTensor dl_tensor;
}

here is what i think example usage might look like:

allocate DLManagedTensorVersioned (or invoke other library which may return either DLManagedTensor* or (DLManagedTensorVersioned*)->dl_tensor. Note that this amounts in both cases to returning a pointer to a DLTensor instance.
The client library's unknowns at this point are:
- The field layout of DLTensor
- The location of manager_ctx and deleter
- The version of the tensor
Perform the following algorithm to recover the version:
1. Suppose the returned pointer is void* caller_tensor.
2. Compute ptrdiff_t version_ptrdiff = *((int32_t*) caller_tensor )- 1) ^ kVersionedDLTensorMagic, which may be the difference between the address of DLManagedTensorVersioned.dl_tensor and DLManagedTensorVersioned. We could also enforce here that version_ptrdiff is e.g. < 0x100.
3. Compute DLManagedTensorVersioned* candidate_versioned = ((uintptr_t) caller_tensor) - version_ptrdiff;
4. Check if candidate_versioned->version is supported and validate version_ptrdiff based on knowledge of that version (e.g. v2 expects version_ptrdiff == 16).
5. If version does not match, compute DLManagedTensor dl_managed_tensor = caller_tensor and treat as v1.
Use per semantics defined in the version. To deallocate, invoke deleter(self), where those fields are looked-up as is apropos given version.

the advantage to this is that the managed fields are moved next to the version, so that they could be used to terminate the lifecycle of a tensor whose version is incompatible with the client library. this scheme is of course vulnerable to various memory tricks as well--you could consider patching Version to include some xor'd magic too. it is more complex, though. thoughts?

May 11 '22 19:05 areusch

it'd be helpful to know more about everyone's usage of DLPack

That is the hard part here, and why @tirthasheshpatel submitted #106. Other users of DLPack can add additional use cases. Without concrete use cases, it is difficult to relate to "what if" scenarios.

To me the A3 proposal seems too "magical". The use of __dlpack_info__ in A2 allows the provider to explicitly state what version of the API they will support, answering "would the API tell you this?" with "yes".

@wjakob when you say

Introducing required versioning-related functions as part of this repository sounds like it has the potential to significantly complicate this success story

do you agree that the structure is incomplete? If not, what do you respond to NumPy who, in accordance with the Array API consortium recommendations, added support for DLPack but requested support for readonly and byteorder flags?

This need for versioning was pointed out in #34 which was opened in March 2019, and has been extensively discussed in other issues over the passing years.

May 12 '22 06:05 mattip

Just to make that clear (it seems there was some misunderstanding). I am definitely not against versioning. However, it would be useful if versioning-related information is appended to data structures so that they are still interpretable by older versions at an ABI level.

Second -- there is a lot of value to having DLPack data structures described as POD/Plain Old Data in a way that is accessible from many different languages without, e.g., having to include this repository or linking to a library. There was some of including C-level functions, for example by @tirthasheshpatel

The idea is to introduce new functions (to_dlpack(DLManagedTensorVersioned *) and from_dlpack(DLManagedTensorVersioned *)) that convert and export the tensor to a DLManagedTensorVersioned *. It's good to have for type safety but otherwise, A1 is simpler.

Maybe this is fine. However, this is a deviation from the current purely POD-style interface where the only kind of function is a function pointer.

Finally, one suggestion that I would also have is that the PR#101 adds some kinds of flags field.

Adding uint8_t readonly; is kind of wasteful (1/8 bits used, and another 3 unused bytes will be added for padding).

Why not add a uint32_t flags value? This could represent readonly status, endianness, and whatever other people will want to add in the future without having to discuss an ABI break every time.

May 12 '22 15:05 wjakob

@mattip agreed A3 is kind of magical and that's less desirable than an explicit indication. It seemed like there was an attempt being made to avoid breaking backwards-compatibility by considering the DLTensor fields as sacred and locating new fields elsewhere.

I think there are a few possibilities:

Two C++ libraries integrating without any Python support: If one library is to return a DLManagedTensor, we either have to signal a version by a) modifying memory outside the bounds of the v1 structure; b) modifying an unused field in the v1 structure; or c) introducing an explicit API which carries the drawbacks @wjakob mentioned (i.e. this is no longer just an in-memory format).
Consumer has Python support but is integrating with a C++-only producer library: I don't think __dlpack_info__ helps here because it would be the consumer's __dlpack_info__ wrapping the produced DLManagedTensor, right?
Both consumer and producer have Python support: then I think __dlpack_info__ is a reasonable out-of-band signaling mechanism.

I should also state that my interest here is purely from POV of having TVM interoperate with other libraries. These possibilities capture the range of situations I could imagine a standard such as this needing to confront.

May 12 '22 15:05 areusch

@wjakob

it would be useful if versioning-related information is appended to data structures so that they are still interpretable by older versions at an ABI level.

Yes, this is exactly what #105 will do (I will update the PR to follow A2).

Second -- there is a lot of value to having DLPack data structures described as POD/Plain Old Data in a way that is accessible from many different languages without, e.g., having to include this repository or linking to a library. There was some of including C-level functions, for example by @tirthasheshpatel

I meant the C/C++ libraries that want to add support for the new ABI will have to create those new functions. DLPack itself won't have any functions and will still remain a POD. So, no linking will be necessary.

Why not add a uint32_t flags value? This could represent readonly status, endianness, and whatever other people will want to add in the future without having to discuss an ABI break every time.

The plan is to batch all the requests like readonly, endianness, alignment, etc in one single ABI break. Althouh, it would be useful to also add some padding for future use.

May 13 '22 06:05 tirthasheshpatel