[DISCUSS][RFC] DLPack Versioning and ABI Update
This is a thread to bring awareness and discuss the recent set of proposed change of adding versioning information to DLPack.
Summary
Up until now the DLPack C struct itself does not carry ABI version information, and we always maintain backward compatibility. Up until now there is no ABI breaking changes.
The overall stability is one reason why frameworks adopt DLPack in the first place and we would like to continue to do moving forward. In the meantime, it is indeed helpful to clarify the DLPack ABI version in the data structure itself. contains a draft proposal for the change. We can still
In short, we are looking into the possibility of attaching version information to the DLTensor struct, to minimize the ABI change, the version information is appended in the end of the current DLTensor
Summary of key changes
- C struct: append version struct in the end of current DLTensor, we can still discuss the specific impl of version struct. There is also a possibility of readonly flag/mask in light of the change
- Python API: introduce
__dlpack_info__function that returns the current supported API/ABI version.
Compatibility implications
One thing that is worth clarifying is that whether this constitute a ABI breaking change. It depends on how people use it. Let us clarify the following scenarios.
- S0: Frameworks that consume the old DLPack ABI should be able to continue consume the new DLTensor and DLManagedTensor by ignoring the fields
- Specifically, the memory layout of the existing fields remains the same, so old access code should work out of the box
- The deleter signature remains the same, which means call into the deleter of DLManagerTensor(where deleter is defined by producer that produces the new DLTensor) should safely delete the new structure.
- S1: Framework that produces old DLPack ABI Tensor can be consumed by the new framework, if the new framework queries the existence of
__dlpack_info__then do the conversion according- Note the catch is that if a new framework was handled an arbitrary (old) DLTensor, querying the ABI field will result in an undefined behavior. So the guard on the python side exporter is needed.
- S2: Interestingly, the old framework can include thew new DLTensor/DLManagedTensor struct, not updating any of its exchange implementations. From consumer pov, they can receive and use DLTensor as if it is an old struct.
- The deleter will ensure proper deletion.
- It is fine as a result for framework to "soft update" to include the new header, but not updating
__dlpack_info__to indicate that it is already at a new version
Normally S0 means that the data structure ABI is still backward compatible (new data structure being used in old scenarios). S1 somewhat seats in the future-compatible regime(old data structure being used in new scenarios).
Related threads
- previous discussion https://github.com/dmlc/dlpack/issues/34
- https://github.com/dmlc/dlpack/pull/101 PR draft
What to Expect
This is a notice and discussion thread to let the community that this is happening. It would be great to get folks to also tag related framework communities, so the ecosystem is fully aware of(and ideally buy-in) the proposed change before we formally proceed.
Given this change does have compatibility implications (although not breaking depending on how we see it), we will have a longer period of NOTICE(expect 1-2 months) before we proceed. Please tag as many folks you think are related as possible.
In the future, we also suggest to open thread of this kind to discuss the compatibility implication of the changes, if any.
cc @rgommers @wookey @tirthasheshpatel @junrushao1994 @jermainewang @leofang @kkraus14 @szha @hawkinsp
what should downstream frameworks expect in the way of migration functions? should folks proposing an ABI-breaking change here also be on the hook to provide an implementation that produces a new DLTensor struct given an old one? it could be helpful to provide such implementation to clarify how to interpret such older structs.
Thanks @areusch this is indeed something that worth covering. The compatibility implication section covers some of that
- The frameworks are "safe" to simply use the new struct without changing the API due to S0(backward compatibility)
- There should be new API update to introduce
__dlpack__info__at which point the new fields needs to be filled properly.
It is a bit ambiguous to call it ABI-breaking(due to S0) but would be good to list out all the implications
one concern with placing the version information at the end of the struct is that, in a V1 DLTensor, that field is invalid. therefore, the memory in that location could be set to any value. that begs the question whether we should require some magic, and how that magic should perhaps be computed.
i think adding the migration functions would help folks to consider these corner cases, which are not always obvious when writing these specs.
I believe what you said is covered by S1 part, which I agree that some of the migration function(likely filling in the value) would be useful
i'm not sure i follow--how can a C impl trust this memory, since it's outside the boundaries of a DLTensor in v1 of this spec? even if __dlpack__info__ exists on the Python wrapper class, how could the impl know whether or not the DLTensor instance was produced by a V2+ framework (and therefore, trust the memory which holds the version info)?
The protocol is covered in the PR to implement this, as documented here (summarized below). As I understand the proposal, if the producer has no __dlpack_info__, then by definition it is using V1: this is S0 above. If the consumer calls __dlpack__() with no version, then the producer should return a V1 C structure for backward compatibility, at least for the next few years: this is S1 above. Otherwise, as described in the protocol, the consumer requests a specific dlpack version, which then dictates the C structure returned by the producer.
PR protocol documentation
The array API will offer the following syntax for data interchange:
A
from_dlpack(x)function, which accepts (array) objects with a__dlpack__method and uses that method to construct a new array containing the data fromx.__dlpack__(self, stream: int = None, version: int = None),__dlpack_info__(self), and__dlpack_device__(self)methods on the array object, which will be called from withinfrom_dlpack, to access the data, to get the maximum supported DLPack version, and to query what device the array is on (may be needed to pass in the correct stream, e.g. in the case of multiple GPUs).
Semantics
DLPack describes the memory layout of strided, n-dimensional arrays. When a user calls
y = from_dlpack(x), the library implementingx(the "producer") will provide access to the data fromxto the library containingfrom_dlpack(the "consumer"). ...
A DLPack version can be requested by passing the
versionkeyword. The consumer should call the__dlpack_info__method to get the maximum DLPack version supported by the producer and request for a version both support e.g.min(producer_version, consumer_version). If the consumer doesn't support any version below the producer's maximum version, a BufferError should be raised. Similarly, If the producer doesn't support the requested version, it should raise a BufferError.
@mattip thanks for the pointer. those protocol docs make sense when Python is in the loop.
I'm thinking of the following scenario--it could be something that doesn't apply to dlpack, so please correct me if I'm wrong:
- A producer (C++) is compiled against dlpack created prior to this proposal landing
- A consumer (C++) is compiled against dlpack created after this proposal lands
- They are communicating by pointer which erases the type (e.g. PackedFunc in TVM-land passes
DLTensor*asvoid*)
Now suppose the consumer is trying to work with the DLTensor created by the producer. Presumably this involves a flow in which a DLTensor* is cast into DLTensorVersioned* (e.g. via DLTensor* (from an old library not compiled with an up-to-date dlpack) -> void* (suppose this is done in order to pass via a PackedFunc in TVM-land) -> DLTensorVersioned*).
When the consumer (or the consumer's wrapping Python DLTensor class) tries to access the received DLTensorVersioned*, it would try to make a version check against version.dlpack, but the producer allocated a smaller struct (sizeof(DLTensor)), that memory is invalid.version.dlpack could be random.
A way around this could be to require that all DLTensor passed via C++ be allocated by the consumer, and the consumer could then zero-initialize the DLTensorVersioned and fill in the requested version. However, then that drags in some complexity around requiring that sizeof(DLTensorVersioned) only grows as version increases, so there are some drawbacks there too.
Independent of the discussion here, my suggestion would be to let this RFC gather feedback for 1-2 months before making changes of the code. An ABI break for such an important data exchange protocol is serious. Having enough feedback will ensure that there are no regrets.
@wjakob thanks for the suggestion! That is indeed the goal of this RFC(to ensure a long enough period of time before we commit as we value stability), I made it clear in the post
They are communicating by pointer which erases the type (e.g. PackedFunc in TVM-land passes DLTensor* as void*)
Can you point to this code? At some point the producer must decide to allocate and fill in the C struct. That is the point where there needs to be a protocol for specifying a dlpack version. If the architecture of the producer disallows any such protocol, how can dlpack ever change?
Edit: I think the term used to describe such situations is "dependency hell".
Some update after taking a closer look. Actually right now the proposed ABI did break S0 as well. The main reason was due to the fact that DLManagedTensor's deleter field seats right after a DLTensor. So any append of new fields would effectively change the location of deleter and break S0.
To further clarify the situation, let us think about the following possible changes to DLTensor fields:
- C0: Append new fields to the DLTensor(e.g. readonly)
- C1: Change some of the current field definition (that affects the layout of the current structure).
Due to our favoring of stability ideally C1 should be extremely rare. And C0 ideally should maintain backward compatibility(the new struct can be used safely in old settings -- ignoring the fields).
There are a few choices if we consider the need that "C0 should not break" ABI compatibility
A0: Place version in DLManagedTensor
struct DLManagedTensor {
Version version;
void * manager_ctx;
void (*deleter)(struct DLManagedTensor * self);
DLTensor dl_tensor;
};
Note that this is a ABI breaking change.
- Importing and exporting won't be backward compatible
- New importer needs to handle old exporting cases.
- Note that deleter and manager context are placed before, so their location layout remains stable when field append to DLTensor happens
However, this allows future append to DLTensor fields without problem, so future C0 changes can be done in a backward compatible way.
A1: Place version and new flags in DLManagedTensor
struct DLManagedTensor {
DLTensor dl_tensor;
void * manager_ctx;
void (*deleter)(struct DLManagedTensor * self);
Version version;
new_flags here
};
This is a ABI backward compatible change. It does bring restrictions
- No changes to the current fields of DLTensor itself can be made
- New flags are appended to DLManagedTensor, which also makes sense as properties such as read only can come with the managed exchange data structure, while the DLTensor core remains stable.
- This shifts the burden of complexity to DLManagedTensor, with the assumption that core DLTensor should remain stable.
A2: A variant of A1 while allow upgrading DLTensor fields
struct DLManagedTensor {
DLTensor dl_tensor;
void * manager_ctx;
void (*deleter)(struct DLManagedTensor * self);
more flags
};
struct DLManagedTensorVersioned {
Version version;
DLManagedTensor managed_tensor;
}
- In new version, allocate DLManagedTensorVersioned but still return the pointer to the managed_tensor.
- During import, recover DLManagedTensorVersioned by subtracting the ptr offset to read the version information
- Version info can be used to decide whether an ABI update is needed
Summary Discussions
Due to our favoring of stability. Ideally we would strongly favor S0 property (so frameworks do not need to bring extra support to keep up with the protocol), and the fact that DLPack rarely changes is an important property of the exchange protocol, of course, we do want to think about mechanisms to allow us to deal with breaking in case there is really need for it.
I am +1 for the A1 option. It is a minimalist option without severe backward-compatibility concerns. I have proposed this alternative in #105. @rgommers @mattip @seberg @leofang I'd appreciate your opinion on this!
I would tend towards creating the new name (i.e. A2 or variations of it). Otherwise, appending the fields requires out-of-bound exchange of the information whether the field exists. We can do that in Python but it seems stranger in C? EDIT: OK, I guess it is not strange in C. I think my main feeling is that once the old version is not used anymore A2 seems simpler.
If a new name is created, new C code can indicate that by using DLManagedTensorVersioned in headers to give some type/ABI safety.
How exactly the DLManagedTensorVersioned that should look like, I am not sure. Just a stub with the version or any eternally stable ABI fields (which is than cast to the specific versioned one). In the above DLManagedTensor is included, which also works. The downside is that at some point the "included" version may change, so I am wondering if it isn't clearer to force the user to cast. (The layout is still the same in both cases of course, the question is how to expose it in the headers.)
If a new name is created, new C code can indicate that by using
DLManagedTensorVersionedin headers to give some type/ABI safety.
How would one do that? I think the idea of A2 is to still export the managed_tensor field of DLManagedTensorVersioned struct and use offset of the version field to retrieve the version info.
I don't know how current API looks like. I imagine exposing something like from_dlpack(DLPackManagedTensor *tensor) and DLPackManagedTensor *managed_tensor = to_dlpack(object).
And the only way to transition that, is probably to create a new pair of functions that use DLManagedTensorVersioned *? But if that is how the transition happens anyway, then I don't see much point in placing the version last.
So yes, a current from_dlpack could remain compatible with newer versions for now. But for how long? If there is any ABI change in the future it will remain being compatible. so I am not sure that would actually give a very clear transition for when an ABI change happens. (It would assume that eventually no old users exist and then ABI break is possible?)
So, I feel that advantage of "ABI compatibility" with current from_dlpack is unclear. So I would preference doing what seems cleanest in the long term. And for me, that is placing the version first and renaming the struct.
And the only way to transition that, is probably to create a new pair of functions that use
DLManagedTensorVersioned *?
As far as C/C++ is concerned, I too think this is the only way to transition. Both A1 and A2 sound like good options then. I also see now how A2 can provide type safety. @tqchen The A2 option says "In new version, allocate DLManagedTensorVersioned but still return the pointer to the managed_tensor." I don't think we should do that since it takes the type safety advantage away (making A1 a simpler solution). Otherwise, I am happy to go with A2 too.
What do you think @mattip?
Let's wait for some others to chime in and then I can update the PR.
I like A2 as well.
One of the mentioned disadvantages of A1 is "No changes to the current fields of DLTensor itself can be made". How would this look for A2? (In general, it seems to me that adding any fields to DLTensor will be an ABI break.)
How would this look for A2?
We can add new fields onto DLTensor since the version is the first field in DLManagedTensorVersioned (as opposed to A1 where the version field comes after DLTensor).
This would work for software that knows how to interpret a DLManagedTensorVersioned with that specific version, but it would still break older software that depends on a compatible layout, no? How is that not an ABI incompatibility?
The idea is to introduce new functions (to_dlpack(DLManagedTensorVersioned *) and from_dlpack(DLManagedTensorVersioned *)) that convert and export the tensor to a DLManagedTensorVersioned *. It's good to have for type safety but otherwise, A1 is simpler.
The killer feature of DLPack is its simplicity: it's just a header file with a few data structures that are trivially ported to various languages. Frameworks that can exchange DLPack tensors don't need to link to a shared or static library to do so. They don't need even deal with C or C++. Introducing required versioning-related functions as part of this repository sounds like it has the potential to significantly complicate this success story. My 2cts...
(Edit: I am against versioning per se. I am just wondering what this means for ABI interoperability between software that may and may not implement this versioned extension.)
Without versioning, DLPack can't evolve. There are already requests for read-only data support, non-native endianness, etc. Adding those without a version to separate the new structure will lead to spurious breaks everywhere (since not everyone will update at the same time and newer functions won't work with older ones). I am not sure if there is a simpler way; after a lot of discussions, these options seem to be the simplest of all. In long term, we can remove the old structs and also update the spec to not care about the old ABI.
Introducing required versioning-related functions as part of this repository sounds like it has the potential to significantly complicate this success story.
The proposal is to add version-info in the struct to avoid this exactly. All the complications happen in the spec and not the DLPack header file...
Hi everyone,
The issue has been up for almost a month. So far, I think @mattip, @seberg, and @leofang are in favor of versioning and @wjakob is against it. Let's wait until 16'th May for more feedback and, if there is none, it would be good to merge #105 and wait until 23'rd May before closing this to allow some time to revert the changes if needed. Please feel free to add your thoughts on how DLPack should proceed with versioning and ABI break. Thanks!
this is a tricky situation which I suspect comes down to how everyone actually uses DLPack in practice. it'd be helpful to know more about everyone's usage of DLPack so we could evaluate whether we're all debating the same use case or if there are others.
my initial impressions of the proposals:
A0.
- good: version is the first field
- bad: because there is no way to signal a version incompatibility between v1/v2. A1.
- good: is an organic evolution into a versioned world, avoids immediate backwards-compat problems
- bad: splits fields across DLTensor and DLManagedTensor, making DLTensor append-only and making it difficult to change the semantics of DLTensor fields. it's a half-solution in my book. A2.
- good: avoids API break, Version is the first field in a wrapper class to DLTensor to make interpretation easier once a pointer to
DLManagedTensorVersionedis found. - bad: finding the pointer to DLManagedTensorVersioned requires assuming that it exists; would the API tell you this? DLManagedTensor cannot be used when the version check fails (e.g. you cannot deallocate it).
might I suggest A3, a variant of A2:
static constexpr const uint32_t kVersionedDLTensorMagic = 0xD1D1D1D1;
struct DLManagedTensor {
DLTensor dl_tensor;
void * manager_ctx;
void (*deleter)(struct DLManagedTensor * self);
};
// note: supersedes DLManagedTensor
struct DLManagedTensorVersioned {
Version version;
void * manager_ctx;
void (*deleter)(struct DLManagedTensorVersioned* self);
uint32_t magic; // set to kVersionedDLTensorMagic ^ (((uintptr_t) &self->dl_tensor) - ((uintptr_t) self))
DLTensor dl_tensor;
}
here is what i think example usage might look like:
- allocate DLManagedTensorVersioned (or invoke other library which may return either DLManagedTensor* or (DLManagedTensorVersioned*)->dl_tensor. Note that this amounts in both cases to returning a pointer to a DLTensor instance.
- The client library's unknowns at this point are:
- The field layout of DLTensor
- The location of
manager_ctxanddeleter - The version of the tensor
- Perform the following algorithm to recover the version:
- Suppose the returned pointer is
void* caller_tensor. - Compute
ptrdiff_t version_ptrdiff = *((int32_t*) caller_tensor )- 1) ^ kVersionedDLTensorMagic, which may be the difference between the address of DLManagedTensorVersioned.dl_tensor and DLManagedTensorVersioned. We could also enforce here that version_ptrdiff is e.g.< 0x100. - Compute
DLManagedTensorVersioned* candidate_versioned = ((uintptr_t) caller_tensor) - version_ptrdiff; - Check if
candidate_versioned->versionis supported and validate version_ptrdiff based on knowledge of that version (e.g. v2 expectsversion_ptrdiff == 16). - If version does not match, compute
DLManagedTensor dl_managed_tensor = caller_tensorand treat as v1.
- Suppose the returned pointer is
- Use per semantics defined in the version. To deallocate, invoke
deleter(self), where those fields are looked-up as is apropos givenversion.
the advantage to this is that the managed fields are moved next to the version, so that they could be used to terminate the lifecycle of a tensor whose version is incompatible with the client library. this scheme is of course vulnerable to various memory tricks as well--you could consider patching Version to include some xor'd magic too. it is more complex, though. thoughts?
it'd be helpful to know more about everyone's usage of DLPack
That is the hard part here, and why @tirthasheshpatel submitted #106. Other users of DLPack can add additional use cases. Without concrete use cases, it is difficult to relate to "what if" scenarios.
To me the A3 proposal seems too "magical". The use of __dlpack_info__ in A2 allows the provider to explicitly state what version of the API they will support, answering "would the API tell you this?" with "yes".
@wjakob when you say
Introducing required versioning-related functions as part of this repository sounds like it has the potential to significantly complicate this success story
do you agree that the structure is incomplete? If not, what do you respond to NumPy who, in accordance with the Array API consortium recommendations, added support for DLPack but requested support for readonly and byteorder flags?
This need for versioning was pointed out in #34 which was opened in March 2019, and has been extensively discussed in other issues over the passing years.
Just to make that clear (it seems there was some misunderstanding). I am definitely not against versioning. However, it would be useful if versioning-related information is appended to data structures so that they are still interpretable by older versions at an ABI level.
Second -- there is a lot of value to having DLPack data structures described as POD/Plain Old Data in a way that is accessible from many different languages without, e.g., having to include this repository or linking to a library. There was some of including C-level functions, for example by @tirthasheshpatel
The idea is to introduce new functions (
to_dlpack(DLManagedTensorVersioned *) and from_dlpack(DLManagedTensorVersioned *)) that convert and export the tensor to aDLManagedTensorVersioned *. It's good to have for type safety but otherwise, A1 is simpler.
Maybe this is fine. However, this is a deviation from the current purely POD-style interface where the only kind of function is a function pointer.
Finally, one suggestion that I would also have is that the PR#101 adds some kinds of flags field.
Adding uint8_t readonly; is kind of wasteful (1/8 bits used, and another 3 unused bytes will be added for padding).
Why not add a uint32_t flags value? This could represent readonly status, endianness, and whatever other people will want to add in the future without having to discuss an ABI break every time.
@mattip agreed A3 is kind of magical and that's less desirable than an explicit indication. It seemed like there was an attempt being made to avoid breaking backwards-compatibility by considering the DLTensor fields as sacred and locating new fields elsewhere.
I think there are a few possibilities:
- Two C++ libraries integrating without any Python support: If one library is to return a DLManagedTensor, we either have to signal a version by a) modifying memory outside the bounds of the v1 structure; b) modifying an unused field in the v1 structure; or c) introducing an explicit API which carries the drawbacks @wjakob mentioned (i.e. this is no longer just an in-memory format).
- Consumer has Python support but is integrating with a C++-only producer library: I don't think
__dlpack_info__helps here because it would be the consumer's__dlpack_info__wrapping the produced DLManagedTensor, right? - Both consumer and producer have Python support: then I think
__dlpack_info__is a reasonable out-of-band signaling mechanism.
I should also state that my interest here is purely from POV of having TVM interoperate with other libraries. These possibilities capture the range of situations I could imagine a standard such as this needing to confront.
@wjakob
it would be useful if versioning-related information is appended to data structures so that they are still interpretable by older versions at an ABI level.
Yes, this is exactly what #105 will do (I will update the PR to follow A2).
Second -- there is a lot of value to having DLPack data structures described as POD/Plain Old Data in a way that is accessible from many different languages without, e.g., having to include this repository or linking to a library. There was some of including C-level functions, for example by @tirthasheshpatel
I meant the C/C++ libraries that want to add support for the new ABI will have to create those new functions. DLPack itself won't have any functions and will still remain a POD. So, no linking will be necessary.
Why not add a
uint32_t flagsvalue? This could represent readonly status, endianness, and whatever other people will want to add in the future without having to discuss an ABI break every time.
The plan is to batch all the requests like readonly, endianness, alignment, etc in one single ABI break. Althouh, it would be useful to also add some padding for future use.