default selection for `Converter.select_tags`
I ran into the following problem today when working with Converters that support multiple tags (or more specifically multiple versions of the same tag)
Say I have the following Converter:
class FooConverter(Converter):
tags = [
"asdf://somewhere.org/tags/foo-1.0.0",
]
types = [FooType]
When adding a new schema version foo-1.1.0 that is also supported by the converter I would extend the list of supported tags (the legacy tag foo-1.0.0 can still be read)
class FooConverter(Converter):
tags = [
"asdf://somewhere.org/tags/foo-1.1.0",
"asdf://somewhere.org/tags/foo-1.0.0",
]
types = [FooType]
I was expecting the Converter to serialize everything using foo-1.1.0 schema when creating a new file because of the default implementation of select_tag with return tags[0]
https://github.com/asdf-format/asdf/blob/b17caf5029fde09a8e7ec2ec5f91c2a9882d51da/asdf/extension/_converter.py#L55-L80
But I think the converter will always try to use the lowest tag number because the tags seem to get sorted here when creating a Converter instance https://github.com/asdf-format/asdf/blob/b17caf5029fde09a8e7ec2ec5f91c2a9882d51da/asdf/extension/_converter.py#L183
Is this behavior intended @eslavich ? (assuming it works as I think it does and the lowest version is selected)
I would have expected a Converter to serialize to the highest available version number (which should be accomplished by either selecting tags[-1] or reversing the default sorting order for tags)
I have implemented
def select_tag(self, obj, tags, ctx):
return sorted(tags)[-1]
on my converters by default to get a consistent behavior.
Just to add:
This also applies when setting tags = ["asdf://somewhere.org/tags/foo-1.*"] which is probably the most practicle way to support multiple versions in this manner
It might be a good idea to use a sorting function that also deals with version numbers correctly for more consistent results (e.g. self._tags.sort(key=packaging.version.parse) )
Hmm, we haven't run into this because we don't have multiple versions of a tag within the same extension (instead, a new tag version is an occasion for a new extension version as well). We can still use the same Converter in that case, but there's no ambiguity on the tag URI once the extension is selected.
I don't think we can sort with packaging.version.parse because the URI has become more freeform with the new API, and may have a different versioning scheme or no version at all. What do you think about removing the sort entirely? That would allow the Converter author to order tags according to their preference, and then the existing default select_tag implementation would select the first.
Hmm, we haven't run into this because we don't have multiple versions of a tag within the same extension (instead, a new tag version is an occasion for a new extension version as well). We can still use the same Converter in that case, but there's no ambiguity on the tag URI once the extension is selected.
Ah I see, I will have to look into how correctly package and version different extension versions. I think my understanding of how to handle different tags and extension versions is a little lacking.
I don't think we can sort with
packaging.version.parsebecause the URI has become more freeform with the new API, and may have a different versioning scheme or no version at all. What do you think about removing the sort entirely? That would allow the Converter author to order tags according to their preference, and then the existing defaultselect_tagimplementation would select the first.
Removing the sort also sounds good. Before looking deeper my first impression was that I could manually order the tags to get the behavior that I wanted. However this could create a slight ambiguity when working with wildcards (are they resolved to a sorted list or not?)
I think what would certainly be helpful is simply to mention in the Converter API that the tag list will be sorted (in whatever way) upon creation. At that point it should also be pretty clear what to expect from tags[0].
It is easy to override into the desired behavior after all as well
After spending a long time trying to debug this I finally figured out that I had run into this bug because I was using wildcards, which were then being sorted and it was picking the first one which is the oldest tag and that's what it was using when it was saving.
This is really unintuitive (and not documented), what's the point in versioning tags if the library ignores the versions?!
OK, turns out it's not wildcards, it's just that they are sorted in the constructor of the ConverterProxy.
@CagtayFabry is this still an issue?
@WilliamJamieson It is not a pressing issue since overwriting the default behavior for select_tag is rather easy
However personally I consider the current implementation counterintuitive (as was mentioned by Cadair)
afaik in the current code base of asdf this implementation detail is not really relevant so far (it wasn't as of Feb '22) and changing the default behavior shouldn't break anything.
Either way, even if the current implementation is kept an update in the documentation on how Converters handle the list of associated tags would be very welcome