Notification for past issue creators and code contributors
You are getting this notification since you have either opened an asdf issue or contributed code to asdf. Changes are being considered to the asdf standard (for V2.0) and an email posting has been made to the two asdf mailing lists summarizing the changes being considered. A separate email announces the relocation of the asdf repository to a new organization (from whence this notification comes).
https://groups.google.com/forum/#!forum/asdf-users https://groups.google.com/forum/#!forum/asdf-developers
Apologies if you are already on the mailing lists.
@CagtayFabry @DavidLutton @JorisDeRidder @LandingEllipse @NumesSanguis @OxygenLiu @adamchainz @alexdevsec @almarklein @aprsa @astrofrog @baerbock @bblais @bernie-simon @bhilbert4 @bobpepin @brechmos-stsci @bsipocz @burke86 @cdeil @certik @cversek @dhomeier @embray @eschnett @eteq @ggggggggg @hyiltiz @isonder @jaytmiller @jdavies-st @jhunkeler @justincely @krischer @lamby @lgarrison @manodeep @migueldvb @mpreuten @mwtoews @olebole @pllim @rossant @sbinet @stscieisenhamer @tiagopereira @vmarkovtsev @wieland400 @xaberus @xnox @yjarosz @yuvallanger @zdroid @zurk
Looks like GitHub limits us to 50 mentions, so here's a shout out to the last 4 folks:
@yjarosz @yuvallanger @zdroid @zurk
I'm a little confused by this proposal.
I thought the point of the tag property in schemas was that the object being validated must have that tag. It's not assigning the tag to the object so much as ensuring that the tag is present in the file being validated.
Not sure why this proposal. I always thought this was useful and can make for cleaner looking documents in many cases. As described in YAML Schema they're only intended as hints anyways, and are not required for implementations to follow.
Is there a better way to ensure YAML serialization consistency between implementations (not that this approach is perfect either given that it's optional)?
> I'm a little confused by this proposal.
> I thought the point of the tag property in schemas was that the object being validated must have that tag. It's not assigning the tag to the object so much as ensuring that the tag is present in the file being validated.
@embray I was confused about the tag property when I wrote up that item -- all we want to do is use the same URI scheme for YAML tags and schema ids. At the time I thought that tag was just an annotation to describe the YAML tag, but since it's actually a directive to validate the tag, I agree we need to keep it. I'll update that section of the wiki page.
Yeah, from the YAML Schema docs, the tag property is:
> A fully-qualified YAML tag name that should be associated with the object type returned by the YAML parser; for example, the object must be an instance of the class registered with the parser to create instances of objects with this tag. Implementation of this validator is optional and depends on details of the YAML parser.

Arguably you could do without this, but then one could write a file that, for example, omits the tag for an ndarray. When reading that file, with the tag missing there would be no way to interpret the data as an ndarray. Arguably this is unnecessary since a correct writer would output the tag as well. But this way the reader can give better debugging help, like "this property was supposed to have the ndarray tag, but doesn't, so I don't know what to do with this data".
Anyways, as you say, it's just a validation directive. It's not actually responsible for assigning that tag to objects. The tag already needs to be present in the document being validated. If you assumed it was responsible for assigning the tag, then I would agree it seems redundant.
Here's where the tag validation is implemented. It just checks that the deserialized object has the expected tag. Otherwise you could either put an untagged object in that property, or an object with a tag, but the wrong tag (and hence the wrong object).
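As a rough illustration of the check described above (not the asdf implementation), here is a minimal sketch. `TaggedNode` and `validate_tag` are hypothetical names, and the node type is a stand-in for whatever the YAML parser actually returns.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class TaggedNode:
    """Hypothetical stand-in for a deserialized YAML node."""
    data: Any
    tag: Optional[str] = None  # None models an untagged node


def validate_tag(node: TaggedNode, expected_tag: str) -> List[str]:
    """Return error messages; an empty list means the tag check passed."""
    if node.tag is None:
        # The writer omitted the tag, so the reader cannot pick a type.
        return [f"expected tag {expected_tag!r}, but the node is untagged"]
    if node.tag != expected_tag:
        # Tagged, but with the wrong tag (and hence the wrong object).
        return [f"expected tag {expected_tag!r}, got {node.tag!r}"]
    return []
```

The two error branches correspond to the two failure modes mentioned: an untagged object, and an object carrying the wrong tag.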
In the meantime we've also come up with an interesting new use case for tag. Currently when we want to identify a property that is to contain a transform object, we reference the transform-x.y.z schema fragment as a marker:
some_transform:
  $ref: transform-x.y.z
but that fragment is just a set of common properties of transform schemas, currently name and inverse. Both of those are optional so some_transform accepts almost any object and our schema doesn't provide any useful validation.
In vanilla JSON schema the only real solution I can think of is something like this:
# transform_marker-x.y.z.yaml
anyOf:
  - $ref: gaussian1d-x.y.z
  - $ref: gaussian2d-x.y.z
  - ...

some_transform:
  $ref: transform_marker-x.y.z
But there are 90-something transform schemas, so this giant combiner is going to be horrible to evaluate. With tag though we can define the marker schema like this:
# transform_marker-x.y.z.yaml
anyOf:
  - tag: tag:stsci.edu:asdf/transform/gaussian1d-x.y.z
  - tag: tag:stsci.edu:asdf/transform/gaussian2d-x.y.z
  - ...
That would accomplish the same thing, and would be quick to evaluate.
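To see why the tag-based marker is cheap, note that each branch reduces to a single string comparison, so the whole `anyOf` amounts to a membership test. The sketch below assumes a hypothetical `TRANSFORM_TAGS` set and helper name; a generic validator would still walk the branches one by one, but each step costs far less than evaluating a full transform schema.

```python
# Hypothetical tag set; a real marker schema would list every transform tag.
TRANSFORM_TAGS = frozenset({
    "tag:stsci.edu:asdf/transform/gaussian1d-x.y.z",
    "tag:stsci.edu:asdf/transform/gaussian2d-x.y.z",
})


def matches_transform_marker(node_tag):
    """Return True if the node's YAML tag is one of the listed transform tags.

    node_tag is the tag string reported by the parser, or None if untagged.
    """
    return node_tag in TRANSFORM_TAGS
```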
Wouldn't you still want the serialization of the transform checked against the schema too though? Or does that happen somewhere else?
I've had a use case similar to this recently where I have an anyOf with a long list of sub-schemas, but each sub-schema has a specific property within it that acts as a "key" which takes a unique value for each sub-schema. So to match an instance against the correct schema really only requires comparing that key, and not the entire instance. For example something like:
properties:
  key:
    type: string
    enum: [a, b, c]
oneOf:  # I think oneOf makes more sense for this case than anyOf
  -
    properties:
      key: { const: a }
      # additional properties for key=a
  -
    properties:
      key: { const: b }
      # additional properties for key=b
  -
    properties:
      key: { const: c }
      # additional properties for key=c
This scheme basically works (and is similar to what you want to do with tag). The problem is that JSON Schema doesn't provide a way to make the evaluation of oneOf more efficient by always just looking at the value of key to match against the correct schema.
What you really want here is an extended version of oneOf that acts more like a switch statement. Like instead of providing oneOf with a list it could be a mapping. Something like:
oneOf:
  switch: key
  cases:
    a: { "$ref": a-schema }
    b: { "$ref": b-schema }
    c: { "$ref": c-schema }
This would be much faster than trying to validate against each sub-schema until one that matches is found.
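A sketch of how such a switch could be evaluated, assuming the proposed `switch`/`cases` keywords (which are not part of JSON Schema). The names `validate_switch` and `require` are hypothetical, and each case is modelled as a plain callable rather than a real sub-schema.

```python
from typing import Any, Callable, Dict, List

Validator = Callable[[Dict[str, Any]], List[str]]


def require(*names: str) -> Validator:
    """Toy sub-schema: demand that the given properties are present."""
    def check(instance: Dict[str, Any]) -> List[str]:
        return [f"missing property {n!r}" for n in names if n not in instance]
    return check


def validate_switch(instance: Dict[str, Any], switch: str,
                    cases: Dict[str, Validator]) -> List[str]:
    """Pick one sub-schema by the value of `switch`, then validate only that."""
    if switch not in instance:
        return [f"missing switch property {switch!r}"]
    value = instance[switch]
    if not isinstance(value, str):
        return [f"switch property {switch!r} must be a string"]
    validator = cases.get(value)  # one dictionary lookup, no trial and error
    if validator is None:
        return [f"no case for {switch}={value!r}"]
    return validator(instance)
```

For example, with `cases = {"a": require("alpha"), "b": require("beta")}`, an instance with `key: b` is checked only against the `b` case, regardless of how many cases exist.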
I suppose I could implement something like this in a new meta-schema, but it seems better suited for an upstream contribution to JSON Schema.
I should add, what I described above is essentially made possible in draft-07 using allOf combined with if/then as in the last example here: https://json-schema.org/understanding-json-schema/reference/conditionals.html
but it's so kludgy and verbose. I like my version better :upside_down_face:
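For comparison, here is roughly what the draft-07 `allOf` plus `if`/`then` form looks like, written as a Python dict with a toy evaluator. The evaluator covers only the handful of keywords used here and is an illustration, not a substitute for a real validator. The property names `alpha` and `beta` and the helper names are made up. Note the `required` inside each `if`: without it, `properties` passes vacuously when `key` is absent.

```python
def case(value, required):
    """Build one if/then branch: when key == value, demand `required`."""
    return {
        "if": {"properties": {"key": {"const": value}}, "required": ["key"]},
        "then": {"required": required},
    }


SCHEMA = {
    "properties": {"key": {"enum": ["a", "b"]}},
    "required": ["key"],
    "allOf": [case("a", ["alpha"]), case("b", ["beta"])],
}


def is_valid(schema, instance):
    """Toy evaluator for const, enum, required, properties, allOf, if/then/else."""
    if "const" in schema and instance != schema["const"]:
        return False
    if "enum" in schema and instance not in schema["enum"]:
        return False
    if isinstance(instance, dict):
        if any(name not in instance for name in schema.get("required", [])):
            return False
        for name, sub in schema.get("properties", {}).items():
            # properties only constrains keys that are actually present
            if name in instance and not is_valid(sub, instance[name]):
                return False
    if not all(is_valid(sub, instance) for sub in schema.get("allOf", [])):
        return False
    if "if" in schema:
        branch = "then" if is_valid(schema["if"], instance) else "else"
        if branch in schema and not is_valid(schema[branch], instance):
            return False
    return True
```

Every `if` is still evaluated for every instance, so this buys clearer error attribution more than speed.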
Final note: As long as we're talking about extended meta schemas, your idea of using tag to match an object to a schema could certainly be a good idea to add to YAML Schema.
> Wouldn't you still want the serialization of the transform checked against the schema too though? Or does that happen somewhere else?
We definitely want that, but since the transform has its own tag in the YAML, it gets validated independently. What we've been doing in this case is validating the object twice: once on account of its own tag, and once due to the $ref in its containing object.
> What we've been doing in this case is validating the object twice: once on account of its own tag, and once due to the $ref in its containing object.
I think I understand... I was kind of wondering if that might be the case but I wasn't sure.
Thank you for bringing this to my attention. Is it still possible to propose/discuss changes and additions for the 2.0 release of the asdf standard? @perrygreenfield @eslavich Should those be posted here, on separate GitHub issues, or via the mailing list?
@CagtayFabry absolutely, that's why we brought it up on the mailing lists and issues.
> Should those be posted here, on separate GitHub issues or via the mailing list?
@CagtayFabry separate GitHub issues would be ideal. We'd be thrilled to have your input!
Edit: Discussion of changes to the standard is taking place on the asdf-standard repo. Discussion of the library will occur here, but we haven't really gotten started with that yet.