Improve JSON/YAML parsing errors
See #1057 and #1086 for context.
json-codecs (in its current latest version, which is not the same version of the library used by language-purescript at the moment) differs from codec-argonaut/argonaut-codec in the following ways:
- it allows custom error messages unlike both libraries
JsonDecodeError - it allows one to easily define the order in which fields are encoded to an object (I think
codec-arognautsupports this but notargonaut-codec) using record syntax - It allows for field renaming both when encoding and decoding
- unlike
argonaut-codecbut similar tocodec-argonaut, it's value-based codecs - it's not bidirectional like
codec-argonaut, but I could add bidirectional codecs
See the test code and its current output
The library isn't the most well-tested it could be (e.g. are the otherwise unidirectional encoder/decoders for a given type actually bidirectional if used together). But would there be interest in this?
I'll first note that we are using codec-argonaut mostly because it's what the Registry uses - that allows us to reuse a lot of codecs. It would be easier to move if the Registry did as well.
Now - I thought about switching JSON library to improve error messages, but I'm not convinced this move would buy us much?
The error messages that are coming out of your library are a bit nicer out of the box, but not an order of magnitude better - I have the feeling we could achieve something similar by writing a custom printer for JsonDecodeError, or pattern matching on some known failures to return nice errors, both of which should be considerably less work than migrating to a new library and adding custom things on top of that.
To sum up: I am not opposed to switching to something better, but I'd like to see us try to improve things with the current library first, and if these attempts look like failures then the new thing should have some clear way to address them.
Makes sense. I think a custom printJsonDecodeError could work. I'm not sure about the 'pattern matching on some known failures' part because that seems like it'll be hacky one way or another. But definitely a much stopgap solution than updating all the codecs to json-codecs.
I think we'll need both (the matching, and the custom printer) to handle this stuff. E.g. for #1057, we get this JsonDecodeError if there's a publish field, but it doesn't contain the right stuff:
Under 'Config':
At object key package:
Under 'PackageConfig':
At object key publish:
Under 'PublishConfig':
At object key license:
No value was found.
That's a Named "Config" (AtKey "package" (Named "PackageConfig" (AtKey "publish" (Named "PublishConfig" (AtKey "license" MissingValue))))).
If we'd match on the Named "Config" we could have a nice Your spago.yaml configuration is not correct: , to which we could append the result of the custom printer the object 'package.publish' does not contain the key 'license'
Then I think #1086 is a little harder - we get a cryptic
Under 'Config':
At object key package:
Under 'PackageConfig':
At object key dependencies:
At array index 2:
Expected value of type 'Object'.
This is not legible for two reasons (as @finnhodgkin noted):
- the codec for PackageName swallows the parsing error - see here
- we parse with an alternative since we can find either a string or an object in there: https://github.com/purescript/spago/blob/d398cba6b4b7a56640f7ec6e2b079e659068a3af/core/src/Config.purs#L266-L268 This means that even if PackageName would return a sensible parsing error we'd still lose it since codecs-argonaut does not accumulate errors.
I think for (1) we could wrap the parse error in a TypeMismatch, but I am not entirely sure how to tackle (2). Maybe there's a way to restructure our parser so that we get better errors?
I wonder if @garyb or @thomashoneyman have any insight on this?
Hmm... to accumulate the errors you'd have to write a manual codec that tries both branches and then checks if they both fail, and if so constructs a new error message. Even after that, with the current structure of the error message type there's not really a good fit for the constructor that message would go into unfortunately. Re-error-typing the codecs is possible, but it would mean essentially reimplementing them all, or having to call a wrapper function around them all to make them compatible.
I do intend to change things a little bit for the purescript-json version of the Codec-based stuff, I was thinking of just having the errors be a path paired with a plain string, so the structure is there if needed but the message itself can be whatever. Migrating to that would probably be easier than json-codecs because I intend to keep everything else the same as the current codecs, so it should just be a case of swapping the package names, and only code that produced or dealt with errors would need changing.
I've been working on my house lately so the JSON stuff has been on the backburner, but if it's something that you're interested in I could set aside some time this weekend to wrap that up instead.
PackageName is a newtype around string.
So can we use our own codec for that?
Something like wrapIso PackageName (CA.string) wouldn't swallow errors, right?
The error swallowing in question is due to the <|> between PackageName.codec and packageSingletonCodec - it'll only be producing the error from packageSingletonCodec.
I've been working on my house lately so the JSON stuff has been on the backburner, but if it's something that you're interested in I could set aside some time this weekend to wrap that up instead.
That would be lovely! ❤️ Wish I could lend you a hand on the house works 😄
It lives: https://github.com/garyb/purescript-codec-json
There are some changes from codec-argonaut:
- it uses
purescript-json(perhaps a bit redundant to point that out smile:) - inside the codec is
ExceptTrather thanEithernow- so we can use the error-accumulating
Alternativeinstance (!) - if there are places that were importing
decodefromData.Codecit can be switched to thedecodefromData.Codec.JSONto avoid the need torunExceptthe result
- so we can use the error-accumulating
- I didn't bring the
CompatorMigrationmodules across (but could do if we end up needing them) - You're not forced to supply a name in codecs for objects/records and indexed arrays now (you can still use
namedto introduce one though). Not forcing names for every level of structure lets the errors combine better, and I've encountered a few cases in my own code where I don't always want to name a thing if it's a record nested in another, things like that.
The revised error structure I mentioned ended up gaining a little more complexity than my "just path + message" proposal, there's also a "causes" that allows errors to be given context (like type names, or when grouping errors for failed alternatives). Here's an example of the kind of thing it prints for an error now where I used <|> to combine 3 codecs:
Could not decode IntTuple:
Failed to decode alternatives:
- Expected value of type Array
- $.fst: No value found
- $.a.value: No value found
- $.a.val: No value found
The setup that produces that is in the tests.
The print function for errors does some work to try and make the output reasonably pleasant to interpret. Each line can be prefixed by a JSONPath style expression that points to the point in the JSON that the error arose at, with the paths being made relative to the outer error they're showing for. If a path is empty / JP.Tip / $ rendering it is skipped. etc. You can see the structure of the error in the tests and compare with the rendering here to get a feel for what it's doing 🙂.
Let me know what you think!
Thanks @garyb, this looks great! ❤️
And it sounds like it would fix all of our current issues. I think the first step would be to port the Registry codecs if @thomashoneyman is on board with this (we'll need Compat.maybe, but we could just put it in the registry-lib for now)
I'm on board. Can do a review of the registry codecs and update their errors after I finish a couple other things, unless this is urgent.
Ah yeah, I was thinking I might add a Compat.maybe style codec to Common under a different name, since I think that's almost the only thing anyone uses from Compat anyway. :wink:
Common.nullable?
👍 exactly what I had in mind!
I've pushed a new version with that added now.
I have this almost ready, with patches for codec-json, the registry, and spago itself