Explicitly state leading `0` in decimal numbers is illegal
AFAICS, the spec does not say explicitly that starting decimal numbers with a leading 0 is illegal. However, the reference implementation appears to reject them. I think it would be helpful to state this explicitly.
Thank you for the suggestion. I can see how adding that fact may help clarify what is and isn't allowed regarding numbers.
Interestingly, none of the ECMAScript specifications explicitly state this either. You're just supposed to infer this from the syntax productions.
Note that leading zeroes are allowed in non-strict contexts in ES; however, they represent octal numbers rather than decimals. Since JSON5 does not support octal numbers, leading zeroes are not allowed.
Interestingly, none of the ECMAScript specifications explicitly state this either.
FWIW, I am implementing a JSON5 parser and I found this "delegation", often implicit, to ECMAScript frustrating. You never quite know whether what the JSON5 spec says is the complete picture or just "non-normative" explanation and the actual specification is some undefined subset of ECMAScript. The Numbers chapter is a good example of this.
I'm sorry to hear you're finding it difficult to implement a parser based on the spec, and I appreciate the feedback. Let me try to provide some information that may be helpful.
Everything in the spec is normative with the exception of the examples, which are informational only, and some recommendations for interoperability. The JSON5 spec points to the ECMAScript 5.1 spec for certain grammar productions.
If you implement a parser using only the grammar in the JSON5 spec (and the ES 5.1 grammar productions referenced) then you will be 95% of the way to having a working JSON5 parser. The rest is converting the tokens to actual objects, arrays, and values, and implementing an API.
In the case of Numbers, a JSON5 parser that follows the grammar in the spec would automatically reject multi-digit decimal numbers that start with zero. For example, the grammar productions for numbers follow this path:
JSON5Number -> JSON5NumericLiteral -> NumericLiteral -> DecimalLiteral -> DecimalIntegerLiteral
DecimalIntegerLiteral can only be a single 0 terminal or a NonZeroDigit optionally followed by DecimalDigits.
So, if you had the following JSON5 document:
{abc:01}
First, the parser would use the lexical grammar to parse the first three tokens: {, abc, and :. Then it would parse a 0 token: according to the grammar, the only characters that could extend a token beginning with 0 are ., e, E, x, or X, so the 0 is both the beginning and the end of the token. Since the next character is a 1, it is not part of the 0 token and is instead treated as its own 1 token. Next, it would parse the } token.
Finally, it would use the syntactic grammar to parse each of those tokens up to the 0 token, and it would then throw an error when it reaches the 1 token since only a , or } token would be allowed after the 0 token.
So, there isn't a need to explicitly forbid multi-digit decimal numbers that start with 0. The grammar just forbids it. In fact, the reference implementation doesn't check for that scenario either. If it finds a 0 that starts a number token, then it checks for ., e, E, x, or X. If none of those characters are found, it parses a 0 token. If any non-zero digits follow the 0 token, then they are treated as a separate number token. Since there are no situations where two JSON5Number tokens can be next to each other in the syntactic grammar, the parser throws an error.
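To make the tokenization argument concrete, here is a minimal sketch in Python. It is not the reference implementation: the `tokenize` and `first_adjacent_numbers` names are hypothetical, and the lexer is deliberately reduced to punctuators, simple identifiers, and the DecimalIntegerLiteral production only.

```python
import re

# Simplified token patterns (an assumption for illustration). A real JSON5
# lexer also handles strings, comments, hex, fractions, exponents, signs, etc.
TOKEN = re.compile(r"""
    (?P<punct>[{}\[\]:,])
  | (?P<ident>[A-Za-z_$][A-Za-z0-9_$]*)
  | (?P<number>0|[1-9][0-9]*)      # DecimalIntegerLiteral only
  | (?P<space>\s+)
""", re.VERBOSE)

def tokenize(source):
    """Split source into (kind, text, offset) tuples, skipping whitespace."""
    tokens = []
    pos = 0
    while pos < len(source):
        m = TOKEN.match(source, pos)
        if m is None:
            raise ValueError(f"invalid character {source[pos]!r} at offset {pos}")
        if m.lastgroup != "space":
            tokens.append((m.lastgroup, m.group(), pos))
        pos = m.end()
    return tokens

def first_adjacent_numbers(tokens):
    """Return the second token of the first number/number pair, or None."""
    for prev, cur in zip(tokens, tokens[1:]):
        if prev[0] == "number" and cur[0] == "number":
            return cur
    return None

tokens = tokenize("{abc:01}")
print([text for _, text, _ in tokens])   # ['{', 'abc', ':', '0', '1', '}']
print(first_adjacent_numbers(tokens))    # ('number', '1', 6)
```

The lexer never rejects `01` itself. It emits `0` and `1` as two number tokens, and the error only appears once something checks the token sequence against the syntactic grammar, which is why the offending position ends up being the `1` rather than the `0`.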
Hopefully this information helps. Let me know if you run into any other issues with the spec or your parser implementation.
Thanks for the detailed explanation.
Everything in the spec is normative with the exception of the examples, which are informational only, and some recommendations for interoperability.
I am not trying to be dismissive (if anything, I would like the spec to be improved and JSON5 to become more widely used), but something like this (first sentence from the Numbers chapter):
The representation of numbers is similar to that used in most programming languages.
does not give the impression of normative prose, which makes it hard to judge whether the following sentences describe the complete picture or are just for exposition.
Also, it was not obvious to me at all that the words in the productions are actually links and that some of them lead to another document.
In fact, the reference implementation doesn't check for that scenario either. If it finds a 0 that starts a number token, then it checks for ., e, E, x, or X. If none of those characters are found, it parses a 0 token. If any non-zero digits follow the 0 token, then they are treated as a separate number token. Since there are no situations where two JSON5Number tokens can be next to each other in the syntactic grammar, the parser throws an error.
Yes, one problem with this approach though is that the resulting diagnostic is mighty confusing. For 01 the reference implementation says:
JSON5: invalid character '1' at 1:2
While my parser says:
<stdin>:1:1: error: leading '0' in number
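For illustration, a dedicated check along these lines could be sketched as follows in Python. This is not taken from either parser mentioned above: `JSON5SyntaxError`, `scan_decimal_integer`, and its parameters are hypothetical names, and the function only covers the integer part of a decimal number.

```python
DIGITS = "0123456789"

class JSON5SyntaxError(ValueError):
    """Hypothetical error type carrying a formatted diagnostic."""

def scan_decimal_integer(source, pos, line=1, column=1, name="<stdin>"):
    """Scan the run of ASCII digits at source[pos] and return the index after it.

    Consuming the whole run first lets the error point at the start of the
    number instead of at the second digit.
    """
    end = pos
    while end < len(source) and source[end] in DIGITS:
        end += 1
    if end == pos:
        raise JSON5SyntaxError(f"{name}:{line}:{column}: error: expected digit")
    if source[pos] == "0" and end - pos > 1:
        raise JSON5SyntaxError(f"{name}:{line}:{column}: error: leading '0' in number")
    return end

try:
    scan_decimal_integer("01", 0)
except JSON5SyntaxError as err:
    print(err)   # <stdin>:1:1: error: leading '0' in number
```

The set of accepted inputs is the same as with the grammar-driven approach; only the error position and wording differ. A `0` followed by `.`, `e`, `E`, `x`, or `X` still returns after the single digit, leaving the rest of the literal to the caller.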