Handling of special characters in hostname
What is the issue with the URL Pattern Standard?
According to the Web Platform Tests, these hostnames should throw a TypeError:
- bad/hostname
- bad#hostname
- bad%hostname
- bad\:hostname
- bad\nhostname
- bad\rhostname
- bad\thostname
However, the validation of the hostname relies almost entirely on the URL spec's internal basic URL parser, and according to that spec these cases don't throw a TypeError.
After they're passed to the constructor, they go through the initialize steps and are passed to process a URLPatternInit, where they are not validated because they're patterns. Then they're passed to compile a component with the canonicalize a hostname callback, and finally to the basic URL parser with an empty URL record and the state override set to hostname state.
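To see how a given implementation treats these inputs, the hostnames listed earlier can be passed to the constructor one by one. The following is only a minimal sketch: it assumes a runtime that exposes URLPattern as a global, and the helper name checkHostname is hypothetical. It reports what the runtime does, not what the spec requires.

```javascript
// Hostname patterns taken from the Web Platform Tests list.
// "bad\\:hostname" is a backslash followed by ":" (a pattern escape);
// the last three contain real LF, CR and TAB characters.
const hostnames = [
  "bad/hostname",
  "bad#hostname",
  "bad%hostname",
  "bad\\:hostname",
  "bad\nhostname",
  "bad\rhostname",
  "bad\thostname",
];

// Hypothetical helper: returns "TypeError", "no error", or "unavailable".
function checkHostname(hostname) {
  // Assumption: URLPattern is a global; older runtimes do not provide it.
  if (typeof URLPattern === "undefined") return "unavailable";
  try {
    new URLPattern({ hostname });
    return "no error";
  } catch (err) {
    return err instanceof TypeError ? "TypeError" : `unexpected ${String(err)}`;
  }
}

// Pair each (JSON-escaped) input with the observed outcome.
const results = hostnames.map((h) => [JSON.stringify(h), checkHostname(h)]);
console.table(results);
```

Any row reporting "no error" for an input the tests expect to reject points at the mismatch between the tests and the spec text described here.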
- bad\nhostname, bad\rhostname, bad\thostname: The basic URL parser strips all tabs and newlines before processing the input:
  2. If input contains any ASCII tab or newline, invalid-URL-unit validation error.
  3. Remove all ASCII tab or newline from input.
  So these 3 strings will be treated as badhostname and no error will be thrown. However, a non-failing invalid-URL-unit validation error will occur. This behaviour is consistent with the external URL API (e.g. new URL("http://bad\nhostname") is OK).
- bad/hostname and bad#hostname: The URL parser stops processing the input at the special character and returns only bad, which validates successfully:
  3. Otherwise, if one of the following is true:
     - c is the EOF code point, U+002F (/), U+003F (?), or U+0023 (#)
     - url is special and c is U+005C (\)
  bad?hostname fails in the pattern parser, which expects the ? modifier to be the last character.
- bad\:hostname: The : character is escaped in the pattern parser and bad:hostname is passed to the URL parser. When the parser encounters the : character with hostname state as the state override, it returns without processing any hostname:
  2. Otherwise, if c is U+003A (:) and insideBrackets is false, then:
     2. If state override is given and state override is hostname state, then return.
  After returning, the hostname is null and the code later fails on an assertion when running generate a regular expression and name list. This case looks more like a URL spec issue: it is not consistent with the handling of the /, ? and # delimiters.
- bad%hostname: The hostname is fully parsed by the URL parser and passed to the host parser as an opaque host. The % is allowed in an opaque host, but only for percent-encoded values, so a non-failing invalid-URL-unit validation error occurs.
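Most of these parser behaviours can also be observed through the public URL API, which shares the basic URL parser. The sketch below is only an analogy: URLPattern calls the parser with an empty URL record and a state override, which the URL constructor does not do. The foo scheme is an assumption chosen here to get a non-special URL (and therefore an opaque host), and example.net is a placeholder host.

```javascript
// Tabs and newlines are stripped before parsing; only a non-fatal
// validation error is reported, so no exception is thrown.
const stripped = new URL("http://bad\nhostname");
console.log(stripped.hostname); // "badhostname"

// "/" and "#" terminate the host; the remainder becomes path or fragment.
const slash = new URL("http://bad/hostname");
console.log(slash.hostname, slash.pathname); // "bad" "/hostname"
const hash = new URL("http://bad#hostname");
console.log(hash.hostname, hash.hash); // "bad" "#hostname"

// Assumption: "foo" stands in for any non-special scheme. The host is then
// opaque, and a "%" not followed by two hex digits is kept as-is.
const opaque = new URL("foo://bad%hostname");
console.log(opaque.hostname); // "bad%hostname"

// The hostname setter runs the parser with hostname state as state override.
// On ":" the parser returns early, so the existing host is left untouched.
const viaSetter = new URL("http://example.net/");
viaSetter.hostname = "bad:hostname";
console.log(viaSetter.hostname); // "example.net"
```

The setter case is the closest match to what URLPattern does: the early return is harmless when a previous host exists, but URLPattern starts from an empty URL record, so the host stays null.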