urlpattern icon indicating copy to clipboard operation
urlpattern copied to clipboard

URLPattern use of URL record is ambiguous

Open youennf opened this issue 1 year ago • 10 comments

What is the issue with the URL Pattern Standard?

URLPattern is using URL records, and passing some URL records to URL algorithms. This might lead to ambiguous behaviour.

For instance https://urlpattern.spec.whatwg.org/#canonicalize-a-hostname is passing the URL record to parse hostname (via https://url.spec.whatwg.org/#hostname-state). It is unclear whether an URL record is a special URL or not, so it is unclear whether parsing of the hostname (https://url.spec.whatwg.org/#concept-host-parser) will lead to https://url.spec.whatwg.org/#concept-opaque-host-parser.

Looking at WPT tests and Chrome results, the expected behaviour is to consider that the url is special and hostname is not considered opaque.

The same pattern is done in other places of the spec (canonicalisation algorithms of port, pathname, search and hash for instance), though it is not clear to me whether this has side effects.

Discussing with @annevk, it is unexpected to pass URL records directly like done in https://urlpattern.spec.whatwg.org/#canonicalize-a-hostname. A future url spec refactoring might make that clear.

Ideally, the url given to the basic URL Parser should be an URL built by a URL parser. Maybe that is what the spec should do.

youennf avatar Jan 15 '25 13:01 youennf

Thank you for filing this. I agree, the current spec doesn't specify the given URL is special or not and this should be explicit. Or even the current spec may be wrong because the default value of scheme is empty string, which is not a special scheme so the URL wouldn't be special.

Looking at WPT tests and Chrome results, the expected behaviour is to consider that the url is special and hostname is not considered opaque.

Yes.

Ideally, the url given to the basic URL Parser should be an URL built by a URL parser. Maybe that is what the spec should do.

Are you referring URL parser? I see there is a comment in URL parser, saying "Non-web-browser implementations only need to implement the basic URL parser". I assume some non-web-browser implementers don't implement this algorithm. Do you have any concern about just using the result of basic URL parser, given "http://dummy.test" as an input and pass it as dummyURL in https://urlpattern.spec.whatwg.org/#canonicalize-a-hostname?

sisidovski avatar Mar 12 '25 07:03 sisidovski

I think that generally works, but note that you cannot distinguish between someone passing in a hostname of : and a hostname of dummy.test. So how exactly do you know whether it succeeded?

annevk avatar Mar 12 '25 14:03 annevk

I suppose we could maybe standardize upon a magic host value, such as urlpattern.invalid, but it doesn't seem great.

annevk avatar Mar 12 '25 14:03 annevk

Perhaps we can make https://url.spec.whatwg.org/#hostname-state return failure in more cases to make it more suitable for this caller. As all other callers with hostname state as override state ignore the return value. Which does make this API stand out, but so be it. I'll investigate.

annevk avatar Mar 13 '25 00:03 annevk

@sisidovski I think https://github.com/whatwg/url/pull/863 coupled with your suggested change here to create a proper dummy URL this should be good. When implementing this I found one impacted test for which I put up a patch at https://github.com/web-platform-tests/wpt/pull/51316.

annevk avatar Mar 13 '25 12:03 annevk

Thank you so much. I'll check your PRs and then work on the URLPattern side.

sisidovski avatar Mar 14 '25 04:03 sisidovski

I looked at https://github.com/whatwg/urlpattern/pull/255 and https://github.com/whatwg/urlpattern/issues/220.

Basically we'll have a dummy URL with a special scheme, but what if a non-special scheme is specified? e.g. {protocol: 'git', hostname: 'café.com'}? In that case the encoded string should be caf%C3%A9.com, rather than xn--caf-dma.com? If so, I feel https://github.com/whatwg/urlpattern/pull/255 is a reasonable approach.

sisidovski avatar Mar 24 '25 16:03 sisidovski

#255 still has the problem of dummyURL not being fully defined. It might also have the wrong default (not defaulting to a special URL) when protocol is not given or cannot be given. Overall though that seems like a reasonable direction.

annevk avatar Mar 24 '25 16:03 annevk

I agree my PR feels a bit hackish, however the canonicalize a domain name method (with https as default scheme) is only called for special URLs. All non special URLs (which including URLs without explicit protocol) are still processed through the canonicalize a hostname method. I don't know if its the right behaviour, but I think its consistent the specs.

rubycon avatar Mar 25 '25 10:03 rubycon

I think what we want is that URLs without explicit protocol also use the special URL code path. At least I think that's what the current tests end up requiring and what is implemented.

annevk avatar Mar 25 '25 12:03 annevk