url Support relative URLs

The new URL() constructor currently requires at least one of its arguments to be an absolute URL.

new URL('./page.html', 'https://site.com/help/');  // OK
new URL('./page.html', '/help/');  // Uncaught TypeError: URL constructor: /help/ is not a valid URL.

That requirement is painful because determining which absolute URL to use as a base can be difficult or impossible in many circumstances. In a regular browser context, document.baseURI should be used. In Web Workers, self.location should be used. In Deno, window.location should be used, but only if the --location command line option was used. In Node, there is no absolute URL to use. Trying to write isomorphic code that satisfies this requirement is quite error-prone.
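
To illustrate how fragile this gets, here is a minimal sketch of the environment checks described above. The helper names guessBaseUrl and parseUrl are hypothetical, and the try/catch is a defensive assumption for runtimes where merely touching these globals may throw.

```javascript
// Hypothetical helper: returns a base URL string, or undefined when the
// environment offers none (for example Node.js).
function guessBaseUrl() {
  try {
    if (typeof document !== 'undefined' && document.baseURI) {
      return document.baseURI; // regular browser context
    }
    if (typeof location !== 'undefined' && location.href) {
      return location.href; // Web Workers, or Deno started with --location
    }
  } catch {
    // Assumption: some runtimes throw on access instead of returning undefined.
  }
  return undefined;
}

// Hypothetical wrapper: parses against the guessed base when there is one.
function parseUrl(input) {
  const base = guessBaseUrl();
  return base === undefined ? new URL(input) : new URL(input, base);
}
```

Even with all of these guards, parseUrl('./page.html') still throws wherever no base can be found, which is the core of the problem.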

Additionally, in many cases it would be useful to parse and resolve relative URLs against each other without knowing an absolute base URL ahead of time.

// Desired output - these currently do not work
new URL('/to', '/from').toString();  // '/to'
new URL('/to', '//from.com/').toString();  // '//from.com/to'

The lack of support for use cases involving only relative URLs is causing me to remove WHATWG URL from Ky, a popular HTTP request library, in favor of our own string replacement. See: https://github.com/sindresorhus/ky/pull/271

Desired API and whether to update the existing `new URL()` API or create a new API?

From my perspective, updating the new URL() constructor so it can handle a relative URL in the baseUrl argument would be ideal, i.e. remove the requirement for an absolute base in favor of simply parsing any missing URL parts as empty strings (as is currently done when a URL lacks a query, for example). But I understand that changing new URL() at this point may be difficult and it may be more practical to instead create a new API; perhaps new PartialURL() or split out the validation, parsing, and resolution algorithms into individual methods.

For my purposes, I need to at least be able to parse and serialize a relative URL, without having to provide an absolute base URL. A method that resolves two relative URLs against each other and returns the resulting relative URL would also be useful, e.g. URL.resolve('./from/index.html', './to') -> ./from/to.

Jul 17 '20 03:07 sholladay

Well, its purpose is to create a URL and those are by definition not relative. I could see wanting something specialized for path/query/fragment manipulation though. Are there any popular libraries that handle that we could draw inspiration from?

Jul 17 '20 07:07 annevk

Where is it defined that a URL must contain a scheme and a host in order to be a valid URL?

Even if such a definition exists, new URL() is the first API in the web ecosystem that I have encountered that has this limitation, making it quite surprising.

Beyond that, the WHATWG URL spec itself defines relative URLs...

https://url.spec.whatwg.org/#relative-url-string

As for existing implementations, see Node's url.parse() and url.resolve(), among others. I've used these extensively to manipulate URLs where the scheme and/or host is not known ahead of time and will be determined later by the end-user or browser, depending on where the URL is ultimately used.
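
For reference, a short sketch of the Node behaviour mentioned above, using the examples from Node's own documentation for the legacy url.resolve() function. It assumes a Node.js environment with CommonJS modules.

```javascript
// Node.js legacy API: resolves a target against a base, and the base may
// itself be relative.
const url = require('node:url');

console.log(url.resolve('/one/two/three', 'four'));       // '/one/two/four'
console.log(url.resolve('http://example.com/', '/one'));  // 'http://example.com/one'
console.log(url.resolve('http://example.com/one', '/two')); // 'http://example.com/two'
```

The first call is the interesting one here: neither argument carries a scheme or host, and the result is still a usable relative URL.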

Jul 17 '20 09:07 sholladay

It defines them as input (though only in the context of a base URL, which at least browsers always use); it doesn't define them as data structures. The data structure is defined at https://url.spec.whatwg.org/#url-representation (though it's fair to say that it makes more seem optional than actually is; something to improve).

Jul 17 '20 09:07 annevk

I get that browsers need an absolute base URL to actually perform a request. And thus it makes sense for the URL specification to define what an absolute base URL is and discuss resolving relative URLs in the context of an absolute base URL, etc.

What doesn't make sense to me is why new URL() imposes this limitation. I cannot think of anything else on the web platform that does this. Even HTML's <base> tag supports relative URLs, despite the fact that it is specifically meant for defining the base URL.

I can see some value in an API that tests whether a URL is absolute. So perhaps part of the problem here is that new URL() actually does a lot of things: parsing, resolving, and validating. These could be broken down into separate methods. I don't think that is strictly necessary, though it would be one way to solve this.

Jul 17 '20 12:07 sholladay

Browsers only have a single URL parser that works as new URL() does (and as defined at https://url.spec.whatwg.org/#url-parsing). E.g., when parsing <base href> the location of the document is used. And in fact, the entirety of the web platform does this as it all builds upon this standard and its primitives.

Jul 17 '20 15:07 annevk

Browsers only have a single URL parser that works as new URL() does

Sure, as I said, it's completely reasonable that a browser needs to resolve to an absolute URL. But I'm not building a browser and I have a suspicion that most new URL() users aren't, either. I'm building software for the web platform that is environment agnostic and needs the same functionality as new URL() even if the scheme or host is not yet known. Use cases and relevant code linked to above.

Jul 17 '20 16:07 sholladay

To try and clarify this issue: it seems that you're not asking for a definitional change but an actual behavioural change to the Web-facing URL API.

Specifically, the changes you seem to be asking for are:

If the base argument is not supplied, it defaults to document.location (the current page's URL), rather than the current behaviour which requires the url argument to be absolute if base is omitted.
If the base argument is not absolute, it is first resolved against document.location (the current page's URL), rather than the current behaviour which unconditionally requires the base argument to be absolute.

So for example, if you executed these on https://github.com/whatwg/url/issues/531, all of the following are currently errors, and they would change to work as follows:

// Proposed API.
> new URL('to');
"https://github.com/whatwg/url/issues/to"

> new URL('to', '/from/');
"https://github.com/from/to"

> new URL('to', '//from.com/');
"https://from.com/to"

Technically, this is all feasible, but I don't think it's necessary or desirable. It's rather trivial to write code using the current API that behaves like this if you want it to:

// Current API.
> new URL('to', document.location);
"https://github.com/whatwg/url/issues/to"

> new URL('to', new URL('/from/', document.location));
"https://github.com/from/to"

> new URL('to', new URL('//from.com/', document.location));
"https://from.com/to"

I personally prefer not to change this. The current API forces you to be explicit about incorporating the current document's location, so it's clear to anyone reading the code that the current page's URL might leak into the result. When you don't use document.location as a base, it's a pure mathematical function of the inputs, and will produce the same output on any web page. That's a good property which I don't think we should break.

Jul 20 '20 01:07 mgiuca

No. I want to be able to parse and resolve relative URLs in an environment-agnostic way, for example on the server. It's completely unacceptable to rely on the DOM. The point of this issue is new functionality, which would behave exactly like new URL() does now, except it would support relative URLs in both arguments and it would return the resolved and parsed relative URL. That's it. I'm not asking for magical implicit resolution to an absolute URL. Just allow baseUrl to be relative and if it is relative, then return a relative URL.

I don't care if this is a change to the constructor or exposed as some new method.

Jul 20 '20 02:07 sholladay

Ohh, I see what you want now. (Tip: When filing a bug asking for a change to API behaviour, please give sample input and output so it's clear what you want.)

So am I right in thinking that this is what you want for my three examples:

// Proposed API.
> new URL('to');
"to"

> new URL('to', '/from/');
"/from/to"

> new URL('to', '//from.com/');
"//from.com/to"

(Noting that I'm using strings to represent the output above, but it would actually be a URL object.)

OK that makes sense. It does mean changing the URL object to allow representation of all kinds of relative URLs (scheme-relative, host-relative, path-relative, query-relative and fragment-relative). Though maybe that's helpful in explaining in general all of those different kinds of relative, which currently are not captured in the spec other than as details of the parser algorithm.

Jul 20 '20 03:07 mgiuca

To be fair, I referenced Node's url.resolve() as an example of an existing implementation that produces the expected output (approximately). But point taken. Yes, you are correct about the desired output.

This would be a massive help to a lot of libraries and tools, especially those that aim to be isomorphic.

Jul 20 '20 04:07 sholladay

For multipart/related, we invented a scheme "thismessage:". You could use "thismessage:/" as the base if you didn't have one, and remove it if it was there when done. https://www.iana.org/assignments/uri-schemes/uri-schemes.xhtml#thismessage

Jul 20 '20 05:07 masinter

Interesting. I did actually consider something exactly like that using invalid: as a scheme, but it's a hack and we'd like to avoid it. In Ky, we were able to use a regex string replacement for the query part of the URL, which also isn't great, but that was sufficient for the one place we still used new URL() - we removed all other usage of new URL() due to the aforementioned problems. There are other situations I've encountered, though, where something more complicated is needed. Parsing and resolving relative URLs is really something that should be built into the standard web APIs.

Jul 20 '20 07:07 sholladay

Hi, I’m in a similar situation. I’m prototyping a bundler and I keep running into issues using the WHATWG URL class, specifically because it does not parse origin-relative URLs. The use-case is that I want to specify a common prefix for the public distribution of static files; for instance, the prefix can be the string "/static/", implying that the origin is the same origin as the server, but it can also be an absolute URL on a different origin ("https://mycdn.com/"). Some common operations I need include resolving relative and absolute URLs against this base, detecting if another URL is “outside” the base, and getting the relative path of a URL relative to the base, all of which could be done if an origin relative URL could be passed to the URL constructor, something like new URL("main.js", "/static/").

If anyone has any solutions, I’d love to hear about them. I’m loath to abandon the URL class completely because of all the work it does in parsing URLs, but right now I have a Frankenstein system with URLs, the path/posix module, and regexes that I’d like to abstract.

Aug 27 '20 05:08 brainkim

@brainkim for that specific case it seems you could work around this by using a fake origin such as https://fakehost.invalid and removing it later on.
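
A minimal sketch of that workaround, assuming the "/static/" and "https://mycdn.com/" prefixes from the comment above. The function name resolveWithFakeOrigin is hypothetical, not an existing API.

```javascript
// Placeholder origin, only used internally and stripped again afterwards.
const FAKE_ORIGIN = 'https://fakehost.invalid';

// Hypothetical helper: resolves `input` against `base`, where `base` may be
// origin-relative ("/static/") or absolute ("https://mycdn.com/").
function resolveWithFakeOrigin(input, base) {
  const absoluteBase = new URL(base, FAKE_ORIGIN);
  const resolved = new URL(input, absoluteBase);
  // If the result still lives on the placeholder origin, hand back only the
  // origin-relative part; otherwise the result is a real absolute URL.
  return resolved.origin === FAKE_ORIGIN
    ? resolved.pathname + resolved.search + resolved.hash
    : resolved.href;
}

console.log(resolveWithFakeOrigin('main.js', '/static/'));           // '/static/main.js'
console.log(resolveWithFakeOrigin('main.js', 'https://mycdn.com/')); // 'https://mycdn.com/main.js'
```

One limitation of this sketch: a scheme-relative base such as "//from.com/" picks up the placeholder's https: scheme and comes back as an absolute URL.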

Also, if we did something here it would not be by changing new URL(). The output of that has to be "complete" and useful in a wide variety of contexts that expect a scheme and such.

Aug 27 '20 07:08 annevk

@annevk

I’m currently experimenting with using a custom protocol for the base (currently local:///) and it actually seems to be working out. It seems like it’s important to use 1 or 3 slashes so that the constructor does not interpret the first path part as a host. I still need posix path helpers to deal with pathname, and I have lots of code I’m not sure about like url.pathname.startsWith(publicPrefix.pathname) but this slowly seems to be turning into an acceptable solution.
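
The slash-count observation and the startsWith check can be sketched as follows. This assumes the custom local: scheme from this comment; isInside is a hypothetical helper, and it compares the protocol and host explicitly because it does not rely on the origin of a non-special scheme.

```javascript
// With a non-special scheme, two slashes introduce a host; one or three do not.
console.log(new URL('local://static/main.js').host);      // 'static'
console.log(new URL('local://static/main.js').pathname);  // '/main.js'
console.log(new URL('local:///static/main.js').pathname); // '/static/main.js'
console.log(new URL('local:/static/main.js').pathname);   // '/static/main.js'

const FAKE_BASE = 'local:///';

// Hypothetical helper: is `candidate` located under the public `prefix`?
// Assumes the prefix ends with a slash, otherwise "/static" would also
// match "/staticfoo".
function isInside(candidate, prefix) {
  const prefixUrl = new URL(prefix, FAKE_BASE);
  const url = new URL(candidate, FAKE_BASE);
  return url.protocol === prefixUrl.protocol
    && url.host === prefixUrl.host
    && url.pathname.startsWith(prefixUrl.pathname);
}

console.log(isInside('/static/js/main.js', '/static/'));    // true
console.log(isInside('/static/../secret.txt', '/static/')); // false
```

Because the parser collapses ".." segments before the comparison, a path that escapes the prefix is detected without any posix path helpers.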

Are there any thoughts on the fake protocol to use? I’m checking against https://en.wikipedia.org/wiki/List_of_URI_schemes to make sure I’m not stepping on well-known protocols. Maybe there is a very good reason not to use local:///? I’ve also considered internal:///, self:///, and relative:///? I want some name which indicates that the URL should be relative to the origin assigned to the server.

Aug 27 '20 19:08 brainkim

You could use thismessage:/ which was set up exactly for this purpose when defining multipart/related
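
As a rough sketch of how that could look with the current URL API: resolve both arguments against thismessage:/ and strip the scheme again afterwards. The function name resolveRelative is hypothetical.

```javascript
const PLACEHOLDER_BASE = 'thismessage:/';
const PLACEHOLDER_SCHEME = 'thismessage:';

// Hypothetical helper: resolves `to` against `from`, where either one may be
// relative. Returns a relative URL string unless an absolute URL was involved.
function resolveRelative(to, from) {
  const base = new URL(from, PLACEHOLDER_BASE);
  const resolved = new URL(to, base);
  if (resolved.protocol !== PLACEHOLDER_SCHEME) {
    return resolved.href; // a real scheme was supplied by one of the inputs
  }
  return resolved.href.slice(PLACEHOLDER_SCHEME.length);
}

console.log(resolveRelative('/to', '/from'));        // '/to'
console.log(resolveRelative('/to', '//from.com/'));  // '//from.com/to'
console.log(resolveRelative('./page.html', '/help/')); // '/help/page.html'
```

This reproduces the desired output from the original post for root-relative and scheme-relative inputs. It cannot preserve path-relative results: resolveRelative('to', 'from/') comes back as '/from/to' because the placeholder base supplies a root.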

Aug 27 '20 20:08 masinter

@masinter Looks good. From https://www.w3.org/wiki/UriSchemes/thismessage:

defined for the sole purpose of resolving relative references within a multipart/related structure when no other base URI is specified

The “multipart form” part threw me off earlier but I think this is acceptable.

Aug 27 '20 20:08 brainkim

I hope @alwinb doesn’t mind me advertising their library here (nor anyone else, for that matter), but I recently found it through https://github.com/whatwg/url/issues/405#issuecomment-694786491, and it allows manipulating relative URLs and resolving them against other (relative or absolute) URLs in a way that complies with this specification.

It’s really simple, actually!

let url = new Url("../messages/goodbye.txt")
url = url.set({file: "hello.txt"})
console.log(url.host, [...url.dirs], url.file) // null, ["..", "messages"], "hello.txt"

console.log(new Url("https://example.com/things/index.html").goto(url).force().normalize().href) // "https://example.com/messages/hello.txt"

A couple notes:

.normalize() will collapse . and .. appropriately.
.force() will ensure special URLs have a host. (In this example, it’s unnecessary).
URL objects appear to be immutable. (From what I was able to check.)
When parsing a relative URL, you can specify the parsing mode (“special” vs. “file” vs. “regular”) with an argument to the Url constructor. (It defaults to non‐file special, i.e. similar to http[s] and ws[s].)
You can construct a URL object from “parts” instead of from a string. (Relevant to #354.)
By default, .toString() will produce a string that can contain non‐ASCII characters. .toASCII() (or equivalently, .toJSON() or .href) will produce an ASCII string, using percent‐encodings and punycode as appropriate.

Maybe this library can serve as inspiration of some kind for an API for the spec.

Sep 21 '20 22:09 zamfofex

@zamfofex thank you, that is a nice summary!

I think that the most important part is not the API though, but the model of URLs underneath.

The parser that is used in the standard at the moment simply cannot support relative URLs (without major changes, at least). And after having worked on my library, I can understand why, because it was a really complicated and frustrating process to come up with something compliant that could! I'd forgive people for thinking that it cannot be done at all.

I'll sketch part of my solution, for the discussion here.

The force operation is one key part of the solution. Consider the issue of repeated slashes:

http:foo/bar
http:/foo/bar
http://foo/bar
http:///foo/bar

According to the standard, all of these 'parse' (i.e. parse-and-resolve) to the same URL. However, when 'parsed against a base URL' they behave differently. So you cannot just use:

special-url := [special-scheme :] [(/|\)* authority] [path-root] [relative-path] [? query] [# hash]

or something like that, as a grammar, because then you'd fail to resolve correctly when a base URL is supplied. (I'm using square brackets for optional rules here). So you need to start off with a classic rule that has two slashes before the authority.

My first parser phase is very simple and parses them as such:

(scheme"http") (dir"foo") (file"bar")
(scheme"http") (path-root"/") (dir"foo") (file"bar")
(scheme"http") (auth-string"foo") (path-root"/") (file"bar")
(scheme"http") (auth-string "") (path-root"/") (dir"foo") (file"bar")

From there,

It detects drive letters, via an operation on this structure, and it parses the authority from the auth-string.
Then, the goto operation is quite like the 'non-strict merge' of RFC 3986. So this is nice, it is just a classic algorithm, and it is very simple.
Finally, force solves the problem of the multiple slashes. If the (special) URL does not have an authority, or if its authority is empty, then it 'steals' an authority-string from the first non-empty dir-or-file, and it invokes the authority parser on that.
I like this solution, because it matches the standard, but it also respects the RFC. This is indeed a 'force' that is only applied as an error-recovery strategy.
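
To make the two phases concrete, here is a heavily simplified sketch. It is not the library's actual code: parseTokens and force are hypothetical names, the split uses an RFC 3986 Appendix B style regular expression, and drive letters, backslashes, authority parsing, and the special/non-special distinction are all left out.

```javascript
// Phase 1: classic split into scheme, authority (after exactly "//"), path,
// query, and fragment, then the path into root, dirs, and file.
const PARTS = /^(?:([^:\/?#]+):)?(?:\/\/([^\/?#]*))?([^?#]*)(?:\?([^#]*))?(?:#(.*))?$/s;

function parseTokens(input) {
  const [, scheme, auth, path, query, hash] = PARTS.exec(input);
  const tokens = [];
  if (scheme !== undefined) tokens.push(['scheme', scheme]);
  if (auth !== undefined) tokens.push(['auth-string', auth]);
  let rest = path;
  if (rest.startsWith('/')) {
    tokens.push(['path-root', '/']);
    rest = rest.slice(1);
  }
  const segments = rest.split('/');
  const file = segments.pop();
  for (const dir of segments) tokens.push(['dir', dir]);
  if (file !== '') tokens.push(['file', file]);
  if (query !== undefined) tokens.push(['query', query]);
  if (hash !== undefined) tokens.push(['hash', hash]);
  return tokens;
}

// Phase 3: if the authority is missing or empty, steal it from the first
// non-empty dir or file, and drop everything in the path before it.
function force(tokens) {
  const auth = tokens.find(([type]) => type === 'auth-string');
  if (auth !== undefined && auth[1] !== '') return tokens;
  const index = tokens.findIndex(
    ([type, value]) => (type === 'dir' || type === 'file') && value !== '');
  if (index === -1) return tokens; // nothing to steal; real code would fail here
  const scheme = tokens.filter(([type]) => type === 'scheme');
  return [
    ...scheme,
    ['auth-string', tokens[index][1]],
    ['path-root', '/'],
    ...tokens.slice(index + 1),
  ];
}

for (const input of ['http:foo/bar', 'http:/foo/bar', 'http://foo/bar', 'http:///foo/bar']) {
  console.log(JSON.stringify(force(parseTokens(input))));
}
```

In this sketch all four inputs end up as the same token list (scheme "http", auth-string "foo", path-root "/", file "bar") after force, while parseTokens alone keeps them distinct, which is what resolution against a base needs.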

Oct 01 '20 09:10 alwinb

I did a branch of jsdom/whatwg-url a while ago that uses a modular parsing/resolving algorithm, passes all of the tests (well, except 5/1305 that I was looking to get some help with) and has everything in place to start supporting relative URLs.

I did not post it because the changes are so large, as-is, that it would not be feasible to adopt them in the standard. I was thinking about a way to provide the same benefits incrementally and with less intrusive changes, so that it could be merged into the spec gracefully. However, I have the impression that even if I managed to do that, the changes would be resisted for reasons that are not technical but social and emotional. So I am leaving it here as is. I am disappointed by the situation. I hope it will work out eventually, because support for relative URLs would be very useful to people, and also because a modular/compositional approach enables you to talk with precision about the constituents that URLs are made of, improving the spec itself and all the discussions around it.

There have been good reasons why this has not been done before. It is a messy problem especially in combination with the different browser behaviours. I've built on that work and solved the issue, but as usual, there's more to it than solving the technical challenges.

Part of the discussion around this was in #479.

The branch, as-is... is here: https://github.com/alwinb/whatwg-url/tree/relative-urls. The readme is no longer accurate, sorry for that.

Oct 21 '20 19:10 alwinb

I think the main reason we have not made a lot of progress here is lack of browser-related use cases. Apart from browsers the API is only supported by Node.js. That's not enough for https://whatwg.org/working-mode#changes. Perhaps https://github.com/WICG/urlpattern will bring some change to this, but it's a bit too early to say. Now I might well be wrong and there is in fact a lot of demand for this inside the browser or by web developers using a library to solve this in browsers today. If someone knows that to be the case it would be great if they could relay that.

Oct 22 '20 07:10 annevk

Our use case is in the browser, I only mentioned other environments as an example of how it could benefit the larger community. Ky targets browsers primarily. We just don't want to specifically rely on the DOM or window. So we try to avoid referencing document.baseURI or window.location. That makes it difficult for us to use new URL() because it doesn't support relative URLs, which we are sometimes given as input because we are operating in a browser and relative URLs are a common occurrence in browser land.

Oct 22 '20 18:10 sholladay

Thanks for your reply Seth, could you perhaps go into some more detail as to why you want to avoid window.location and where these relative URLs are common?

Oct 23 '20 07:10 annevk

You might check with @jyasskin for another use of relative URLs for browsers. Relative URLs were an important part of how multipart/related captures the relationship between the components of a saved web page. It was the reason for the invention of the "thismessage" scheme (for supplying a base when none was present).

Oct 24 '20 19:10 masinter

Re @masinter, web packages don't currently have any fields that allow relative URLs. If we change that, I don't think we'd need to expose the relative-ness to JavaScript; we'd just resolve them against the package's base URL, like we do for the relative URLs in HTML.

Oct 25 '20 20:10 jyasskin

I'm not completely sure I accurately understand the last comment, but I think that what @jyasskin calls 'exposing relative-ness' is just what this issue is asking for. It is asking for an addition to the API that exposes a parsed version of what is called a "relative reference" in the parlance of RFC 3986 (I usually call it a relative URL).

I'm arguing in favour of it because I would like the standard to define an analogue of "relative reference". This is not currently the case, so in places where relative references are useful or needed, people cannot refer to the standard for guidance.

@annevk points out that for such a change to be considered, they need examples where relative references are useful in a browser context, so we're looking for such use cases.

Oct 25 '20 22:10 alwinb

Thanks for your reply Seth, could you perhaps go into some more detail as to why you want to avoid window.location and where these relative URLs are common?

@annevk points out that for such a change to be considered, they need examples where relative references are useful in a browser context, so we're looking for such use cases.

I think that there are natural cases where generating relative URLs is useful in a web app.

Suppose that some component A generates a link to another component B which takes a query parameter. For example, component A is at http://example.com/inbox and component B is at http://example.com/message?id=<the ID of a message>.

One approach is to generate an absolute URL, so that the DOM will be like <a href="http://example.com/message?id=abcde">Open message</a>. But this introduces an unnecessary dependency on the domain name, which causes inconveniences such as having to fake the domain name in unit tests.

Another approach is to generate a relative URL, so that the DOM will be like <a href="/message?id=abcde">Open message</a>, and leave the relative-to-absolute conversion to the browser. To do so, it would be useful to write code like

const url = new URL('/message');
url.searchParams.set('id', messageId);
const link = createElement('a');
link.href = url.href;
...

but this code does not currently work because new URL('/message') throws.
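
For comparison, a minimal sketch of what works today without an absolute base: URLSearchParams can be used on its own, and the path is joined by hand. The helper name relativeLink is hypothetical, and it assumes the path carries no query or fragment of its own.

```javascript
// Hypothetical helper: builds a relative link from a path and query parameters.
function relativeLink(path, params) {
  const query = new URLSearchParams(params).toString();
  return query === '' ? path : `${path}?${query}`;
}

console.log(relativeLink('/message', { id: 'abcde' })); // '/message?id=abcde'
```

This covers the query string, but the path itself is never parsed or validated, which is the part a relative-aware URL API would add.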

May 04 '21 14:05 ti1024

@ti1024:

new URL('/message', location.href);

May 07 '21 20:05 stevenvachon

@stevenvachon That is exactly what I described as “One approach is to generate an absolute URL”, with the drawback I described.

May 08 '21 00:05 ti1024

could you perhaps go into some more detail as to why you want to avoid window.location and where these relative URLs are common?

@annevk Sure. The reason we want to avoid using window.location is because it doesn't exist in Web Workers, among other environments. Web Workers do have self instead of window, though. Node.js doesn't have window or self. There are even environments where a window does exist but without a window.location, such as Deno. Newer environments have globalThis but older environments don't. There are so many special cases, it's a mess and difficult to maintain.

Relative URLs are common mainly in apps that target browsers. It's not uncommon to see something like fetch('/foo.jpg') or fetch('../constants.json'). We aim to make this work, while keeping the implementation of the Ky library as environment agnostic as possible.

Early versions of Ky were designed to pass URLs directly to fetch() without modifying them and without referencing window or document. That worked well because fetch() correctly handles relative URLs as input, and it resolves them against either document.baseURI (e.g. from the <base> HTML element), or window.location, depending on what is available. fetch() works as expected and we want Ky to work that way, too.

Then people requested a new feature where you can pass a searchParams object to Ky, and Ky will add those params to the input URL before calling fetch(). This is useful, for example, if you are creating a custom API client with ky.extend() and you always want to include a ?limit=100 param to limit the page size to 100 items in the response to every request that is sent with that client. When that feature was implemented, we had to decide how to apply the searchParams to the input URL, and for that we began using new URL() and its property setters, since it's easy to do myUrl.search = mySearchParams. That solution seemed good at the time, but later we realized that it broke relative URL support because new URL() lacks support for relative URLs. I tried to fix the regression by resolving the input URL against the document base, with new URL(input, document.baseURI). But that then caused problems for people using Ky in Web Workers, React Native on mobile devices, and Node.js. I then fixed that by guarding the document reference, although in hindsight that also needs a fallback to window.location, which itself needs to be guarded. You'd think that would be enough, but we had further complaints that our approach of referencing globals was too difficult to mock. The attempted fix for that then broke more stuff...

The point is, writing environment agnostic code that depends on window or document is pretty tricky in practice. And in the end, we were only doing that as a workaround for new URL()'s lack of support for relative URLs. So we dropped new URL() and resorted to regex-based string replacement of the search params instead, for now.
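
A string-based replacement along those lines could look roughly like this. It is a sketch under assumptions, not Ky's actual code: withSearchParams is a hypothetical name, and it replaces the whole query the way myUrl.search = mySearchParams would.

```javascript
// Hypothetical helper: replaces the query of `input` (relative or absolute)
// without ever constructing a URL object.
function withSearchParams(input, searchParams) {
  // Split off the fragment first, since "?" may legally appear inside it.
  const hashIndex = input.indexOf('#');
  const hash = hashIndex === -1 ? '' : input.slice(hashIndex);
  const beforeHash = hashIndex === -1 ? input : input.slice(0, hashIndex);

  // Drop any existing query.
  const queryIndex = beforeHash.indexOf('?');
  const path = queryIndex === -1 ? beforeHash : beforeHash.slice(0, queryIndex);

  const query = new URLSearchParams(searchParams).toString();
  return path + (query === '' ? '' : `?${query}`) + hash;
}

console.log(withSearchParams('../constants.json?x=1#top', { limit: 100 }));
// '../constants.json?limit=100#top'
```

The input is left exactly as relative as it was, so fetch() can still resolve it against whatever base the environment provides.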

May 08 '21 21:05 sholladay
