
On trailing slashes for URLs / subjects

Open joepio opened this issue 3 years ago • 4 comments

Trailing slashes are weird and confusing:

  • Formally, trailing slashes are semantically relevant.
  • Typically, people used trailing slashes to show that something is a directory instead of a regular resource. But this is less and less common.
  • Many URL implementations (including web browsers) automatically add a trailing slash to origins, so http://example.com becomes http://example.com/
  • Many webservers ignore trailing slashes, and return the same resources (try opening this page with and without a trailing slash)
  • Some webservers redirect to the "correct" page, even

I'm currently struggling with how to deal with trailing slashes. I've had some issues with them before, especially with home pages (where the trailing slash was added by browsers).

Atomic-Server and perhaps even the Atomic Data spec need a very clear opinion on this. For now, I think we should go with the following:

  • Trailing slashes should be ignored. If they are present, they can be removed. This includes trailing slashes for origins.
  • The default way to serialize a URL does not include a trailing slash.

joepio avatar Oct 04 '22 08:10 joepio

Many URL implementations (including web browsers) automatically add a trailing slash to origins, so http://example.com becomes http://example.com/

Not accurate. What gets added is an initial slash. The reason is that an http URI is defined as having a non-empty path, so a URI with the path omitted means the same as a URI with the default path "/", and the canonical form of a URI expands to include that implied "/" when it is omitted.

See RFC 2616 §3.2.2:

If the abs_path is not present in the URL, it MUST be given as "/" when used as a Request-URI for a resource

jonassmedegaard avatar Oct 04 '22 12:10 jonassmedegaard

I recommend normalizing URIs only as per the rules that preserve semantics, listed here: https://en.wikipedia.org/wiki/URI_normalization#Normalizations_that_preserve_semantics
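A minimal sketch of what such semantics-preserving normalization could look like, using only Python's standard library. The function name `normalize_uri` and the `DEFAULT_PORTS` table are made up for illustration and are not part of Atomic-Server or the Atomic Data spec. It lowercases the scheme and host, drops the default port, expands an empty path to "/", and uppercases percent-encoding hex digits. It deliberately leaves trailing slashes on non-empty paths alone.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Assumption for this sketch: only http and https default ports are known.
DEFAULT_PORTS = {"http": 80, "https": 443}

_PERCENT = re.compile(r"%[0-9a-fA-F]{2}")


def _upper_escapes(text: str) -> str:
    # "%2f" and "%2F" are equivalent; the uppercase form is the normalized one.
    return _PERCENT.sub(lambda m: m.group(0).upper(), text)


def normalize_uri(uri: str) -> str:
    parts = urlsplit(uri)
    scheme = parts.scheme.lower()

    # urlsplit lowercases the hostname and strips IPv6 brackets.
    host = parts.hostname or ""
    if ":" in host:
        host = f"[{host}]"

    netloc = host
    if parts.username is not None:
        userinfo = parts.username
        if parts.password is not None:
            userinfo += ":" + parts.password
        netloc = userinfo + "@" + netloc
    if parts.port is not None and parts.port != DEFAULT_PORTS.get(scheme):
        netloc += f":{parts.port}"

    # An omitted path means the same as "/", so expand it.
    # A trailing slash on a non-empty path is NOT touched.
    path = parts.path
    if netloc and not path:
        path = "/"

    return urlunsplit((
        scheme,
        netloc,
        _upper_escapes(path),
        _upper_escapes(parts.query),
        parts.fragment,
    ))
```

With this sketch, `HTTP://Example.COM:80` and `http://example.com/` normalize to the same string, while `http://example.com/a` and `http://example.com/a/` stay distinct.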

If in some situations multiple URIs are used for the same resource, then it might be useful to advertise which of them is considered canonical, e.g. using the semantic tag designated for that: https://en.wikipedia.org/wiki/Canonical_link_element#Semantic_tag

...but please don't get too creative and mangle URIs more aggressively!

jonassmedegaard avatar Oct 04 '22 12:10 jonassmedegaard

Thanks for the clarifications @jonassmedegaard!

If I'm understanding you correctly, we should in fact keep the initial slash in serialization, but for trailing slashes we might consider removing them.

joepio avatar Oct 04 '22 12:10 joepio

Which URIs are you talking about?

For freshly minted URIs, yes, feel free to choose ones without trailing slashes (except that the "root" URI of a site should include a slash, because a URI without any path identifies the SAME resource as a URI with a single slash as its path, but is the non-canonical form).

But for URIs fed by others, do not change them - beyond normalisation which does not change their semantic meaning.

If you provide an API where a client can create a new object and also dictate what its URI should be, either define as part of the API which URIs are acceptable and reject non-conforming requests, or treat the URI in the request only as a proposal and return the acknowledged URI (possibly the same, possibly mangled, e.g. to avoid a collision) as part of a confirmation response. See e.g. how Micropub defines such a response using a Location header: https://www.w3.org/TR/micropub/#response
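A rough sketch of that idea, not tied to any real Atomic-Server endpoint. The names `create_resource`, `BASE` and the `existing` set are hypothetical. It combines both options: requests outside the server's own origin are rejected, while the path is treated as a proposal, and the URI that was actually minted is returned in a Location header in the style of Micropub's 201 response. The "no trailing slash" rule and the `-2` collision suffix are assumed policies for illustration.

```python
from urllib.parse import urlsplit

# Hypothetical origin that this server is allowed to mint URIs under.
BASE = "https://example.com"


def create_resource(proposed_uri: str, existing: set[str]) -> tuple[int, dict[str, str]]:
    """Return (status, headers) for a creation request proposing a URI."""
    parts = urlsplit(proposed_uri)

    # Option 1: reject requests that do not conform to the documented rules.
    if f"{parts.scheme}://{parts.netloc}" != BASE or parts.query or parts.fragment:
        return 400, {}

    # Option 2: treat the path as a proposal. Assumed policy for freshly
    # minted URIs: no trailing slash, and the root URI is reserved.
    path = parts.path.rstrip("/")
    if not path:
        return 400, {}

    # Avoid collisions by appending a numeric suffix (assumed policy).
    candidate = BASE + path
    final = candidate
    n = 2
    while final in existing:
        final = f"{candidate}-{n}"
        n += 1

    existing.add(final)
    # The client learns the acknowledged URI from the Location header.
    return 201, {"Location": final}
```

The point of the design is that the client never has to guess: whatever it proposed, the response states the URI that is now in use, and URIs supplied by others elsewhere are left unchanged.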

jonassmedegaard avatar Oct 04 '22 12:10 jonassmedegaard