url When should percent decoding occur by default with paths?

Problem

Some issues on this repo are saying that more decoding should happen on path segments by default:

https://github.com/elm/url/issues/16
https://github.com/elm/url/issues/20
https://github.com/elm/url/issues/25
https://github.com/elm/url/issues/41

Issue #20 also says that this decoding should not occur in all cases, referencing this issue that points to implementations in many languages that do not call percentDecode in a naive way.

What should be done?

Current Status

As mentioned in #16, it is currently possible to use Url.Parser.custom to create custom path parsers that have whatever behavior you personally think is best. For example, custom "STRING" Url.percentDecode would do very aggressive percent decoding.

So nobody is blocked on this topic. It is a question of defaults, and any change should be considered a breaking change that triggers a MAJOR version bump.

Goal

Make a table of scenarios to try to find the ideal defaults for a broad range of people. The default options for path parsing and building are:

no percent decoding
percent decoding for specific characters
percent decoding for all characters

We currently do (1) but maybe it'd be good to make a table to show the various options. Right now I think it would be ideal for someone interested in this topic to:

Build a table of interesting paths and see how they all work under different defaults.
Check the defaults of implementations in other languages.

Hopefully this will reveal more information / some sort of consensus. I think being thorough about this is very important, since any change here could break a lot of peoples' code in ways that might be very hard to detect. Please share your efforts here or on the Elm Discourse.

Feb 10 '21 00:02 evancz

Since an Elm application produces a [ href ] links, and upon clicking it, the Browser.application is also expected to Url.Parser.parse router url, I think the important bit is: the value need to survive "round trip"

given a String value that we provide to Url.Builder.absolute
it must produce a usable link where after we click
and parse with Url.Parser.string
we must get back the same String value without further intervention

Current elm/url does not round trip.

Given String	Url.Builder.absolute produces	Url.Parser.string returns String	Round trip
`"hello"`	`"hello"`	`"hello"`	✔️
`"👍"`	`"👍"`	`"%F0%9F%91%8D"`	❌
`"искать"`	`"искать"`	`"%D0%B8%D1%81%D0%BA%D0%B0%D1%82%D1%8C"`	❌

If we use the auto encode and auto decode versions

Given String	Url.Builder.absolute+encode produces	Url.Parser.string+decode returns String	Round trip
`"hello"`	`"hello"`	`"hello"`	✔️
`"👍"`	`"%F0%9F%91%8D"`	`"👍"`	✔️
`"искать"`	`"%D0%B8%D1%81%D0%BA%D0%B0%D1%82%D1%8C"`	`"искать"`	✔️

We can try it out in this tiny app https://elm-url-test.netlify.app (source)

Fill in the values
Choose the mode
- NoPercentEncodeDecode uses Url.Builder.absolute and Url.Parser.string as-is
- PercentEncodeDecodePath uses Url.Builder.absolute + encode and Url.Parser.string + decode
And most importantly, click on the generated link to see how the round trip turns out. Are we able to parse and obtain our original String values back

NOTE: if you gave Set Path a value of !@#$%^&*() and run the following js in browser console,
Array.from(document.getElementsByTagName('a')).forEach(function(a) { console.log(a.href) })
you'll notice that the a[href] wasn't escaped. this is a problem since # is treated literally as url hash
https://elm-url-test.netlify.app/path/!@#$%^&*()?query=%F0%9F%91%8D
if you did the same but with Set mode: [x] PercentEncodeDecodePath and run the same js, you'll notice that the a[href] is properly escaped
https://elm-url-test.netlify.app/path/!%40%23%24%25%5E%26*()?query=%F0%9F%91%8D

since any change here could break a lot of peoples' code in ways that might be very hard to detect

The function names would best be renamed to avoid this problem

Url.Parser.string becomes Url.Parser.rawString and we add Url.Parser.decodedString
Url.Builder.absolute et al will need to accept ~List String~ List PathSegment (counter part to List QueryParameter) but that means there's no generating of un-encoded paths

May 12 '21 15:05 choonkeat

using strings as-is, vuejs mostly round trips properly ~except it got thrown off by #~ (fixed by using named route for !@#$%^&*())

<router-link to="/path/foo?query=foo">Go to foo</router-link>
<router-link to="/path/👍?query=👍">Go to 👍</router-link>
<router-link :to="{ name: 'pathRoute', params: { id: '!@#$%^&*()' }}">Go to !@#$%^&*()</router-link>
<router-link to="/path/{}[]<>;/?query=/{}[]<>;/">Go to {}[]<>;/</router-link>
<router-link to="/path/日本?query=日本">Go to 日本</router-link>
<router-link to="/path/خحجث?query=خحجث">Go to خحجث</router-link>

the generated href attribute values are escaped

https://vue-url-test.netlify.app/path/foo?query=foo
https://vue-url-test.netlify.app/path/%F0%9F%91%8D?query=%F0%9F%91%8D
https://vue-url-test.netlify.app/path/!@%23$%25%5E&*()
https://vue-url-test.netlify.app/path/%7B%7D[]%3C%3E;/?query=%2F%7B%7D%5B%5D%3C%3E%3B%2F
https://vue-url-test.netlify.app/path/%E6%97%A5%E6%9C%AC?query=%E6%97%A5%E6%9C%AC
https://vue-url-test.netlify.app/path/%D8%AE%D8%AD%D8%AC%D8%AB?query=%D8%AE%D8%AD%D8%AC%D8%AB

https://vue-url-test.netlify.app/ (source)

May 13 '21 03:05 choonkeat

using strings as-is, react router mostly round trips properly. # character gave runtime error; solved with manual encodeURI

<li><Link to={{ pathname: "/path/foo", search: "?query=foo" }}>foo</Link></li>
<li><Link to={{ pathname: "/path/👍", search: "?query=👍" }}>👍</Link></li>
<li><Link to={{ pathname: "/path/!@#$%^&*()", search: "?query=!@#$%^&*()" }}>!@#$%^&*()</Link></li>
<li><Link to={{ pathname: "/path/" + encodeURI("!@#$%^&*()"), search: "?query=!@#$%^&*()" }}>!@#$%^&*()</Link> fixed</li>
<li><Link to={{ pathname: "/path/{}[]<>;", search: "?query={}[]<>;" }}>{}[]<>;</Link></li>
<li><Link to={{ pathname: "/path/日本", search: "?query=日本" }}>日本</Link></li>
<li><Link to={{ pathname: "/path/خحجث", search: "?query=خحجث" }}>خحجث</Link></li>

https://codesandbox.io/s/react-router-url-parameters-forked-sno3x?file=/example.js

Array.from(document.getElementsByTagName('a')).forEach(function(a) { console.log(a.href) })

running the above js in codesandbox's own console (bottom right), you'll notice most of the a[href] values are escaped, except for the 2 with !@#$%^&*()

https://sno3x.csb.app/path/foo?query=foo 
https://sno3x.csb.app/path/%F0%9F%91%8D?query=%F0%9F%91%8D
https://sno3x.csb.app/path/!@#$%^&*()?query=!@#$%^&*()
https://sno3x.csb.app/path/!@#$%25%5E&*()?query=!@#$%^&*()
https://sno3x.csb.app/path/%7B%7D[]%3C%3E;?query={}[]%3C%3E;
https://sno3x.csb.app/path/%E6%97%A5%E6%9C%AC?query=%E6%97%A5%E6%9C%AC
https://sno3x.csb.app/path/%D8%AE%D8%AD%D8%AC%D8%AB?query=%D8%AE%D8%AD%D8%AC%D8%AB

May 13 '21 03:05 choonkeat

Not auto-encoding Url.Builder.absolute path segments means we rely on browser (or http client) implementation of Postel's Law to fixup urls for us
Not auto-decoding Url.Parser.string means the values we provide to build links does not come back to us correctly

So autoencoding Url.Builder.absolute and auto decoding Url.Parser.string are both required.

[very sorry this came in as multiple comments. I didn't know how much time I had or needed]

May 13 '21 04:05 choonkeat