When should percent decoding occur by default with paths?
Problem
Some issues on this repo are saying that more decoding should happen on path segments by default:
- https://github.com/elm/url/issues/16
- https://github.com/elm/url/issues/20
- https://github.com/elm/url/issues/25
- https://github.com/elm/url/issues/41
Issue #20 also says that this decoding should not occur in all cases, referencing this issue that points to implementations in many languages that do not call percentDecode in a naive way.
What should be done?
Current Status
As mentioned in #16, it is currently possible to use Url.Parser.custom to create custom path parsers that have whatever behavior you personally think is best. For example, custom "STRING" Url.percentDecode would do very aggressive percent decoding.
So nobody is blocked on this topic. It is a question of defaults, and any change should be considered a breaking change that triggers a MAJOR version bump.
Goal
Make a table of scenarios to try to find the ideal defaults for a broad range of people. The default options for path parsing and building are:
- no percent decoding
- percent decoding for specific characters
- percent decoding for all characters
We currently do (1) but maybe it'd be good to make a table to show the various options. Right now I think it would be ideal for someone interested in this topic to:
- Build a table of interesting paths and see how they all work under different defaults.
- Check the defaults of implementations in other languages.
Hopefully this will reveal more information / some sort of consensus. I think being thorough about this is very important, since any change here could break a lot of peoples' code in ways that might be very hard to detect. Please share your efforts here or on the Elm Discourse.
Since an Elm application produces a [ href ] links, and upon clicking it, the Browser.application is also expected to Url.Parser.parse router url, I think the important bit is: the value need to survive "round trip"
- given a
Stringvalue that we provide toUrl.Builder.absolute - it must produce a usable link where after we click
- and parse with
Url.Parser.string - we must get back the same
Stringvalue without further intervention
Current elm/url does not round trip.
| Given String | Url.Builder.absolute produces | Url.Parser.string returns String | Round trip |
|---|---|---|---|
"hello" |
"hello" |
"hello" |
✔️ |
"👍" |
"👍" |
"%F0%9F%91%8D" |
❌ |
"искать" |
"искать" |
"%D0%B8%D1%81%D0%BA%D0%B0%D1%82%D1%8C" |
❌ |
If we use the auto encode and auto decode versions
| Given String | Url.Builder.absolute+encode produces | Url.Parser.string+decode returns String | Round trip |
|---|---|---|---|
"hello" |
"hello" |
"hello" |
✔️ |
"👍" |
"%F0%9F%91%8D" |
"👍" |
✔️ |
"искать" |
"%D0%B8%D1%81%D0%BA%D0%B0%D1%82%D1%8C" |
"искать" |
✔️ |
We can try it out in this tiny app https://elm-url-test.netlify.app (source)
- Fill in the values
- Choose the mode
-
NoPercentEncodeDecodeusesUrl.Builder.absoluteandUrl.Parser.stringas-is -
PercentEncodeDecodePathusesUrl.Builder.absolute + encodeandUrl.Parser.string + decode
-
- And most importantly, click on the generated link to see how the round trip turns out. Are we able to parse and obtain our original String values back
NOTE: if you gave
Set Patha value of!@#$%^&*()and run the following js in browser console,Array.from(document.getElementsByTagName('a')).forEach(function(a) { console.log(a.href) })you'll notice that the
a[href]wasn't escaped. this is a problem since#is treated literally as url hashhttps://elm-url-test.netlify.app/path/!@#$%^&*()?query=%F0%9F%91%8Dif you did the same but with
Set mode: [x] PercentEncodeDecodePathand run the same js, you'll notice that thea[href]is properly escapedhttps://elm-url-test.netlify.app/path/!%40%23%24%25%5E%26*()?query=%F0%9F%91%8D
since any change here could break a lot of peoples' code in ways that might be very hard to detect
The function names would best be renamed to avoid this problem
-
Url.Parser.stringbecomesUrl.Parser.rawStringand we addUrl.Parser.decodedString -
Url.Builder.absoluteet al will need to accept ~List String~List PathSegment(counter part toList QueryParameter) but that means there's no generating of un-encoded paths
using strings as-is, vuejs mostly round trips properly ~except it got thrown off by #~ (fixed by using named route for !@#$%^&*())
<router-link to="/path/foo?query=foo">Go to foo</router-link>
<router-link to="/path/👍?query=👍">Go to 👍</router-link>
<router-link :to="{ name: 'pathRoute', params: { id: '!@#$%^&*()' }}">Go to !@#$%^&*()</router-link>
<router-link to="/path/{}[]<>;/?query=/{}[]<>;/">Go to {}[]<>;/</router-link>
<router-link to="/path/日本?query=日本">Go to 日本</router-link>
<router-link to="/path/خحجث?query=خحجث">Go to خحجث</router-link>
the generated href attribute values are escaped
https://vue-url-test.netlify.app/path/foo?query=foo
https://vue-url-test.netlify.app/path/%F0%9F%91%8D?query=%F0%9F%91%8D
https://vue-url-test.netlify.app/path/!@%23$%25%5E&*()
https://vue-url-test.netlify.app/path/%7B%7D[]%3C%3E;/?query=%2F%7B%7D%5B%5D%3C%3E%3B%2F
https://vue-url-test.netlify.app/path/%E6%97%A5%E6%9C%AC?query=%E6%97%A5%E6%9C%AC
https://vue-url-test.netlify.app/path/%D8%AE%D8%AD%D8%AC%D8%AB?query=%D8%AE%D8%AD%D8%AC%D8%AB
https://vue-url-test.netlify.app/ (source)
using strings as-is, react router mostly round trips properly. # character gave runtime error; solved with manual encodeURI
<li><Link to={{ pathname: "/path/foo", search: "?query=foo" }}>foo</Link></li>
<li><Link to={{ pathname: "/path/👍", search: "?query=👍" }}>👍</Link></li>
<li><Link to={{ pathname: "/path/!@#$%^&*()", search: "?query=!@#$%^&*()" }}>!@#$%^&*()</Link></li>
<li><Link to={{ pathname: "/path/" + encodeURI("!@#$%^&*()"), search: "?query=!@#$%^&*()" }}>!@#$%^&*()</Link> fixed</li>
<li><Link to={{ pathname: "/path/{}[]<>;", search: "?query={}[]<>;" }}>{}[]<>;</Link></li>
<li><Link to={{ pathname: "/path/日本", search: "?query=日本" }}>日本</Link></li>
<li><Link to={{ pathname: "/path/خحجث", search: "?query=خحجث" }}>خحجث</Link></li>
https://codesandbox.io/s/react-router-url-parameters-forked-sno3x?file=/example.js
Array.from(document.getElementsByTagName('a')).forEach(function(a) { console.log(a.href) })
running the above js in codesandbox's own console (bottom right), you'll notice most of the a[href] values are escaped, except for the 2 with !@#$%^&*()
https://sno3x.csb.app/path/foo?query=foo
https://sno3x.csb.app/path/%F0%9F%91%8D?query=%F0%9F%91%8D
https://sno3x.csb.app/path/!@#$%^&*()?query=!@#$%^&*()
https://sno3x.csb.app/path/!@#$%25%5E&*()?query=!@#$%^&*()
https://sno3x.csb.app/path/%7B%7D[]%3C%3E;?query={}[]%3C%3E;
https://sno3x.csb.app/path/%E6%97%A5%E6%9C%AC?query=%E6%97%A5%E6%9C%AC
https://sno3x.csb.app/path/%D8%AE%D8%AD%D8%AC%D8%AB?query=%D8%AE%D8%AD%D8%AC%D8%AB
- Not auto-encoding
Url.Builder.absolutepath segments means we rely on browser (or http client) implementation of Postel's Law to fixup urls for us - Not auto-decoding
Url.Parser.stringmeans the values we provide to build links does not come back to us correctly
So autoencoding Url.Builder.absolute and auto decoding Url.Parser.string are both required.
[very sorry this came in as multiple comments. I didn't know how much time I had or needed]