Fixed safe uri alphabet (+$ -> _.) for compressToEncodedURIComponent/decompressToEncodedURIComponent
I don't know why '+' and '$' were chosen when this feature was implemented, but both characters need to be encoded for valid query strings. (encodeURIComponent('+$') => '%2B%24')
So I changed them to '_' and '.' ... and also removed the +/whitespace replacement as it's not needed anymore.
Thanks, this is a known issue but we cannot merge this unfortunately :(
The problem is that we cannot change this without breaking backwards compatibility.
(also, definitely something to remember if we ever go through a "spec" change - see #122)
While I agree we should do this, this is a breaking change and will break havoc all over the place.
So we either reject it or we create another method...
@r0b- @pieroxy
I propose two new methods...
-
compressToURISafe()orcompressToURISafeEncoding() -
decompressFromURISafe()ordecompressFromURISafeEncoding()
With "URI Safe" being the "Unreserved" characters defined in RFC 3986 as discussed here...
https://joshbuchea.com/uri-safe-characters/
...but excluding the tilde character '~' for historical reasons due to RFC 1738 as discussed here...
https://jkorpela.fi/tilde.html
I think that naming the methods the way I suggest above makes it clearer what the method is actually doing. As "URI Safe" can be thought of as a character encoding like "UTF-16" so the new method naming should parallel the existing naming (i.e. compressToUTF16 / decompressFromUTF16). The current method names (which allude to encodeURIComponent and decodeURIComponent ) make it sound like the result will have URL-encoded values using percents or something, which isn't really what it does as the character set used by the method (ignoring the current bug) should produce a string that doesn't actually need to be URI-encoded (i.e. using percents) at all. I totally understand why the original name was chosen, but I think it could be clearer for the user.
Possible alternate (but longer) method names if the ones suggested above aren't acceptable:
-
compressToRFC3986URISafe()decompressFromRFC3986URISafe() -
compressToRFC3986Unreserved()decompressFromRFC3986Unreserved() -
compressToRFC3986UnreservedURICharacters()decompressFromRFC3986UnreservedURICharacters()
etc...
@r0b- @Rycochet
I just came across something called Base64URL which is described in Section 5 of RFC 4648 back in 2006...
https://www.rfc-editor.org/rfc/rfc4648#section-10
...and explained in more detail on this website...
https://base64.guru/standards/base64url
Base64URL is addressing the same problem with regards to Base64 encoded strings containing characters that aren't allowed in URLs.
In short, it's the same algorithm as Base64 except for these 4 changes:
- Replaces “+” by “-” (minus)
- Replaces “/” by “_” (underline)
- Does not require a padding character
- Forbids line separators
I feel that the new compressToURISafe() method (or whatever it ends up being called) should use the same implementation as described in RFC 4648.
EDIT: I found another reference point in the Java 8 Base64 API... https://docs.oracle.com/javase/8/docs/api/java/util/Base64.html
They also have a Base64 MIME encoding method which we may also want to implement.