Components with control characters don't appear in `--json` output, and non-urlencoded `--get` fails
$ ./trurl 'http://example.org/%18' --json | jq -c .
[{"url":"http://example.org/%18","parts":{"scheme":"http","host":"example.org"}}]
$ ./trurl 'http://example.org/%18' --urlencode --json | jq -c .
[{"url":"http://example.org/%18","parts":{"scheme":"http","host":"example.org","path":"/%18"}}]
$ ./trurl 'http://example.org/%18' -g {path}
trurl note: URL decode error, most likely because of rubbish in the input (path)
$ ./trurl 'http://example.org/%18' -g {:path}
/%18
Something interesting I noticed is that it works for queries. I wonder if we're missing a memdupdec somewhere?
I'd bet I broke this in this PR https://github.com/curl/trurl/pull/214 but maybe it's been broken the whole time.
This looks like behavior that comes from libcurl. I was able to get the same result with the following code. Should we open a ticket over there, or are we just overlooking something simple?
#include <curl/curl.h>
#include <stdio.h>
#include <string.h>

int main(void) {
  const char *url_string = "http://example.org/%18";
  CURL *curl = curl_easy_init();
  CURLU *url = curl_url();
  // libcurl allocates the result itself, so start from NULL instead of calloc()
  char *path = NULL;
  CURLUcode uc;

  // do it how trurl does it
  uc = curl_url_set(url, CURLUPART_URL, url_string, 0);
  if(!uc)
    uc = curl_url_get(url, CURLUPART_PATH, &path, CURLU_URLDECODE);
  if(uc) {
    printf("%s\n", curl_url_strerror(uc));
  } else {
    printf("%s\n", path);
  }

  // try with curl easy unescape
  int decode_len = 0;
  char *decoded = curl_easy_unescape(curl, url_string, (int)strlen(url_string), &decode_len);
  if(decoded)
    printf("%s\n", decoded);
  printf("length: %zu, amount decoded: %d\n", strlen(url_string), decode_len);

  // strings returned by libcurl are released with curl_free(), not free()
  curl_free(decoded);
  curl_free(path);
  curl_url_cleanup(url);
  curl_easy_cleanup(curl);
  return 0;
}
Ahh, it could also be that %18 maps to the ASCII control character CAN (cancel). I'd bet curl doesn't play nice with decoding most control characters in the path. If you do it with %21 (either with trurl or with the example above), you get the following:
$ trurl http://example.org/%21 --get "{path}"
/!
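For what it's worth, as far as I can tell the curl_url_get documentation says that with CURLU_URLDECODE the call returns an error if the decoded string contains byte values lower than 32, which would explain why %18 fails and %21 doesn't. Here is a rough sketch of that kind of check. The helper names `hexval` and `decodes_to_ctrl` are made up for illustration; this is not libcurl's or trurl's actual code.

```c
#include <stddef.h>

/* Hypothetical helper: value of one hex digit, or -1 if not a hex digit. */
static int hexval(char c)
{
  if(c >= '0' && c <= '9')
    return c - '0';
  if(c >= 'a' && c <= 'f')
    return c - 'a' + 10;
  if(c >= 'A' && c <= 'F')
    return c - 'A' + 10;
  return -1;
}

/* Hypothetical helper: returns 1 if percent-decoding s would yield a byte
   below 0x20 (an ASCII control character), otherwise 0. Malformed escapes
   such as "%zz" or a trailing "%" are treated as literal text. */
static int decodes_to_ctrl(const char *s)
{
  for(; *s; s++) {
    if(*s == '%') {
      int hi = hexval(s[1]);
      int lo = (hi >= 0) ? hexval(s[2]) : -1;
      if(hi >= 0 && lo >= 0) {
        if(hi * 16 + lo < 0x20)
          return 1;
        s += 2; /* skip the two hex digits that were just consumed */
      }
    }
    else if((unsigned char)*s < 0x20)
      return 1; /* a raw control byte stays a control byte after decoding */
  }
  return 0;
}
```

With this, `decodes_to_ctrl("/%18")` returns 1 and `decodes_to_ctrl("/%21")` returns 0, matching the two cases above.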
After some more testing I think you are just supposed to pass --urlencode for this scenario. We could do something to try and hint at this to the user?
$ trurl http://example.org/%18 --get "{path}"
trurl note: URL decode error, most likely because of rubbish in the input (path)
try again with --urlencode
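One way the hint could be wired up, as a minimal sketch only: the function name `format_decode_note` and the `urlencode_given` flag are hypothetical and don't correspond to anything in trurl's source. The idea is to append the hint only when the user hasn't already passed --urlencode.

```c
#include <stdio.h>

/* Hypothetical helper: writes the decode-error note for `component` into
   buf. The "try again" hint is appended only when --urlencode was not
   given, since suggesting it otherwise would be misleading. Returns the
   value of snprintf(). */
static int format_decode_note(char *buf, size_t size,
                              const char *component, int urlencode_given)
{
  const char *hint = urlencode_given ? "" : "\ntry again with --urlencode";
  return snprintf(buf, size,
                  "trurl note: URL decode error, most likely because of "
                  "rubbish in the input (%s)%s",
                  component, hint);
}
```

Calling `format_decode_note(buf, sizeof(buf), "path", 0)` would produce the two-line message shown above, while passing 1 as the last argument leaves the hint off.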