hyperlink

Unable to parse `http://www.test.com/BMF%20Ver%F6ffentlichungen?`

Open damiencarol opened this issue 4 years ago • 3 comments

It seems the `parse` function raises an error for this URL: http://www.test.com/BMF%20Ver%F6ffentlichungen?

Logs:

```
>>> import hyperlink
>>> hyperlink.parse("http://www.test.com/BMF%20Ver%F6ffentlichungen?")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/damien/dd2/.venv/lib/python3.9/site-packages/hyperlink/_url.py", line 2447, in parse
    dec_url = DecodedURL(enc_url, lazy=lazy)
  File "/home/damien/dd2/.venv/lib/python3.9/site-packages/hyperlink/_url.py", line 2046, in __init__
    self.host, self.userinfo, self.path, self.query, self.fragment
  File "/home/damien/dd2/.venv/lib/python3.9/site-packages/hyperlink/_url.py", line 2177, in path
    [
  File "/home/damien/dd2/.venv/lib/python3.9/site-packages/hyperlink/_url.py", line 2178, in <listcomp>
    _percent_decode(p, raise_subencoding_exc=True)
  File "/home/damien/dd2/.venv/lib/python3.9/site-packages/hyperlink/_url.py", line 766, in _percent_decode
    return unquoted_bytes.decode(subencoding)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf6 in position 7: invalid start byte
```
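The failing step can be reproduced with the standard library alone. The sketch below uses `urllib.parse.unquote_to_bytes` (not Hyperlink) to roughly mirror what `_percent_decode` does in the traceback: percent-decode the path segment to bytes, then decode those bytes as UTF-8. The Latin-1 line is an assumption that the URL was produced by a Latin-1 (ISO-8859-1) encoder, which seems plausible for the German word "Veröffentlichungen".

```python
from urllib.parse import unquote_to_bytes

segment = "BMF%20Ver%F6ffentlichungen"
raw = unquote_to_bytes(segment)  # b'BMF Ver\xf6ffentlichungen'

error = None
try:
    # Roughly the decode that fails inside hyperlink's _percent_decode
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    error = exc
    # 'utf-8' codec can't decode byte 0xf6 in position 7: invalid start byte
    print(exc)

# 0xF6 on its own is a valid Latin-1 byte ('ö'), assuming Latin-1 was the source encoding
print(raw.decode("latin-1"))  # BMF Veröffentlichungen
```

The error position (7) and byte (0xf6) match the traceback, which points at the lone `%F6` byte rather than at anything else in the URL.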

damiencarol, Jul 13 '21 16:07

FYI @37b

damiencarol, Jul 13 '21 16:07

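Hi Damien! By default, Hyperlink percent-decodes each path segment and then decodes the resulting bytes as UTF-8. The `%F6` in your URL decodes to the single byte 0xF6, which is not valid UTF-8, hence the error. Adding the `decoded=False` parameter skips that decoding step and gives you a result: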

```
>>> hyperlink.parse('http://www.test.com/BMF%20Ver%F6ffentlichungen', decoded=False)
URL.from_text('http://www.test.com/BMF%20Ver%F6ffentlichungen')
```

This gives you a `URL` with mostly the same interface as a `DecodedURL` (the default output of `parse`), but its parts stay percent-encoded, so be aware that you may run into issues when you need to treat them as text rather than bytes. Hope this helps!
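If you take the `decoded=False` route and later need readable text for a path segment, one option is to percent-decode it to bytes yourself and choose the encoding explicitly. This is a minimal sketch using only the standard library (`urllib.parse`), not Hyperlink's API; `decode_segment` is a hypothetical helper, and the UTF-8-then-Latin-1 fallback is an assumption about how such URLs were produced, not something Hyperlink does.

```python
from urllib.parse import urlsplit, unquote_to_bytes


def decode_segment(segment, encodings=("utf-8", "latin-1")):
    """Hypothetical helper: percent-decode one path segment, trying each encoding in order."""
    raw = unquote_to_bytes(segment)
    for enc in encodings:
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            continue
    # Only reached if every listed encoding fails; keep bad bytes visible as U+FFFD
    return raw.decode("utf-8", errors="replace")


url = "http://www.test.com/BMF%20Ver%F6ffentlichungen?"
path = urlsplit(url).path  # '/BMF%20Ver%F6ffentlichungen'
segments = [decode_segment(s) for s in path.split("/")[1:]]
print(segments)  # ['BMF Veröffentlichungen']
```

Trying UTF-8 first matters: Latin-1 accepts every byte, so putting it first would silently misread valid UTF-8 (for example, `%C3%B6` would become `Ã¶` instead of `ö`).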

mahmoud, Jul 13 '21 20:07

@mahmoud thanks, we are investigating whether we can use the `decoded` flag.

damiencarol, Aug 27 '21 14:08