Michael
As a workaround one can use the `surrogatepass` error handler to convert the surrogates into normal unicode code points:

```
>>> html = html.encode("utf-16", errors="surrogatepass").decode("utf-16")
>>> html
'https://host.invalid/👩\n'
```

But,...
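For anyone who wants to try the workaround in isolation, here is a minimal sketch. The `broken` string and the `merge_surrogates` helper are made up for illustration; only the `encode`/`decode` round trip comes from the snippet above.

```python
# Hypothetical input: U+D83D U+DC69 is the UTF-16 surrogate pair for U+1F469 (WOMAN),
# stored here as two separate code points in a Python str.
broken = "https://host.invalid/\ud83d\udc69\n"


def merge_surrogates(text: str) -> str:
    """Round-trip through UTF-16 so that paired surrogates become one code point."""
    # "surrogatepass" lets the encoder write the surrogates out as raw UTF-16
    # code units; the decoder then reads each pair back as a single character.
    return text.encode("utf-16", errors="surrogatepass").decode("utf-16")


fixed = merge_surrogates(broken)

# The surrogate form cannot be encoded as strict UTF-8, the merged form can.
try:
    broken.encode("utf-8")
    broken_is_encodable = True
except UnicodeEncodeError:
    broken_is_encodable = False

fixed_bytes = fixed.encode("utf-8")
```

Note that this only helps when the surrogates come in valid pairs; a lone surrogate would still make the `decode("utf-16")` step fail.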
I think I tracked it down to https://github.com/executablebooks/mdurl/blob/a0f259c699eb2f75b7df290ed1e731f9b27ee171/src/mdurl/_decode.py#L33, which is used here: https://github.com/executablebooks/markdown-it-py/blob/7e677c4e7b4573eaf406a13882f3fee4b19b97f4/markdown_it/common/normalize_url.py#L40. It decodes a percent-encoded string into a Unicode string containing surrogates, which is correct for JavaScript (which still uses UTF-16 internally...
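To illustrate the difference, here is a rough sketch of what a UTF-16-style decoder would produce compared with what Python expects. This is not the mdurl code; `to_surrogate_pair` is a hypothetical helper written only to mimic the JavaScript-style result, and `urllib.parse.unquote` stands in as the reference decoder.

```python
from urllib.parse import unquote


def to_surrogate_pair(code_point: int) -> str:
    """Split a code point above U+FFFF into two UTF-16 surrogates (hypothetical helper)."""
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # high (leading) surrogate
    low = 0xDC00 + (offset & 0x3FF)   # low (trailing) surrogate
    return chr(high) + chr(low)


# "%F0%9F%91%A9" is the percent-encoded UTF-8 form of U+1F469.
expected = unquote("%F0%9F%91%A9")            # a single code point in Python
js_style = to_surrogate_pair(ord(expected))   # two surrogate code points

# Both render the same character in JavaScript, but in Python they are
# different strings of different length.
same = expected == js_style
```

In Python the two strings are not interchangeable: `expected` is one code point, while `js_style` is two surrogates that most encoders will reject.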
@hukkin Thank you for the fix!