viewhtmlmail.py fails for certain characters

protist opened this issue 3 years ago

Firstly, thank you so much for maintaining viewhtmlmail. This is an essential part of using (neo)mutt for me! Very much appreciated.

I've noticed that certain characters come out mangled: instead of the characters themselves, the output contains literal escape sequences such as \u201c. I compared the output with the original raw HTML of these emails, and everything is identical (including headers) except for these characters. I have reproduced the issue with the provided test file htmlmail.eml by adding a few characters, as in the following diff:

--- /tmp/htmlmail1.eml	2022-10-17 13:42:30.306089285 +1100
+++ /tmp/htmlmail2.eml	2022-10-17 13:45:40.435774500 +1100
@@ -72,7 +72,7 @@
 hing about pi=C3=B1on trees=E2=80=94and em and en=E2=80=93dashes.</div><div=
 ><br></div><div>And an image!<br><div><img src=3D"cid:ii_kh9ezunp0" alt=3D"=
 tuxnetwork.jpg" width=3D"256" height=3D"187"><br></div><div>and some text a=
-fter the image. And an emoji! =F0=9F=98=80<br></div></div><div><br>-- <br><=
+fter the image. And an emoji! =F0=9F=98=80 And “quotes” at 37°C<br></div></div><div><br>-- <br><=
 div dir=3D"ltr" class=3D"gmail_signature" data-smartmail=3D"gmail_signature=
 ">=C2=A0 =C2=A0 ...Akkana</div></div></div>

The original html has

And “quotes” at 37°C

and the version that viewhtmlmail produces has

And \u201cquotes\u201d at 37�C

The curly quotes have been turned into literal \u201c and \u201d escape sequences, and the ° has become U+FFFD REPLACEMENT CHARACTER.
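One plausible explanation, offered as an assumption because it has not been checked against viewhtmlmail.py's source: if the script parses the message from a text string (for example with `email.message_from_string`), then `Message.get_payload(decode=True)` in Python's `email` package falls back to `payload.encode('raw-unicode-escape')` when a body part contains non-ASCII characters. That encoding writes “ as the six ASCII bytes `\u201c` and ° as the single Latin-1 byte 0xB0, which is not valid UTF-8 on its own and so decodes to U+FFFD. This sketch uses a hypothetical minimal message and a hypothetical helper `decoded_body` to reproduce both symptoms, and shows that parsing the raw bytes with `email.message_from_bytes` keeps the characters intact.

```python
import email

# Hypothetical minimal message: a quoted-printable HTML part that, like the
# modified htmlmail.eml, contains raw (unencoded) non-ASCII characters.
RAW_TEXT = (
    "MIME-Version: 1.0\n"
    "Content-Type: text/html; charset=utf-8\n"
    "Content-Transfer-Encoding: quoted-printable\n"
    "\n"
    "And \u201cquotes\u201d at 37\u00b0C\n"
)


def decoded_body(msg):
    """Hypothetical helper: return the body as str using the declared charset."""
    payload = msg.get_payload(decode=True)  # bytes, quoted-printable undone
    charset = msg.get_content_charset() or "utf-8"
    return payload.decode(charset, errors="replace")


# Parsing from str: the non-ASCII payload hits the raw-unicode-escape
# fallback inside get_payload(decode=True).
from_text = decoded_body(email.message_from_string(RAW_TEXT))

# Parsing from bytes: the original UTF-8 bytes are carried through unchanged.
from_bytes = decoded_body(email.message_from_bytes(RAW_TEXT.encode("utf-8")))

print(from_text, end="")   # And \u201cquotes\u201d at 37�C
print(from_bytes, end="")  # And “quotes” at 37°C
```

If viewhtmlmail does read its input as text (for instance from `sys.stdin`), reading bytes instead, e.g. `email.message_from_binary_file(sys.stdin.buffer)`, would be one way to avoid this fallback; this is a suggestion, not a tested patch.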

Oct 17 '22 03:10 protist