mail-parser
UnicodeDecodeError when parsing email with "\u" in its body
Calling parse_from_file or parse_from_bytes on an email whose body contains "\u" raises UnicodeDecodeError.
Steps to reproduce the behavior:
- import mailparser
- mail = mailparser.parse_from_file(f)
- See error
Expected behavior: the mail is parsed properly.
Raw mail
Received: from localhost ([127.0.0.1]) by home with MailEnable ESMTPA; Mon, 2 Aug 2021 06:23:45 -0300
Subject: <example.com> Test
From: Example <[email protected]>
To: Example <[email protected]>
Reply-To: Example <[email protected]>
Return-Path: [email protected]
Date: Mon, 02 Aug 2021 06:23:45 -0300
X-Mailer: PHP/7.1.14
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Content-Disposition: inline
MIME-Version: 1.0
Message-ID: <9F4E043004D549FEAEBF0A374D252EE8.MAI@home>

1. Website "\upload site\public_html"
Environment:
- OS: Debian Buster
- Docker: yes
- mail-parser version 3.15.0
Additional context:
Traceback:
  File "/app/parser/parsers/mail.py", line 22, in _set_parser_obj
    self.parser_obj = parse_from_file(self.filepath)
  File "/usr/local/lib/python3.9/site-packages/mailparser/mailparser.py", line 79, in parse_from_file
    return MailParser.from_file(fp)
  File "/usr/local/lib/python3.9/site-packages/mailparser/mailparser.py", line 191, in from_file
    return cls(message)
  File "/usr/local/lib/python3.9/site-packages/mailparser/mailparser.py", line 138, in __init__
    self.parse()
  File "/usr/local/lib/python3.9/site-packages/mailparser/mailparser.py", line 446, in parse
    payload = payload.decode('raw-unicode-escape')
UnicodeDecodeError: 'rawunicodeescape' codec can't decode bytes in position 13-14: truncated \uXXXX escape
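The failure in the traceback above can likely be reproduced without mail-parser at all, since it comes from the codec itself. A minimal sketch, assuming the payload bytes reaching line 446 are just the body line of the raw mail (the variable names here are illustrative, not taken from the library):

```python
# Body line from the raw mail above, as bytes (backslashes are literal).
body = b'1. Website "\\upload site\\public_html"'

# 'raw-unicode-escape' treats "\u" as the start of a \uXXXX escape.
# "\upload" has no four hex digits after "\u", so decoding fails.
try:
    body.decode("raw-unicode-escape")
    error = None
except UnicodeDecodeError as exc:
    error = exc

print(type(error).__name__ if error else "decoded without error")

# Decoding with the charset declared in the Content-Type header keeps
# the backslashes as ordinary characters.
text = body.decode("utf-8")
print(text)
```

This only isolates the codec behaviour; it does not show how mail-parser should decide which codec to use.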
I'm having a related issue with an email too. Interestingly enough, I can load the email using Python's native email reader.
Error is triggered in the same place:
    444     cte = cte.lower()
    445     if not cte or cte in ['7bit', '8bit']:
--> 446         payload = payload.decode('raw-unicode-escape')
    447     else:
    448         payload = ported_string(payload, encoding=charset)
UnicodeDecodeError: 'rawunicodeescape' codec can't decode bytes in position 1965-1966: truncated \UXXXXXXXX escape
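For comparison, here is a rough sketch of reading the same kind of message with the standard library's email package and decoding the body with the charset declared in its headers. The RAW message is a shortened stand-in for the mail in the original report, not the commenter's actual email:

```python
import email
from email import policy

# Shortened stand-in for the reported mail: 8bit CTE, UTF-8 charset,
# and a body containing a literal "\u" sequence.
RAW = (
    b"Subject: Test\r\n"
    b"Content-Type: text/plain; charset=UTF-8\r\n"
    b"Content-Transfer-Encoding: 8bit\r\n"
    b"MIME-Version: 1.0\r\n"
    b"\r\n"
    b'1. Website "\\upload site\\public_html"\r\n'
)

msg = email.message_from_bytes(RAW, policy=policy.default)
payload = msg.get_payload(decode=True)  # body as bytes

# The decode used on line 446 fails on these bytes...
try:
    payload.decode("raw-unicode-escape")
    raised = False
except UnicodeDecodeError:
    raised = True

# ...while the charset declared in Content-Type decodes them cleanly.
charset = msg.get_content_charset() or "utf-8"
body = payload.decode(charset, errors="replace")
print(raised, body.strip())
```

This is consistent with the observation that the standard library loads the message while the 'raw-unicode-escape' branch does not.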
Same issue for me when the body contains C:\Users:
Received: from localhost ([127.0.0.1]) by home with MailEnable ESMTPA; Mon, 2 Aug 2021 06:23:45 -0300
Subject: <example.com> Test
From: Example <[email protected]>
To: Example <[email protected]>
Reply-To: Example <[email protected]>
Return-Path: [email protected]
Date: Mon, 02 Aug 2021 06:23:45 -0300
X-Mailer: PHP/7.1.14
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Content-Disposition: inline
MIME-Version: 1.0
Message-ID: <9F4E043004D549FEAEBF0A374D252EE8.MAI@home>

Hello
Path to my home :
C:\Users\me
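Until this is fixed upstream, one possible workaround is to fall back to the declared charset when the 'raw-unicode-escape' decode fails. The sketch below is an assumption about what such a fallback could look like; decode_payload is a hypothetical helper, not part of mail-parser, and not necessarily how the maintainers would fix it:

```python
def decode_payload(payload: bytes, charset: str = "utf-8") -> str:
    """Hypothetical helper: try the decode used on line 446 and fall
    back to the declared charset if it raises."""
    try:
        return payload.decode("raw-unicode-escape")
    except UnicodeDecodeError:
        # "\U" followed by non-hex characters (as in "C:\Users") or a
        # truncated "\u" escape ends up here.
        return payload.decode(charset or "utf-8", errors="replace")


# Body of the mail above, as bytes (backslashes are literal).
text = decode_payload(b"Hello\r\nPath to my home :\r\nC:\\Users\\me\r\n")
print(text)
```

Note that this fallback only covers malformed escapes. A body containing a well-formed sequence such as a backslash, "u", and four hex digits would still be converted by the first decode, so decoding with the declared charset first may be the safer choice.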