mail-parser

UnicodeDecodeError when parsing email with "\u" in its body

Open dcfreire opened this issue 4 years ago • 2 comments

Calling parse_from_file or parse_from_bytes on an email whose body contains "\u" raises UnicodeDecodeError.

Steps to reproduce the behavior:

  1. import mailparser
  2. mail = mailparser.parse_from_file(f), where f is the path to a file containing the raw mail below
  3. See error

Expected behavior: the mail is parsed without raising an exception.

Raw mail

Received: from localhost ([127.0.0.1]) by home with MailEnable ESMTPA; Mon, 2 Aug 2021 06:23:45 -0300
Subject: <example.com> Test
From: Example <[email protected]>
To: Example <[email protected]>
Reply-To: Example <[email protected]>
Return-Path: [email protected]
Date: Mon, 02 Aug 2021 06:23:45 -0300
X-Mailer: PHP/7.1.14
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Content-Disposition: inline
MIME-Version: 1.0
Message-ID: <9F4E043004D549FEAEBF0A374D252EE8.MAI@home>


1. Website "\upload site\public_html"

Environment:

  • OS: Debian Buster
  • Docker: yes
  • mail-parser version 3.15.0

Additional context: traceback

   File "/app/parser/parsers/mail.py", line 22, in _set_parser_obj
     self.parser_obj = parse_from_file(self.filepath)
   File "/usr/local/lib/python3.9/site-packages/mailparser/mailparser.py", line 79, in parse_from_file
    return MailParser.from_file(fp)
   File "/usr/local/lib/python3.9/site-packages/mailparser/mailparser.py", line 191, in from_file
     return cls(message)
   File "/usr/local/lib/python3.9/site-packages/mailparser/mailparser.py", line 138, in __init__
     self.parse()
   File "/usr/local/lib/python3.9/site-packages/mailparser/mailparser.py", line 446, in parse
     payload = payload.decode('raw-unicode-escape')
 UnicodeDecodeError: 'rawunicodeescape' codec can't decode bytes in position 13-14: truncated \uXXXX escape

dcfreire, Aug 06 '21 16:08
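
For reference, the failure in the traceback can be reproduced without mail-parser, using only the codec named in the error. This is a minimal sketch: the payload bytes are copied from the raw mail above, the leading newline is an assumption about how the body reaches the decoder, and `try_raw_unicode_escape` is a hypothetical helper, not part of the library.

```python
# Body of the raw mail above, as bytes. The leading newline mirrors the
# extra blank line after the headers (an assumption about the payload).
payload = b'\n1. Website "\\upload site\\public_html"\n'


def try_raw_unicode_escape(data: bytes):
    """Return (text, None) on success or (None, reason) on failure."""
    try:
        return data.decode("raw-unicode-escape"), None
    except UnicodeDecodeError as exc:
        # "\u" must be followed by four hex digits; "\upl..." is not.
        return None, exc.reason


text, reason = try_raw_unicode_escape(payload)
print(reason)

# Decoding with the charset declared in the Content-Type header works,
# because a backslash is an ordinary character in UTF-8.
utf8_text = payload.decode("utf-8")
print(utf8_text)
```

The same bytes decode cleanly as UTF-8, which suggests the problem is the choice of codec rather than the mail itself.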

I'm having a related issue with an email too. Interestingly enough, I can load the same email with Python's built-in email package.

Error is triggered in the same place:

    444                         cte = cte.lower()
    445                     if not cte or cte in ['7bit', '8bit']:
--> 446                         payload = payload.decode('raw-unicode-escape')
    447                     else:
    448                         payload = ported_string(payload, encoding=charset)
UnicodeDecodeError: 'rawunicodeescape' codec can't decode bytes in position 1965-1966: truncated \UXXXXXXXX escape

bafonso, Aug 21 '21 00:08
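
To illustrate the point about the built-in email package, here is a small sketch that parses a trimmed-down message with the standard library. The message bytes are a shortened, assumed stand-in for the mails in this thread (only a few headers are kept), not the commenter's actual email.

```python
from email import message_from_bytes, policy

# A reduced 8bit text/plain message whose body contains a backslash
# followed by "U", the sequence that trips raw-unicode-escape.
raw = (
    b"Subject: Test\r\n"
    b"Content-Type: text/plain; charset=UTF-8\r\n"
    b"Content-Transfer-Encoding: 8bit\r\n"
    b"MIME-Version: 1.0\r\n"
    b"\r\n"
    b"Path to my home :\r\n"
    b"C:\\Users\\me\r\n"
)

# policy.default returns an EmailMessage, whose get_content() decodes
# the body using the charset declared in Content-Type.
msg = message_from_bytes(raw, policy=policy.default)
body = msg.get_content()
print(body)
```

The standard library decodes the body with the declared charset, so the backslashes pass through unchanged.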

Same issue for me when the body contains a Windows path such as C:\Users

Received: from localhost ([127.0.0.1]) by home with MailEnable ESMTPA; Mon, 2 Aug 2021 06:23:45 -0300
Subject: <example.com> Test
From: Example <[email protected]>
To: Example <[email protected]>
Reply-To: Example <[email protected]>
Return-Path: [email protected]
Date: Mon, 02 Aug 2021 06:23:45 -0300
X-Mailer: PHP/7.1.14
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Content-Disposition: inline
MIME-Version: 1.0
Message-ID: <9F4E043004D549FEAEBF0A374D252EE8.MAI@home>

Hello 

Path to my home :

C:\Users\me

edi9999, Dec 01 '22 17:12
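
One possible workaround, sketched under assumptions: decode 7bit/8bit payloads with the charset from the Content-Type header instead of raw-unicode-escape, falling back to a lenient UTF-8 decode. `decode_text_payload` is a hypothetical helper for illustration. It is not mail-parser's API and not a fix confirmed by the maintainers.

```python
def decode_text_payload(payload: bytes, charset: str = "utf-8") -> str:
    """Decode a text payload with its declared charset (hypothetical helper).

    Falls back to UTF-8 with replacement characters if the charset is
    unknown or the bytes are not valid in it.
    """
    try:
        return payload.decode(charset or "utf-8")
    except (LookupError, UnicodeDecodeError):
        # LookupError: unknown charset name; UnicodeDecodeError: bad bytes.
        return payload.decode("utf-8", errors="replace")


# Both bodies from this thread decode without error.
print(decode_text_payload(b'1. Website "\\upload site\\public_html"', "UTF-8"))
print(decode_text_payload(b"C:\\Users\\me", "UTF-8"))
```

The trade-off is that undecodable bytes become replacement characters instead of raising, which may or may not suit a given pipeline.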