Error parsing email headers: AttributeError: 'ValueTerminal' object has no attribute 'fold'

Open mgmacias95 opened this issue 1 year ago • 3 comments

Bug report

Bug description:

The following code breaks with an attribute error:

import email.parser
import email.policy

a = 't'*46

h = f'''\
To: =?utf-8?B?dGVzdC50ZXN0LnRlc3QudGVzdEB0ZXN0LmNvbeKAiw=?= <[email protected]>,\r\n\t"tttest&{a}[email protected]" <[email protected]>,\r\n\t"tttest&{a}[email protected]" <[email protected]>'''

m = email.parser.HeaderParser(policy=email.policy.default).parsestr(h)
m.as_string()

The problem was introduced in https://github.com/python/cpython/pull/100885. Changing ListSeparator.as_ew_allowed from False to True fixes it. Changing any character in the header in the example above also makes the error go away, which makes it harder to understand exactly why it breaks.

CPython versions tested on:

3.12

Operating systems tested on:

macOS

mgmacias95, May 06 '24 12:05
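
To try the workaround from the report without editing the standard library, a minimal sketch follows. It assumes `ListSeparator` is a module-level object in the private module `email._header_value_parser`, as the report implies. The helper name `override_list_separator` is hypothetical, and because this touches private internals it may not behave the same on every Python version.

```python
import contextlib
import email._header_value_parser as hvp  # private module; its layout may change

_MISSING = object()


@contextlib.contextmanager
def override_list_separator(**flags):
    """Temporarily set attributes (e.g. as_ew_allowed) on ListSeparator.

    Hypothetical helper for experimenting; not part of the email package.
    """
    sep = getattr(hvp, "ListSeparator", None)
    if sep is None:
        # Assumption failed: this Python version has no such object.
        yield None
        return
    saved = {name: getattr(sep, name, _MISSING) for name in flags}
    try:
        for name, value in flags.items():
            setattr(sep, name, value)
        yield sep
    finally:
        # Restore the previous values so other code is unaffected.
        for name, value in saved.items():
            if value is _MISSING:
                delattr(sep, name)
            else:
                setattr(sep, name, value)
```

Wrapping the failing `m.as_string()` call in `with override_list_separator(as_ew_allowed=True):` should be equivalent to the manual change described in the report, under the assumptions above.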

From RFC 822, section 3.11:

        Note:  While the standard  permits  folding  wherever  linear-
               white-space is permitted, it is recommended that struc-
               tured fields, such as those containing addresses, limit
               folding  to higher-level syntactic breaks.  For address
               fields, it  is  recommended  that  such  folding  occur
               between addresses, after the separating comma.

I think it means that we should also set ListSeparator.syntactic_break to False.

serhiy-storchaka, May 10 '24 16:05
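
To see where the folder actually breaks an address list, a small sketch like the one below can help. The addresses are made up, and the exact line breaks depend on the Python version, so no expected output is shown.

```python
from email.message import EmailMessage
from email.policy import default

# Made-up ASCII-only addresses, long enough together to force folding.
addresses = [f"recipient{i}@example.com" for i in range(8)]

msg = EmailMessage(policy=default)
msg["To"] = ", ".join(addresses)

# Serializing applies the policy's folding rules to the To header.
folded = msg.as_string()
for line in folded.splitlines():
    print(repr(line))
```

If the RFC 822 recommendation is followed, each continuation line should start right after a separating comma rather than in the middle of an address.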

But this does not help. It just returns the bug reported in #100884.

serhiy-storchaka, May 10 '24 17:05

> But this does not help. It just returns the bug reported in #100884.

I don't understand this comment. I just tested the code from that issue with ListSeparator.syntactic_break set to False, and it works. Which bug comes back, then?

mgmacias95, May 11 '24 16:05

It no longer raises an exception, but it encodes the comma as =?utf-8?q?=2C?=. #100885 was not only incorrect; even with the error fixed, it is not enough to resolve the original issue #100884. I am trying to find a solution that fixes both the original issue and the new error.

serhiy-storchaka, May 16 '24 12:05
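
For readers unfamiliar with the notation, the encoded word quoted in the last comment is just a comma in RFC 2047 Q-encoding. The sketch below decodes it with the public `email.header.decode_header` function. The point is only to show what the token contains; once the comma is wrapped this way it presumably no longer reads as a literal separator between addresses.

```python
from email.header import decode_header

encoded_comma = "=?utf-8?q?=2C?="

# decode_header returns a list of (bytes, charset) pairs; here there is one.
(raw, charset), = decode_header(encoded_comma)
decoded = raw.decode(charset)
print(decoded)  # prints ","
```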