Extra '\' (backslash) appears before '-' and '.'
An extra backslash is added to the output when two or more '-' are encountered, e.g.:

    echo '<p>-</p>' | html2text    ->  '-'
    echo '<p>--</p>' | html2text   ->  '\--'

Also, if the input has the form '[0-9].[space]' (a digit, a dot, then a space), the output becomes '[0-9]\. ', e.g.:

    echo '<p>.</p>' | html2text    ->  '.'
    echo '<p>..</p>' | html2text   ->  '..'
    echo '<p>2.</p>' | html2text   ->  '2.'
    echo '<p>2. </p>' | html2text  ->  '2\. '
    echo '<p>a. </p>' | html2text  ->  'a. '
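A minimal sketch of why only some of these inputs get a backslash. It assumes the three matchers behave like the patterns below, which are approximations of RE_MD_DOT_MATCHER, RE_MD_PLUS_MATCHER and RE_MD_DASH_MATCHER in html2text's config module, written from memory and possibly different in your version. A dot is escaped only after a leading number and before whitespace; a dash only when it starts a line and is followed by whitespace or another dash. The function name escape_like_html2text is hypothetical.

```python
import re

# Approximations of html2text's config patterns (assumption: may differ
# between html2text versions). Group 1 is the prefix, group 2 the character
# that gets escaped; the lookahead consumes nothing.
RE_MD_DOT_MATCHER = re.compile(r"^(\s*\d+)(\.)(?=\s)", re.MULTILINE)
RE_MD_PLUS_MATCHER = re.compile(r"^(\s*)(\+)(?=\s)", re.MULTILINE)
RE_MD_DASH_MATCHER = re.compile(r"^(\s*)(-)(?=\s|-)", re.MULTILINE)


def escape_like_html2text(text: str) -> str:
    """Hypothetical helper: insert a backslash like the unpatched lines do."""
    text = RE_MD_DOT_MATCHER.sub(r"\1\\\2", text)
    text = RE_MD_PLUS_MATCHER.sub(r"\1\\\2", text)
    text = RE_MD_DASH_MATCHER.sub(r"\1\\\2", text)
    return text


# The inputs from the examples above.
for sample in ["-", "--", ".", "..", "2.", "2. ", "a. "]:
    print(repr(sample), "->", repr(escape_like_html2text(sample)))
```

A lone '-' or '2.' is left alone because nothing follows it, so the lookahead fails; '--' and '2. ' satisfy the lookahead and get escaped.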
The issue is in the package file utils.py (Python37\Lib\site-packages\html2text\utils.py), lines 210, 211 and 212. With these versions of the lines it works:

    text = config.RE_MD_DOT_MATCHER.sub(r"\1\2", text)
    text = config.RE_MD_PLUS_MATCHER.sub(r"\1\2", text)
    text = config.RE_MD_DASH_MATCHER.sub(r"\1\2", text)
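In a replacement template, \1 and \2 are backreferences and \\ is a literal backslash, so r"\1\\\2" re-emits group 1, a backslash, then group 2, while r"\1\2" re-emits the match unchanged. A small sketch of the difference, using an assumed approximation of the dash pattern (not copied verbatim from html2text):

```python
import re

# Assumed approximation of RE_MD_DASH_MATCHER: group 1 is leading
# whitespace, group 2 is the dash, the lookahead is zero-width.
dash = re.compile(r"^(\s*)(-)(?=\s|-)", re.MULTILINE)

original = dash.sub(r"\1\\\2", "--")  # template with the escaped backslash
patched = dash.sub(r"\1\2", "--")     # template proposed in this issue

print(original)  # \--
print(patched)   # --
```

Because groups 1 and 2 together cover the whole match, the patched call returns its input unchanged, so under this assumption the three patched lines are effectively no-ops.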
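In the installed version each of these replacement strings has two extra backslashes (r"\1\\\2"), so replacing the three lines as shown above should fix this issue. I am not sure whether it breaks anything else; presumably the backslash is there to stop Markdown renderers from reading a line that starts with '2. ' as an ordered list item, or '--' as a heading underline or horizontal rule.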