Extra '\' backslash appears before '-' and '.'

Open • Jerry-Ku opened this issue 8 years ago • 1 comment

An extra backslash is added at the start of the output when the input consists of two or more '-' characters, e.g.

echo '<p>-</p>' | html2text -> '-'
echo '<p>--</p>' | html2text -> '\--'

Also, if the input string has the form '[0-9].[space]', the output is '[0-9]\.[space]', e.g.

echo '<p>.</p>' | html2text -> '.'
echo '<p>..</p>' | html2text -> '..'
echo '<p>2.</p>' | html2text -> '2.'
echo '<p>2. </p>' | html2text -> '2\. '
echo '<p>a. </p>' | html2text -> 'a. '

Jerry-Ku, Sep 26 '17 17:09

The issue is in the package file utils.py (Python37\Lib\site-packages\html2text\utils.py), at lines 210, 211, and 212. Here are versions of those lines that work:

text = config.RE_MD_DOT_MATCHER.sub(r"\1\2", text)
text = config.RE_MD_PLUS_MATCHER.sub(r"\1\2", text)
text = config.RE_MD_DASH_MATCHER.sub(r"\1\2", text)

These lines originally have two extra backslashes in the replacement string; replacing just these three lines should fix the issue. I am not sure whether it could break something else.

bubalopetar, Dec 16 '21 16:12
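
The effect of the replacement-string change described in the comment above can be sketched in isolation. This is only an illustration under stated assumptions: `DOT_MATCHER` and `DASH_MATCHER` are hypothetical stand-ins written to match the behaviour reported in this issue, not the actual definitions from html2text's config module, and `escape_md` is a made-up helper name.

```python
import re

# Hypothetical stand-ins for the matchers named in the comment; the real
# patterns live in html2text's config module and may differ.
DOT_MATCHER = re.compile(r"^(\s*\d+)(\.)(?=\s)", re.MULTILINE)   # "2. " at line start
DASH_MATCHER = re.compile(r"^(\s*)(-)(?=\s|-)", re.MULTILINE)    # "- " or "--" at line start

ESCAPING = r"\1\\\2"    # group 1, a literal backslash, then group 2
NON_ESCAPING = r"\1\2"  # group 1 then group 2: the match is put back unchanged


def escape_md(text, repl):
    """Apply both matchers to text using the given replacement template."""
    text = DOT_MATCHER.sub(repl, text)
    text = DASH_MATCHER.sub(repl, text)
    return text


for sample in ["-", "--", "2.", "2. ", "a. "]:
    print(repr(sample),
          "->", repr(escape_md(sample, ESCAPING)),
          "vs", repr(escape_md(sample, NON_ESCAPING)))
```

With the two-backslash template, only `'--'` and `'2. '` gain a backslash in this sketch, which matches the cases reported in the issue. One thing that may be worth weighing before dropping the escape: in Markdown, a line starting with a number followed by `. ` is read as an ordered-list item, so unescaped output could render differently when converted back to HTML.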