mark_code with <p class="pre">
$ html2text --version
html2text 2016.9.19
$ python --version
Python 2.7.14
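I noticed that mark_code doesn't mark code when it's wrapped in <p class="pre"> tags. While debugging I found that the parser ignores attrs when checking for pre blocks, and I was able to work around that with the change below (the diff is taken against the installed copy, so my change is the `-` line):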
$ diff -u __init__.py /usr/local/lib/python2.7/dist-packages/html2text/__init__.py
--- __init__.py 2017-10-22 22:23:31.588370064 -0700
+++ /usr/local/lib/python2.7/dist-packages/html2text/__init__.py 2016-09-18 15:03:55.000000000 -0700
@@ -600,7 +600,7 @@
         if tag in ["td", "th"] and start:
             self.td_count += 1
-        if tag == "pre" or ('class' in attrs and attrs['class'] == 'pre') or tag == 'p' and self.pre == 1:
+        if tag == "pre":
             if start:
                 self.startpre = 1
                 self.pre = 1
This works, though it seems too messy to submit as a PR. Any better ideas?
It's actually quite possible to write an entire HTML document with div tags and put all the semantics in classes. html2text handles code tags well enough with backticks, and pre tags are also preserved.
It would make sense to preprocess (perhaps with a regex?) the above HTML and replace <p class="pre"> with <pre>.
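A minimal sketch of that preprocessing step, in case it helps. The helper name `p_pre_to_pre` is made up for illustration and is not part of html2text. It assumes `class="pre"` is the only attribute on the tag and that the paragraphs are not nested, since a regex will not cope with arbitrary HTML.

```python
import re

# Hypothetical helper, not part of html2text.
# Matches <p class="pre">...</p> (single or double quotes) and captures the body.
# Assumes class="pre" is the only attribute and that the <p> is not nested.
_P_PRE = re.compile(
    r'<p\s+class=(["\'])pre\1\s*>(.*?)</p\s*>',
    re.IGNORECASE | re.DOTALL,
)


def p_pre_to_pre(html):
    """Rewrite <p class="pre">...</p> as <pre>...</pre>, leaving other tags alone."""
    return _P_PRE.sub(r'<pre>\2</pre>', html)
```

The rewritten string can then be passed to `html2text.html2text()` as usual, so the existing pre handling applies without touching the parser.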
@Alir3z4 ?
Anything that doesn't make the code ugly and hard to read is fine by me. You have my vote for such a fix.
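If the check did go into the parser instead, one way to keep it readable might be to pull the condition out into two small predicates. This is only a sketch: `opens_pre` and `closes_pre` are hypothetical names, and it assumes attrs arrives as a dict (or None for end tags). Unlike the one-line condition in the diff, it only treats p tags with the class as pre blocks, and it also accepts pre as one of several classes.

```python
# Hypothetical helpers, not part of html2text.

def opens_pre(tag, attrs):
    """True if a start tag should begin a preformatted block."""
    if tag == "pre":
        return True
    # attrs may be None or lack "class"; a valueless class attribute gives None.
    classes = (attrs or {}).get("class") or ""
    return tag == "p" and "pre" in classes.split()


def closes_pre(tag, in_pre):
    """True if an end tag should finish the current preformatted block."""
    # End tags carry no attributes, so </p> only counts while a block is open.
    return tag == "pre" or (tag == "p" and bool(in_pre))
```

The tag handler could then call `opens_pre(tag, attrs)` on start tags and `closes_pre(tag, self.pre)` on end tags in place of the long boolean expression.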