mark_code with <p class="pre">

Open mr-remington opened this issue 8 years ago • 2 comments

$ html2text --version
html2text 2016.9.19
$ python --version
Python 2.7.14

I noticed the code doesn't mark code when it's wrapped in <p class="pre"> tags. I found that during parsing it doesn't look at attrs, and I was able to get around this with the following change:

$ diff -u __init__.py /usr/local/lib/python2.7/dist-packages/html2text/__init__.py
--- __init__.py	2017-10-22 22:23:31.588370064 -0700
+++ /usr/local/lib/python2.7/dist-packages/html2text/__init__.py	2016-09-18 15:03:55.000000000 -0700
@@ -600,7 +600,7 @@
                 if tag in ["td", "th"] and start:
                     self.td_count += 1
 
-        if tag == "pre" or ('class' in attrs and attrs['class'] == 'pre') or tag == 'p' and self.pre == 1:  
+        if tag == "pre":
             if start:
                 self.startpre = 1
                 self.pre = 1

This works, though it seems a bit messy to submit as a PR. Any better ideas?

mr-remington • Oct 23 '17 05:10

It's actually quite possible to define the entire HTML with div tags and put the semantics in classes. html2text handles code tags well enough with backticks, and pre tags are also preserved.

It would make sense to preprocess (perhaps with a regex?) the above HTML and replace <p class="pre"> with <pre>.
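The preprocessing idea could look roughly like the sketch below. It assumes the class attribute is exactly "pre" and is the only attribute on the tag; `p_pre_to_pre` is a hypothetical helper name, not part of html2text. A regex is fragile against arbitrary HTML, so treat this as a workaround rather than a general fix.

```python
import re

# Hypothetical helper (not part of html2text): rewrite <p class="pre"> blocks
# as <pre> blocks so that html2text sees ordinary preformatted text.
# Assumption: class="pre" is the only attribute on the <p> tag.
_P_PRE = re.compile(
    r'<p\s+class=(["\'])pre\1\s*>(.*?)</p\s*>',
    flags=re.IGNORECASE | re.DOTALL,
)


def p_pre_to_pre(html):
    """Return html with every <p class="pre">...</p> turned into <pre>...</pre>."""
    return _P_PRE.sub(lambda m: "<pre>" + m.group(2) + "</pre>", html)


sample = '<p>intro</p><p class="pre">x = 1\ny = 2</p>'
converted = p_pre_to_pre(sample)
print(converted)
```

The converted string can then be passed to `html2text.html2text(converted)` as usual, with no change to the library itself.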

@Alir3z4 ?

theSage21 • Dec 25 '17 04:12

Anything that doesn't make the code ugly and hard to read is fine by me. My vote is for having such a fix.

Alir3z4 • Jan 07 '19 20:01