Bug - Converting an <img> tag with a hyphen in src and a src of 74 or more characters adds a newline after the hyphen in the output

Open mkmoisen opened this issue 9 years ago • 5 comments

I noticed an odd bug when converting an <img> tag containing:

  • A hyphen in the src
  • A src of 74 or more characters

Converting an <img> tag with a src of 73 characters or fewer works fine.

>>> import html2text
>>> # Note the missing "y" in the last word, "supply"
>>> img = '<img src="http://matthewmoisen.com/blog/wp-content/matthew_moisen_tractor_suppl.jpg">'
>>> html2text.html2text(img)
u'![](http://matthewmoisen.com/blog/wp-content/matthew_moisen_tractor_suppl.jpg)\n\n'

>>> # Note the addition of the "y" in the last word, "supply"
>>> img = '<img src="http://matthewmoisen.com/blog/wp-content/matthew_moisen_tractor_supply.jpg">'
>>> html2text.html2text(img)
u'![](http://matthewmoisen.com/blog/wp-\ncontent/matthew_moisen_tractor_supply.jpg)\n\n'

See how a \n character has been added after wp- ?
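Why the break lands after wp-: a minimal sketch, assuming html2text reflows each output paragraph with Python's standard textwrap module at a default body width of 78 columns without splitting long words. This is an assumption about the library's internals and may differ between versions. The Markdown emitted for a bare image is ![](src). The failing src is 74 characters, so that line is 79 characters, one past the limit. With textwrap's default break_on_hyphens=True, the hyphen in wp- is the only place the line can be split. The 73-character src in the working example gives a 78-character line, which fits. The snippet reproduces both cases using only the standard library.

```python
import textwrap

# Markdown line html2text produced for the failing <img> tag (copied from the output above)
line = "![](http://matthewmoisen.com/blog/wp-content/matthew_moisen_tractor_supply.jpg)"
print(len(line))  # 79, one character over the assumed width of 78

# Assumed settings: width 78, never split a single word mid-word
wrapped = textwrap.wrap(line, width=78, break_long_words=False)
print(wrapped)
# ['![](http://matthewmoisen.com/blog/wp-', 'content/matthew_moisen_tractor_supply.jpg)']

# With hyphen breaks disabled the line is one unbreakable chunk,
# so textwrap leaves it intact even though it exceeds the width
intact = textwrap.wrap(line, width=78, break_long_words=False, break_on_hyphens=False)
print(intact)

# The 73-character src from the working example gives a 78-character line, which fits
short = line.replace("supply", "suppl")
short_wrapped = textwrap.wrap(short, width=78, break_long_words=False)
print(short_wrapped)
```

The break_on_hyphens=False call only demonstrates that the hyphen is the trigger. It is a textwrap parameter, not an html2text option.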

mkmoisen · Sep 04 '16 23:09

@mkmoisen you might want to have a look at #91. The project has been moved to https://github.com/Alir3z4/html2text/

Alir3z4 · Sep 05 '16 06:09

Hi, I have the same problem. How did you resolve it? Thanks.

JQ-K · Mar 03 '20 13:03

@JQ-K Sorry, I do not remember.

mkmoisen · Mar 03 '20 16:03

Ok, thanks.

JQ-K · Mar 06 '20 04:03

I got around this issue by disabling wrapping altogether with the bodywidth argument: html2text(html=str(soup), bodywidth=0)

durcheinandermann · Aug 07 '20 14:08