Support CSS white-space: pre-wrap
In default HTML with no CSS, <p> and <pre> differ in two separate ways:
- <p> wraps paragraphs at word boundaries, whereas <pre> either doesn't wrap at all (creating arbitrarily long lines), or if that isn't possible in the output medium (like here), wraps in the simplest possible way, breaking lines at the very right-hand edge of the display without regard to whitespace.
- <p> canonicalises whitespace so that you see just one space between words, and no space at the start of a paragraph, no matter what mixture of space and newline characters appeared in the input HTML.
The CSS option white-space: pre-wrap defines a half-way point between the two. Words are still wrapped as in <p>, but whitespace is not canonicalised. This makes it a good choice when HTML is autogenerated from simple plain text with no markup: you get a reasonable balance between paragraphs being readable, and things like code snippets also being readable (because the indentation is preserved), without the HTML creator having to judge which paragraphs should be <p> and which <pre>. In particular, the Mastodon web UI sets this CSS option when displaying the HTML of a toot.
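To make the difference concrete, here is a rough sketch of the two behaviours in plain Rust, using only the standard library. It is not html2text code: the function names collapse_whitespace and wrap_pre_wrap are made up for illustration, and the wrapping is deliberately simplified (widths are counted in chars, over-long words are left to overflow, and whitespace at a break point or at the end of a source line is dropped rather than left hanging as a browser would).

```rust
/// Roughly what <p> does by default: runs of spaces and newlines
/// become a single space, and leading/trailing whitespace disappears.
fn collapse_whitespace(text: &str) -> String {
    text.split_whitespace().collect::<Vec<_>>().join(" ")
}

/// Simplified take on `white-space: pre-wrap`: source newlines are kept as
/// line breaks, indentation and inner spacing are kept verbatim, and lines
/// are still broken at word boundaries once they would exceed `width`.
fn wrap_pre_wrap(text: &str, width: usize) -> Vec<String> {
    let mut lines = Vec::new();
    for source_line in text.split('\n') {
        let mut current = String::new(); // output line being built
        let mut gap = String::new(); // whitespace seen since the last word
        let mut word = String::new(); // word currently being read
        let mut has_word = false; // does `current` already hold a word?

        // The trailing sentinel space flushes the final word of the line.
        for ch in source_line.chars().chain(std::iter::once(' ')) {
            if !ch.is_whitespace() {
                word.push(ch);
                continue;
            }
            if !word.is_empty() {
                let needed = current.chars().count()
                    + gap.chars().count()
                    + word.chars().count();
                // Break before the word only if the line already has content;
                // the whitespace at the break point is dropped.
                if has_word && needed > width {
                    lines.push(std::mem::take(&mut current));
                    gap.clear();
                }
                current.push_str(&gap);
                current.push_str(&word);
                gap.clear();
                word.clear();
                has_word = true;
            }
            gap.push(ch);
        }
        lines.push(current);
    }
    lines
}
```

For example, wrap_pre_wrap("if x:\n    y = 1", 40) keeps the four-space indent on the second line, whereas collapse_whitespace on the same input flattens it to "if x: y = 1".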
It would be useful for html2text to support this CSS setting, injected via Config::add_css(), so that (for example) toots received as HTML from a Mastodon server could be formatted in a way that didn't lose information intended to be preserved.
(I understand that CSS also has independent settings for text-wrap-mode and white-space-collapse, so that white-space: pre-wrap is just a shorthand for setting both of those to the appropriate values.)
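If it helps, the shorthand expansion could be modelled along these lines. This is only a sketch of my understanding of the CSS Text Level 4 mapping, covering the five long-standing white-space keywords; the enum and function names are hypothetical and not part of html2text.

```rust
// Hypothetical types for illustration; not html2text's API.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum WhiteSpaceCollapse {
    Collapse,
    Preserve,
    PreserveBreaks,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TextWrapMode {
    Wrap,
    NoWrap,
}

/// Expand a `white-space` shorthand keyword into its two longhand values.
/// Returns None for keywords this sketch does not cover.
fn expand_white_space(value: &str) -> Option<(WhiteSpaceCollapse, TextWrapMode)> {
    use TextWrapMode::*;
    use WhiteSpaceCollapse::*;
    // CSS keywords are ASCII case-insensitive.
    match value.trim().to_ascii_lowercase().as_str() {
        "normal" => Some((Collapse, Wrap)),
        "nowrap" => Some((Collapse, NoWrap)),
        "pre" => Some((Preserve, NoWrap)),
        "pre-wrap" => Some((Preserve, Wrap)),
        "pre-line" => Some((PreserveBreaks, Wrap)),
        _ => None,
    }
}
```

With a representation like this, supporting white-space: pre-wrap amounts to handling the (Preserve, Wrap) combination, and the longhand properties could be added later without changing the model.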