dirty in, dirty out
First, I love this tool -- thanks for making it!
I have run into a few instances where I get somewhat odd markdown back out of gather, so I've been saving a few examples.
(1) https://time.com/6286449/ray-dalio-world-great-disorder/ produces the following:
1. **The ****largest amounts of debt, the fastest rates of debt growth, and the greatest
amounts of central bank printing of money and buying debt since 1930-45. **
which is not the cleanest markdown (an empty open/close bold pair before "largest" and a trailing space before the closing bold), but it is clearly the result of parsing:
<ol> <li><strong>The </strong><strong>largest amounts of debt, the fastest
rates of debt growth, and the greatest amounts of central bank printing of
money and buying debt since 1930-45. </strong></li>
Ideally, it would be great to get a cleaned-up version:
1. **The largest amounts of debt, the fastest rates of debt growth, and the greatest
amounts of central bank printing of money and buying debt since 1930-45.**
(2) https://www.commentary.org/articles/gary-morson/joseph-epsteins-argues-we-all-need-novels/ produces:
_Great Expecta__tions_: Life disappoints.
which would ideally be
_Great Expectations_: Life disappoints.
but again, this is clearly the result of just parsing:
<i>Great Expecta</i><i>tions</i>: Life disappoints.
I suppose it makes sense from a simplicity POV to just do the literal parsing of html with no "cleanup" of the markdown, but what do you think of adding a flag to do things like:
- remove empty <i>, <b>, <u>, etc. elements
- remove trailing spaces before the close of one of these elements
Essentially running output through some sort of linter with autofix.
Perhaps this is easier said than done.
For instance, running the second example through prettier only escapes the inner underscores:
_Great Expecta\_\_tions_: Life disappoints.
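To make the two cleanup steps listed above concrete, here is a minimal sketch. It assumes the cleanup runs on the HTML before conversion rather than on the markdown afterwards, since adjacent tags are less ambiguous there than `__` or `****`. It is Python and regex-based purely for illustration; `tidy_inline_html` and `INLINE_TAGS` are made-up names, not anything in gather, and the sketch ignores tags with attributes and nested formatting.

```python
import re

# Inline formatting tags to normalize (illustrative, not exhaustive).
INLINE_TAGS = r"(?:i|b|u|em|strong)"


def tidy_inline_html(html: str) -> str:
    """Normalize inline formatting tags before HTML-to-Markdown conversion.

    Hypothetical helper: handles only bare tags such as <i> or <strong>,
    not tags carrying attributes.
    """
    # 1. Drop elements that contain only whitespace, keeping the whitespace.
    html = re.sub(rf"<({INLINE_TAGS})>(\s*)</\1>", r"\2", html)
    # 2. Merge back-to-back runs of the same tag: "</i><i>" disappears.
    html = re.sub(rf"</({INLINE_TAGS})>(\s*)<\1>", r"\2", html)
    # 3. Move whitespace sitting just inside a closing tag to just outside it.
    html = re.sub(rf"(\s+)</({INLINE_TAGS})>", r"</\2>\1", html)
    return html


print(tidy_inline_html("<i>Great Expecta</i><i>tions</i>: Life disappoints."))
# prints: <i>Great Expectations</i>: Life disappoints.
```

The order matters: merging adjacent runs first keeps the space in "The " inside the merged bold run, so only the truly trailing space before the final closing tag gets moved out.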
What are your thoughts on such a modification?