dirty in, dirty out

Open kortina opened this issue 2 years ago • 0 comments

First, I love this tool -- thanks for making it!

I have run into a few instances where I get somewhat odd markdown back out of gather, so I've been saving a few examples.

(1) https://time.com/6286449/ray-dalio-world-great-disorder/ produces the following:

1. **The ****largest amounts of debt, the fastest rates of debt growth, and the greatest
amounts of central bank printing of money and buying debt since 1930-45. **

which is not the cleanest markdown (an empty open/close bold pair before "largest" and a trailing space before the closing bold), but it is clearly the result of parsing:

<ol> <li><strong>The </strong><strong>largest amounts of debt, the fastest
 rates of debt growth, and the greatest amounts of central bank printing of 
money and buying debt since 1930-45. </strong></li>

Ideally, it would be great to get a cleaned-up version:

1. **The largest amounts of debt, the fastest rates of debt growth, and the greatest 
amounts of central bank printing of money and buying debt since 1930-45.**

(2) https://www.commentary.org/articles/gary-morson/joseph-epsteins-argues-we-all-need-novels/ produces:

 _Great Expecta__tions_: Life disappoints.

which would ideally be

 _Great Expectations_: Life disappoints.

but again this is clearly the result of just parsing:

<i>Great Expecta</i><i>tions</i>: Life disappoints.

I suppose it makes sense from a simplicity POV to just do the literal parsing of html with no "cleanup" of the markdown, but what do you think of adding a flag to do things like:

- remove empty <i>, <b>, <u>, etc. elements
- remove trailing spaces before the close of one of these elements

Essentially running output through some sort of linter with autofix.
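
To make the idea concrete, here is a rough sketch of the kind of cleanup pass I have in mind, written in Python and applied to the HTML before it is converted to markdown. Everything in it is an assumption for illustration: the function name `clean_inline_html` and the tag list are hypothetical, it is not based on how gather works internally, and it only handles bare tags without attributes.

```python
import re

# Hypothetical list of inline formatting tags to normalize (an assumption,
# not taken from gather).
INLINE_TAGS = ("strong", "b", "em", "i", "u")


def clean_inline_html(html: str) -> str:
    """Normalize simple inline formatting tags before markdown conversion."""
    for tag in INLINE_TAGS:
        # Drop empty or whitespace-only elements, keeping the whitespace itself.
        html = re.sub(rf"<{tag}>(\s*)</{tag}>", r"\1", html)
        # Merge directly adjacent elements of the same kind, e.g. </i><i>.
        html = re.sub(rf"</{tag}><{tag}>", "", html)
        # Move trailing whitespace outside the closing tag.
        html = re.sub(rf"(\s+)</{tag}>", rf"</{tag}>\1", html)
        # Move leading whitespace outside the opening tag.
        html = re.sub(rf"<{tag}>(\s+)", rf"\1<{tag}>", html)
    return html


print(clean_inline_html("<i>Great Expecta</i><i>tions</i>: Life disappoints."))
print(clean_inline_html("<strong>The </strong><strong>largest. </strong>"))
```

The merge has to run before the whitespace steps; otherwise the space after "The" would be moved out between the two elements and they would no longer be adjacent. A real implementation would presumably work on the parsed DOM rather than with regular expressions.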

Perhaps this is easier imagined / said than done.

Running the second example through prettier yields:

 _Great Expecta\_\_tions_: Life disappoints.

What are your thoughts on such a modification?

Jul 23 '23 21:07 kortina