
Sentence-based word wrapping

Open hukkin opened this issue 5 years ago • 3 comments

Experiment with something like

from nltk import tokenize
sentences = tokenize.sent_tokenize(paragraph)

and find out if we can implement sentence-based word wrapping to reduce diffs.

Implement as an option; don't change the default mode (which preserves wrapping).
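To illustrate why sentence-based wrapping could reduce diffs, here is a minimal sketch that compares the diff size of a small edit under fixed-width wrapping and under one-sentence-per-line wrapping. The function names, the sample text, and the 30-column width are assumptions made up for this example, and a naive regex splitter stands in for `nltk` so that the snippet has no dependencies.

```python
import difflib
import re
import textwrap


def wrap_fixed(text, width=30):
    """Conventional hard wrap at a fixed column."""
    return textwrap.wrap(text, width=width)


def wrap_sentences(text):
    """One sentence per line, using a naive split after '.', '!' or '?'."""
    return re.split(r"(?<=[.!?])\s+", text.strip())


def changed_lines(before, after):
    """Count added plus removed lines in a unified diff with no context."""
    diff = difflib.unified_diff(before, after, lineterm="", n=0)
    return sum(
        1
        for line in diff
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    )


# Hypothetical sample paragraph and edit.
old = "The parser reads the file. It builds a tree. Then the renderer walks the tree and writes Markdown."
new = old.replace("the file", "the whole input file")

print("fixed width:", changed_lines(wrap_fixed(old), wrap_fixed(new)))
print("per sentence:", changed_lines(wrap_sentences(old), wrap_sentences(new)))
```

With one sentence per line, the edit touches only the line holding the edited sentence, while the fixed-width wrap reflows the lines that follow the edit as well.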

hukkin avatar Jul 15 '20 10:07 hukkin

Due to the nltk dependency and the many, many bugs that I expect, I think any work should be started (and most likely stay) in a plugin.

hukkin avatar Apr 27 '21 22:04 hukkin

100% agreed, a plugin is the right place to experiment with this. I suspect it will lead to many unexpected or unpredictable side effects, which is the last thing you want in a black-style program :-)

choldgraf avatar Jun 09 '21 21:06 choldgraf

Hello there!

I have thought A LOT about this issue over the last couple of days and wanted to pitch an idea. Inspired by the way black handles docstrings, where a lot of the time "if it can fit on a line of less than 88 chars, it should", could a "lazy" implementation be to break on punctuation unless the resulting chunk would be shorter than X characters?

I think it would give predictable behaviour and greatly reduce diff sizes. It will sometimes lead to uglier docstrings, but they will not look ugly when rendered to HTML.

start with an empty chunk
for a given paragraph, start from the end
    append to the chunk until a punctuation mark is found
        if the chunk is longer than X characters (I don't know, maybe 42)
            yield the chunk (as a separate new line) and start a new empty chunk
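The pseudocode above could be sketched in Python roughly as follows. This is one possible reading, not the implementation that was eventually published: the function name `lazy_sentence_break`, the set of punctuation marks, and the handling of the short leftover chunk at the start of the paragraph are all assumptions; the default of 42 is the placeholder value from the pseudocode.

```python
import re


def lazy_sentence_break(paragraph, min_chars=42):
    """Break a paragraph after punctuation, but only once a chunk exceeds min_chars.

    Hypothetical helper: scans from the end of the paragraph, as the
    pseudocode suggests, so any short leftover lands on the first line.
    """
    # Collapse existing line breaks and repeated spaces.
    text = " ".join(paragraph.split())
    # Pieces end at a punctuation mark that is followed by a space.
    pieces = re.split(r"(?<=[.,;:!?]) ", text)

    lines = []
    chunk = ""
    for piece in reversed(pieces):
        # Grow the chunk backwards, one punctuation-delimited piece at a time.
        chunk = piece if not chunk else piece + " " + chunk
        if len(chunk) > min_chars:
            lines.append(chunk)
            chunk = ""
    if chunk:
        # Leftover at the start of the paragraph, possibly shorter than min_chars.
        lines.append(chunk)
    lines.reverse()
    return lines


sample = (
    "Short one. This second sentence is long enough to stand on its own line, "
    "surely. End."
)
for line in lazy_sentence_break(sample):
    print(line)
```

Because chunks are only emitted once they pass the threshold, short clauses stay attached to their neighbours instead of each getting a line of their own.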

Let me know what you think (I am trying to find problems with my approach).

Temple of Doom was discovered by Dr. Jones.
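One way to read the example sentence above is as a test case for abbreviations: a splitter that breaks after every period would cut it after "Dr.". The sketch below shows that failure and one possible mitigation, an abbreviation list. The function name, the list contents, and the extra second sentence in the sample are assumptions for illustration only; a real list would need to be much longer.

```python
# Hypothetical and deliberately incomplete abbreviation list.
ABBREVIATIONS = frozenset({"Dr.", "Mr.", "Mrs.", "Ms.", "Prof.", "e.g.", "i.e."})


def split_sentences(text, abbreviations=ABBREVIATIONS):
    """Split on words ending in '.', '!' or '?', skipping known abbreviations."""
    sentences = []
    current = []
    for word in text.split():
        current.append(word)
        if word.endswith((".", "!", "?")) and word not in abbreviations:
            sentences.append(" ".join(current))
            current = []
    if current:
        # Trailing words without closing punctuation.
        sentences.append(" ".join(current))
    return sentences


# The second sentence is added here only to make the split visible.
text = "Temple of Doom was discovered by Dr. Jones. He escaped."

print(split_sentences(text, abbreviations=frozenset()))  # naive: breaks after "Dr."
print(split_sentences(text))  # abbreviation-aware
```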

I ended up implementing a version of this. It is regex-based and includes support for some other stuff. I want to play with it a bit more before publishing it, but it looks promising. LMK if there are things you feel it should support that I have not tested.

https://github.com/jspaezp/mdformat-sentencebreak

jspaezp avatar Sep 15 '22 00:09 jspaezp