pretty First naive attempt to generalise the API to support different textual representations

Hey all,

following from @bgamari 's feedback, this is a very first, crude attempt to generalise the user-facing API to support multiple textual representations which can be reified back into a normal String, but that supports getting the length of the sequence in an efficient way. This PR does two things:

It (boldly!) removes the PStr constructor from TextDetails, as I think it's deprecated anyway and is adding just clutter for nothing.
It introduces the RuneSequence r typeclass which allows generalising the top level API (things like the text combinators) to be independent from String.

I really hope this is a good step towards the bigger goal of replacing GHC's internal copy of pretty with this library 😉

Let me know what you guys think!

Apr 27 '17 08:04 adinapoli

This looks quite reasonable to me.

Apr 27 '17 16:04 bgamari

@dterei , do you think this has a chance to be merged or is there anything you would like to see before doing so?

Apr 28 '17 08:04 adinapoli

Firstly, let's not remove ptext, we should deprecate it first and remove it in a later release.

Secondly, I'm not sure about using a type-class. It seems reasonable, but we could as easily just add text_efficient :: String -> Int -> Doc a rather than a type-class.

I'd love to understand better first for all of this work:

Why is String slow?
How slow is String compared with an alternative, such as FastString?

Sorry to slow you up, but this is a long living problem and potentially a big change, so I'd like to understand the bottom better. Let's keep the discussion mainly in #42.

May 01 '17 17:05 dterei

Firstly, let's not remove ptext, we should deprecate it first and remove it in a later release.

I had no idea if (in practice) that was an obsolete relic or something some users could potentially still be using, but I do agree is way more prudent to go through the usual deprecation cycle!

Sorry to slow you up, but this is a long living problem and potentially a big change, so I'd like to understand the bottom better.

Absolutely! As there is quite an overlap with the discussion in #42 anyway, as you suggested I'm going to simply write a more comprehensive comment there so that we keep the discussion in only one place 😉

May 01 '17 18:05 adinapoli

@adinapoli Anything I should be reviewing or doing with these new commits?

May 19 '17 16:05 dterei

Hey @dterei ! I have done mostly 2 things:

I have restored ptext and I have appropriately deprecated it.
I have benchmarked different implementations of RuneSequence, with these results:

benchmarking render/doc1/string
time                 330.4 ms   (324.2 ms .. 336.8 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 324.7 ms   (322.5 ms .. 327.0 ms)
std dev              2.703 ms   (1.558 ms .. 3.349 ms)
variance introduced by outliers: 16% (moderately inflated)

benchmarking render/doc1/bytestring
time                 322.6 ms   (277.4 ms .. 367.9 ms)
                     0.995 R²   (0.985 R² .. 1.000 R²)
mean                 328.0 ms   (321.2 ms .. 341.2 ms)
std dev              13.16 ms   (220.3 μs .. 15.47 ms)
variance introduced by outliers: 16% (moderately inflated)

benchmarking render/doc2/string
time                 588.7 ms   (NaN s .. 594.1 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 583.2 ms   (581.8 ms .. 584.4 ms)
std dev              1.898 ms   (0.0 s .. 2.065 ms)
variance introduced by outliers: 19% (moderately inflated)

benchmarking render/doc2/bytestring
time                 572.7 ms   (558.1 ms .. 585.8 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 577.0 ms   (574.4 ms .. 578.8 ms)
std dev              2.708 ms   (0.0 s .. 3.120 ms)
variance introduced by outliers: 19% (moderately inflated)

benchmarking render/doc3/string
time                 1.828 s    (1.809 s .. 1.862 s)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 1.827 s    (1.823 s .. 1.833 s)
std dev              5.259 ms   (0.0 s .. 5.655 ms)
variance introduced by outliers: 19% (moderately inflated)

benchmarking render/doc2/bytestring
time                 1.822 s    (1.794 s .. 1.835 s)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 1.817 s    (1.811 s .. 1.821 s)
std dev              5.958 ms   (0.0 s .. 6.690 ms)
variance introduced by outliers: 19% (moderately inflated)

benchmarking fullRender/PageMode 1000/string
time                 585.9 ms   (577.1 ms .. 594.9 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 587.0 ms   (584.9 ms .. 588.1 ms)
std dev              1.829 ms   (0.0 s .. 2.033 ms)
variance introduced by outliers: 19% (moderately inflated)

benchmarking fullRender/PageMode 1000/bytestring
time                 587.0 ms   (556.9 ms .. 607.6 ms)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 583.0 ms   (575.6 ms .. 586.8 ms)
std dev              6.424 ms   (0.0 s .. 6.665 ms)
variance introduced by outliers: 19% (moderately inflated)

benchmarking fullRender/PageMode 100/string
time                 585.1 ms   (518.0 ms .. 646.5 ms)
                     0.998 R²   (0.997 R² .. 1.000 R²)
mean                 600.0 ms   (583.0 ms .. 609.8 ms)
std dev              15.21 ms   (0.0 s .. 16.93 ms)
variance introduced by outliers: 19% (moderately inflated)

benchmarking fullRender/PageMode 100/bytestring
time                 596.6 ms   (584.5 ms .. 603.5 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 585.7 ms   (579.6 ms .. 589.4 ms)
std dev              5.646 ms   (0.0 s .. 6.456 ms)
variance introduced by outliers: 19% (moderately inflated)

benchmarking fullRender/ZigZagMode/string
time                 236.8 ms   (235.7 ms .. 238.4 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 234.3 ms   (233.1 ms .. 235.3 ms)
std dev              1.372 ms   (1.010 ms .. 1.664 ms)
variance introduced by outliers: 14% (moderately inflated)

benchmarking fullRender/ZigZagMode/bytestring
time                 236.0 ms   (232.1 ms .. 240.7 ms)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 235.7 ms   (234.1 ms .. 237.0 ms)
std dev              1.787 ms   (1.402 ms .. 2.170 ms)
variance introduced by outliers: 14% (moderately inflated)

benchmarking fullRender/LeftMode/string
time                 11.31 ms   (11.02 ms .. 11.54 ms)
                     0.998 R²   (0.997 R² .. 0.999 R²)
mean                 11.34 ms   (11.15 ms .. 12.01 ms)
std dev              881.3 μs   (230.7 μs .. 1.764 ms)
variance introduced by outliers: 40% (moderately inflated)

benchmarking fullRender/LeftMode/bytestring
time                 10.89 ms   (10.66 ms .. 11.08 ms)
                     0.996 R²   (0.992 R² .. 0.998 R²)
mean                 10.83 ms   (10.68 ms .. 11.03 ms)
std dev              456.1 μs   (332.6 μs .. 679.6 μs)
variance introduced by outliers: 16% (moderately inflated)

benchmarking fullRender/OneLineMode/string
time                 27.44 ms   (26.72 ms .. 28.14 ms)
                     0.995 R²   (0.987 R² .. 1.000 R²)
mean                 27.37 ms   (26.93 ms .. 28.38 ms)
std dev              1.366 ms   (652.6 μs .. 2.425 ms)
variance introduced by outliers: 16% (moderately inflated)

benchmarking fullRender/OneLineMode/bytestring
time                 25.72 ms   (25.11 ms .. 26.30 ms)
                     0.998 R²   (0.996 R² .. 0.999 R²)
mean                 26.54 ms   (26.01 ms .. 27.54 ms)
std dev              1.613 ms   (658.3 μs .. 2.328 ms)
variance introduced by outliers: 25% (moderately inflated)

As you can see so far, the use of ByteString yields only marginally-better results. I think this is because the corpus we are benchmarking against is not particularly representative, and possibly we would need some real, GHC-driven data. I would be curious to hear what @bgamari has to say. ~~Last but not least, I'm wondering if all those right appends could be improved by defining RuneSequence instances for DList and Builder~~ (Actually scrap that, length O-complexity would be terrible for those two, I'm clearly tired at this time of the day 🙈 )

May 19 '17 17:05 adinapoli

Yeah, we've had this issue before. Building a benchmark isn't too hard, building one that captures the performance issues that GHC has seen with pretty is far harder... I'm not sure how to help here as I don't know the issues GHC has faced.

It seems from our previous analysis that the calls to length for the string type are the performance factor.

May 19 '17 19:05 dterei

@adinapoli, do you need any help with this?

Jun 18 '17 21:06 bgamari

Hey @bgamari , thanks for keeping me honest 😉

Yes, a bit of help in gathering realistic, GHC-driven test cases would be awesome indeed. Considering it's not the most rewarding task to do (hehe) and that I'm in the middle of changing jobs, my free time is a bit limited these days 🙈

Jun 19 '17 06:06 adinapoli

No worries; just let me know if you get stuck.

Jun 23 '17 15:06 bgamari

@bgamari To be completely clear, currently I feel I'm a bit stuck, in a sense. I don't feel compelled too much in exploring different solutions as I don't have realistic benchmarks which I can leverage. Having those (or at least pointers on how to generate some) would revamp the interest, I guess 😛

Jun 23 '17 16:06 adinapoli

@adinapoli, quite understandable. It's hard to reduce GHC's use of the pretty-printer to a nice clean benchmark. The bit of GHC that is likely dependent upon pretty-printer performance is the native-code generator (NCG). Here we are constructing assembler programs, which are essentially nothing but FastStrings with some interstitial whitespace. @niteria has previously noted that GHC currently spends a significant amount of time in the pretty-printer due to the NCG.

If I were to guess, I would try constructing a benchmark that follows roughly the model I describe above: a lot of short lines, each consisting of a few shorts string separated by whitespace.

I should note that another option in the cards for GHC is to try swapping out pretty with @quchen's prettyprinter library. This would likely require picking up a text dependency, but with the way things are going it's looking rather inevitable that we'll need to include text in the GHC build eventually, even if the ghc library itself doesn't depend upon it. prettyprinter has the advantage that it already offers an optimized infinite ribbon-width codepath, which may significantly help the NCG usecase.

Jun 23 '17 19:06 bgamari

FWIW if Text is a problem, forking prettyprinter to use String is simple, but not something I want to have in the Hackage release.

Jun 24 '17 16:06 quchen

It does seem like it would be worth trying one of the alternative pretty-printer libraries. Not sure how much work that would be, hopefully only a small amount.

Otherwise though, in terms of isolating performance issues in pretty, how are we detecting that performance is an issue today? Presumably @adinapoli or others have already tried replacing ghc-pretty with pretty and noticed some performance issues through the GHC test-suite. Obviously that's a big and ugly way to test, but at least a starting point. For example, can we get the output of that performance test from GHC to understand its structure?

Jun 27 '17 16:06 dterei

Indeed in principle moving to a new pretty-printing library should be fairly straightforward. @bollu is currently trying to do this in D3650. Unfortunately, these sorts of things are never as simple as they seem: currently the branch seems to suffer from what appears to be a code-generation bug (perhaps due to slight differences in assembler output). This is one of many things on my list of things to look into but, as always, help is greatly appreciated.

Jul 21 '17 03:07 bgamari