Decoupling byte-level encoding
When writing a JSON parser (GaloisInc/json#17) I needed some way to decode UTF-8, and to my dismay I found that none of the existing solutions fit my expectations:

- `GHC.Encoding.UTF8` and `GHC.IO.Encoding` are IO-based, and I don't want that in a parser;
- `Data.Text.Internal.Encoding.Utf8`, while pure, appears to both return a bare `Reject` as its only error information and have a rather complex interface;
- `Data.Text.Encoding.*` and `Data.Text.Lazy.Encoding.*` are already parsers themselves, too high-level for this task;
- `utf8-string`'s `Codec.Binary.UTF8.String` consumes and returns lists, so it isn't parser-compatible.
I decided to hand-roll the UTF-8 decoding, which allowed me to categorize the errors (see `Encoding.Mixed.Error`) and resulted in a lot of code on the parser side that has little to do with consuming bytes per se (see `Codec.Web.JSON.Parse.String`).
However, the code I wrote can instead be generalized to:
```haskell
-- Assume Error is Encoding.Mixed.Error.Error

data UTF8 a = UTF8_1 a
            | Part_2 (Word8 -> UTF8_2 a)
            | Part_3_1 (Word8 -> Part_3_1 a)
            | Part_4_1 (Word8 -> Part_4_1 a)
            | Error_1 Error

data UTF8_2 a = UTF8_2 a
              | Error_2 Error

data Part_3_1 a = Part_3_2 (Word8 -> UTF8_3 a)
                | Error_3_1 Error

data UTF8_3 a = UTF8_3 a
              | Error_3_2 Error

data Part_4_1 a = Part_4_2 (Word8 -> Part_4_2 a)
                | Error_4_1 Error

data Part_4_2 a = Part_4_3 (Word8 -> UTF8_4 a)
                | Error_4_2 Error

data UTF8_4 a = UTF8_4 a
              | Error_4_3 Error

newtype Conv1 a = Conv1 (Word8 -> a)
newtype Conv2 a = Conv2 (Word8 -> Word8 -> a)
newtype Conv3 a = Conv3 (Word8 -> Word8 -> Word8 -> a)
newtype Conv4 a = Conv4 (Word8 -> Word8 -> Word8 -> Word8 -> a)

utf8 :: Conv1 a -> Conv2 a -> Conv3 a -> Conv4 a -> Word8 -> UTF8 a
utf8 = -- I'm omitting the implementation, but it's only 50 lines long
```
Parsing is then simply a matter of unwrapping `UTF8`. This decouples character validation from conversion; the only part of decoding left is ensuring that only the maximal subpart of an ill-formed sequence is consumed, which is the responsibility of the parser.
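To illustrate, here is a heavily cut-down, hypothetical sketch of the proposed interface: only the ASCII and 2-byte branches, with a stand-in `Error` type (the real proposal uses `Encoding.Mixed.Error.Error` and covers all four sequence lengths), showing how a parser unwraps the structure byte by byte:

```haskell
-- Heavily cut-down sketch of the proposed interface: ASCII and
-- 2-byte branches only, with a stand-in Error type. The actual
-- proposal uses Encoding.Mixed.Error.Error and covers all four
-- sequence lengths.
import Data.Bits (shiftL, (.&.), (.|.))
import Data.Char (chr)
import Data.Word (Word8)

data Error = InvalidLead Word8 | InvalidContinuation Word8
  deriving Show

data UTF8 a = UTF8_1 a
            | Part_2 (Word8 -> UTF8_2 a)
            | Error_1 Error

data UTF8_2 a = UTF8_2 a
              | Error_2 Error

newtype Conv1 a = Conv1 (Word8 -> a)
newtype Conv2 a = Conv2 (Word8 -> Word8 -> a)

-- Classify the lead byte: hand back either a finished result or a
-- continuation expecting the next byte.
utf8 :: Conv1 a -> Conv2 a -> Word8 -> UTF8 a
utf8 (Conv1 f1) (Conv2 f2) w0
  | w0 < 0x80 = UTF8_1 (f1 w0)
  | w0 < 0xC2 = Error_1 (InvalidLead w0)  -- continuation or overlong lead
  | w0 < 0xE0 = Part_2 (\w1 ->
      if w1 .&. 0xC0 == 0x80
        then UTF8_2 (f2 w0 w1)
        else Error_2 (InvalidContinuation w1))
  | otherwise = Error_1 (InvalidLead w0)  -- 3/4-byte branches omitted here

-- "Parsing is simply unwrapping": feed bytes, pattern match.
decode2 :: [Word8] -> Either Error (Char, [Word8])
decode2 []       = error "empty input"
decode2 (b : bs) =
  case utf8 toChar1 toChar2 b of
    UTF8_1 c  -> Right (c, bs)
    Error_1 e -> Left e
    Part_2 k  -> case bs of
      []        -> error "truncated input"
      b1 : rest -> case k b1 of
        UTF8_2 c  -> Right (c, rest)
        Error_2 e -> Left e
  where
    toChar1 = Conv1 (chr . fromIntegral)
    toChar2 = Conv2 (\w0 w1 ->
      chr (fromIntegral (w0 .&. 0x1F) `shiftL` 6
       .|. fromIntegral (w1 .&. 0x3F)))
```

For instance, `decode2 [0xC3, 0xA9]` yields `Right ('\xE9', [])`.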
My proposal is creating a separate package focused specifically on byte-level decoding/encoding of UTF-8/UTF-16/UTF-32. `text` could then drop some internal modules in favor of a simpler common interface.
This proposal is, however, naive: I do not know whether GHC can inline these datatypes reliably or, indeed, at all. Based on my cursory reading of the Secrets of the Glasgow Haskell Compiler Inliner paper, it should be able to, as each of these expressions is trivial.
This doesn't clash with the issue of GHC's many UTF-8 implementations (outlined in `GHC.Encoding.UTF8`), as all the other algorithms are in IO.
Other concerns:
- `text` is a core library, so I assume an extra dependency can't just be added on a whim;
- a package named `utf` already exists and is deprecated; I don't know how hard reclaiming deprecated packages is.
Adding a dependency to `text` is too much of a hassle IMO, but we can probably incorporate the desired changes into `text` itself. Could you please elaborate on why a naive parser from `Data.Text.Internal.Encoding.Utf8` is not sufficient for your needs?
While `Data.Text.Internal.Encoding.Utf8` is indeed functional enough to serve its purpose, my concerns are the following:

- The interface is recursive, so the `Incomplete` state on the fourth byte is unreachable;
- The `Accept` and `Incomplete` constructors force their fields, so returned codepoints need to be evaluated even if they're never used;
- Ideally I'd want to share the error type with the `text` library, but alas `DecodeError` represents it as a `String` and there's no way to derive that from the `Reject` result.
I do have to admit that all of these issues are minor, and I do not know why anyone would ever need succinct errors (other than cool error reporting), but the approach I'm proposing is the properly decoupled Haskell view of things.
One thing to note is that I haven't looked deeply into the structure of Hoehrmann's C-based decoder, but from what I can see, by-the-book decoding is just a chain of up to thirteen comparisons, so I don't yet understand the need for a complex state machine here (other than code brevity of course, but Haskell isn't C).
For performance reasons two array lookups are much better than up to 13 comparisons.
Once `Reject`ed, one is supposed to apply whatever error reporting is desired. If you keep the previous state at hand, it should be fairly straightforward to do so.
> For performance reasons two array lookups are much better than up to 13 comparisons.
Isn't this only true if the entire lookup table resides in L1 cache? Sure this will work fine for C parsers, but I don't know if any random Haskell parser interleaved with the algorithm can guarantee this.
Also it's 1 comparison for 00..7F and 5 for 80..7FF, so for really simple strings even two array lookups in L1 cache seem like overkill.
Rolling a benchmark to compare the two approaches should be easy, so perhaps I should do that.
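For concreteness, the comparison chain under discussion can be sketched as follows (a hedged illustration, not code from either library; `sequenceLength` is a made-up name, and the boundaries follow the valid lead-byte ranges of RFC 3629):

```haskell
import Data.Word (Word8)

-- Expected length of a UTF-8 sequence, judged from its lead byte
-- with plain ordered comparisons instead of a table lookup.
-- Nothing marks a byte that cannot start a well-formed sequence:
-- continuation bytes, the overlong leads C0/C1, and F5..FF.
sequenceLength :: Word8 -> Maybe Int
sequenceLength w
  | w < 0x80  = Just 1   -- ASCII: settled after a single comparison
  | w < 0xC2  = Nothing  -- 80..BF continuation, C0/C1 overlong
  | w < 0xE0  = Just 2
  | w < 0xF0  = Just 3
  | w < 0xF5  = Just 4
  | otherwise = Nothing  -- F5..FF never appear in well-formed UTF-8
```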
The main blocker for this proposal is going to be performance. I'd be surprised if you can use your API to write a streaming JSON parser whose performance is comparable to using the Data.Text.Internal.Encoding.Utf8 module or the recently added validateUtf8Chunk (etc.) primitives in Data.Text.Internal.Encoding.
There is an intentional trade-off of a tiny bit of imprecision for a lot of performance. The parser state fits in a single byte (`DecoderState`), which can easily be unpacked by GHC optimizations into a tight loop that does no allocations. In contrast, an API like the one you propose, with lots of first-class functions, aims to represent the UTF-8 parsing state machine more accurately, reducing unreachable branches, but (1) GHC won't be able to optimize the allocations away, and (2) it's unclear how that granularity results in practical benefits.
> The interface is recursive, so the `Incomplete` state on the fourth byte is unreachable;
Making that state unreachable is really the main point of your API, and as you mention it's unclear what the use case would be.
> The `Accept` and `Incomplete` constructors force their fields, so returned codepoints need to be evaluated even if they're never used;
The fields are one word each. The expectation is that they are going to be unpacked in a tight loop that does not allocate. This is much cheaper than allocating a thunk for the partial code point to be evaluated only if it is used. If you don't need the partial code point (e.g. when only doing validation), you can use `updateDecoderState` instead.
> Isn't this only true if the entire lookup table resides in L1 cache? Sure this will work fine for C parsers, but I don't know if any random Haskell parser interleaved with the algorithm can guarantee this.
This is likely to be true irrespective of caches: bear in mind that in your case each comparison is a conditional and involves 1 or 2 jump instructions depending on the branch chosen.
So after a week of tinkering I made a benchmark repository (link). The algorithms I wrote include no low-level magic, just inlining and strict copying. The benchmark timings can be found in the README.md there; here are a few points that follow from the results:
- GHC does indeed inline the data structure, even at `-O1`. I `NOINLINE`d both of the parsers I wrote, and the only places that retain references to `Codec.Encoding.UTF8` in the final STG are the `Text` variants on chunk end, solely because I force it in the `Resume` type;
- Pretty much all UTF-8 decoding is done using `simdutf`, so on every chunk border the arrays have to be pulled back from the ether just to do 1-4 lookups;
- `decodeUtf8` does not follow the maximal subpart rule.
Problems I could not resolve:
- For whatever reason I can't turn off the `simdutf` flag. If someone could try out `decodeUtf8` without the SIMD algorithm, that'd be quite nice, as it's probably the only place that clearly outperforms my solution;
- Based on the fact that the SIMD version of my `Text` algorithm runs faster on late errors than the basic one, the latter screws up inlining and should be at least 10% faster when done right. This isn't that important, so I haven't dug into it.
Also I wonder why `simdutf` returns a boolean when it could return the last known valid UTF-8 boundary.
For the record, all of my benchmarks have been executed on a laptop CPU, so, as with all things cache-related, YMMV, and extra benchmark subjects are welcome.
I concede that you can get your data structure to be inlined. But that relies on unrolling the loop yourself so you always start an iteration at a code point boundary. Performance-wise, the extra branching may have a detrimental effect on branch prediction. Your initial comment about IO made me assume you didn't want simdutf but I misunderstood. If the main loop uses simdutf then performance for the error branch is much less of a concern.
I'm still not convinced a more fine grained API for UTF-8 is really better. I disagree that in comparison "Data.Text.Internal.Encoding.Utf8 (...) has a rather complex interface." That API is an automaton, which is as simple and standard as it gets: byte goes in, new state and/or output comes out. You don't have to unroll four nested pattern-matches to use that API efficiently. I think the main bit of apparent complexity is that it exposes the internal state for error reporting, and that part of the interface could be cleaned up to make it easier to diagnose errors.
> `decodeUtf8` does not follow the maximal subpart rule.
That sounds like a bug, right?
> Also I wonder why `simdutf` returns a boolean when it could return the last known valid UTF-8 boundary.
That's a feature request for simdutf. I don't know what the current status is, but it would indeed let us simplify UTF-8 parsing further.
Also, another API you haven't mentioned is Data.Text.Encoding.decodeUtf8Chunk. What's your opinion of that for your problem?
> `decodeUtf8` does not follow the maximal subpart rule ... sounds like a bug, right?
Yes, it should probably have its own issue. Can be replicated through the tests here.
> You don't have to unroll four nested pattern-matches to use that API efficiently.
If you wish to respect the maximal subpart rule, an error encountered on the first byte results in byte consumption, while an error on any successive byte does not. As such you need to track byte boundaries; that's four repeats with the array lookup algorithm. The full unroll is just as deep on the 4-byte branch, and every other branch is shallower than that.
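To make that bookkeeping concrete, here is a hedged sketch (`illFormedLength` is a made-up name; it checks only the structural 10xxxxxx continuation pattern, not the tighter range restrictions on the second byte that the full rule also requires) of how many bytes an ill-formed sequence consumes under the maximal-subpart rule:

```haskell
import Data.Bits ((.&.))
import Data.Word (Word8)

-- A continuation byte has the form 10xxxxxx.
isContinuation :: Word8 -> Bool
isContinuation w = w .&. 0xC0 == 0x80

-- Given the lead byte of a sequence already known to be ill-formed
-- and the bytes after it, report how many bytes to consume: a bad
-- lead byte is consumed alone, while an error later in a multi-byte
-- sequence consumes only the prefix before the offending byte, which
-- is left to start the next decode attempt.
illFormedLength :: Word8 -> [Word8] -> Int
illFormedLength lead rest
  | lead < 0xC2 || lead >= 0xF5 = 1  -- invalid lead: consume it alone
  | otherwise =
      1 + length (takeWhile isContinuation (take (expected - 1) rest))
  where
    expected :: Int
    expected | lead < 0xE0 = 2
             | lead < 0xF0 = 3
             | otherwise   = 4
```

For example, in the classic `E2 28 A1` input the maximal subpart is just `E2` (one replacement), after which `28` decodes as `(` and `A1` produces a second error.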
> I disagree that in comparison "Data.Text.Internal.Encoding.Utf8 (...) has a rather complex interface."
I admit my phrasing on this point was incorrect; if anything, the fact that it's in an Internal module is a much better reason not to use it.
I'm fine with it existing in a separate module with proper documentation, as it may indeed be useful for some highly specific parsers, but so far the benchmarks I linked above show it's not even performance-critical in this library.
> another API you haven't mentioned is `Data.Text.Encoding.decodeUtf8Chunk`
I have, it's the third bullet point of this issue. A JSON parser needs to treat `"` as end-of-parse and `\` as its own small subparser, so anything beyond a byte-level decoder doesn't fit the purpose.
Having conceded that performance is not an issue, the only remaining difference with Data.Text.Internal.Encoding.Utf8 I see is that your API lets you not have any parser state (except the offset) in between code points. Am I missing anything else?
The trade-off is that you have to write a big tree of nested cases to effectively use that API, since every byte within a codepoint results in a different type of state. Those 40 lines of code correspond to these 7 lines of code in the text library. So even purely in terms of aesthetics ("the properly decoupled Haskell view of things.") it's a hard sell.
The proposed API actually makes things more coupled than Data.Text.Internal.Encoding.Utf8 because it exposes too many details of UTF-8 in the types.
In that case, would making Data.Text.Internal.Encoding.Utf8 not internal resolve this?
> lets you not have any parser state (except the offset) in between code points
I don't think "lets" is the correct term here; you can weave any state you want into it, it's just a datatype.
> in terms of aesthetics ("the properly decoupled Haskell view of things.") it's a hard sell / exposes too much UTF-8 in the types
The entire point is that it exposes all the inner workings while abstracting away all the hard parts. "Decoupled" doesn't mean "short" or "convenient", it just means you get the power to write whatever you want with no frills. It's obviously a low-level module, so people using it will be ready to spend five extra minutes and duplicate 30 lines.
> making `Data.Text.Internal.Encoding.Utf8` not internal resolve this
It would definitely help with other people using it, but at this point I would rather carry around a 170-line module that does it in a much more streamlined fashion with predictable performance.
This applies to `StrictBuilder` as well (I call it `Copy` on my side). The exposed API can be used to do what's advertised, but it's not exposed properly or documented succinctly enough to be useful.
For the record you don't need a strong reason to deny this proposal, a simple "we don't have people for this, sorry" is enough. The reason I'm pushing for it is because I already have two different places I want to use it in and I don't want to toss it onto my personal pile of "should be on Hackage, but uploading is a nuisance, I haven't tested it enough and the previous maintainer is nowhere to be seen" projects.
I'm just making sure that I'm not completely missing the point of your proposal. Beyond that, we'll indeed have to agree to disagree, unless another @haskell/text maintainer wants to chime in.
> Pretty much all UTF-8 decoding is done using `simdutf`, so on every chunk border the arrays have to be pulled back from the ether just to do 1-4 lookups;
There are three engines for UTF-8 validation in text:
- If you can afford linking against C++, `simdutf` is used for bulk processing, and the naive engine kicks in only at the boundaries of chunks. Somewhat frustratingly, if you get precompiled `text` directly from a GHC bindist, most likely the `simdutf` flag is disabled (because of linking issues).
- Otherwise, if `bytestring >= 0.11.5` (which is fairly new), we use the UTF-8 engine from there (written in C) and again invoke the naive engine only at the boundaries.
- Otherwise we use the naive engine full time.
I'm not sure what the story is supposed to be for the JavaScript backend: it might turn out that it's better to use the naive engine than to compile the C decoder from `bytestring` into JS.
If you want to benchmark native Haskell implementations, pass `cabal build --constraint 'text -simdutf' --constraint 'bytestring < 0.11.5'`.
There are ways to embellish `Data.Text.Internal.Encoding.Utf8`, e.g. expose `byteToClass`, provide descriptive pattern synonyms for `ByteClass`, and add something like `explain :: DecoderState -> ByteClass -> String`, which produces an explanation of what exactly went wrong. I am however reluctant to replace the mechanism entirely or add one more UTF-8 decoding engine.
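As a purely hypothetical sketch of what such an `explain` helper could look like (the `DecoderState` and `ByteClass` definitions below are simplified stand-in enumerations invented for illustration, not the actual bit-packed types in `Data.Text.Internal.Encoding.Utf8`):

```haskell
-- Stand-in types: the real DecoderState and ByteClass in
-- Data.Text.Internal.Encoding.Utf8 are compact numeric encodings,
-- not plain enumerations like these.
data DecoderState = Start | Expect1 | Expect2 | Expect3
  deriving Show

data ByteClass = Ascii | Continuation | Lead2 | Lead3 | Lead4 | Invalid
  deriving Show

-- Human-readable diagnosis of a byte, given the state the decoder
-- was in when it saw it.
explain :: DecoderState -> ByteClass -> String
explain Start Continuation =
  "unexpected continuation byte outside of a multi-byte sequence"
explain Start Invalid =
  "byte can never occur in well-formed UTF-8"
explain st cls = case st of
  Start -> "valid lead byte; not an error"
  _     -> case cls of
    Continuation -> "valid continuation byte; not an error"
    _            -> "multi-byte sequence interrupted by " ++ show cls
```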
I agree that more fine-grained error reporting has its use cases, but I feel that it's better to iterate on it outside of `text`, in a separate package. Bear in mind, it is very difficult to change something in a boot library, and it is not easy to let users upgrade, so it's better to evolve the API elsewhere.
> pass `cabal build --constraint 'text -simdutf' --constraint 'bytestring < 0.11.5'`
While I was missing the fact that `bytestring` needs to be pinned to a specific version, neither constraints, nor cabal.project modifications, nor even specifying a `bytestring` bound directly in the cabal file change anything. Even `source-repository-package` over a `git clone` doesn't apply, so I'm out of relatively sane options here.
> There are three engines for UTF-8 validation
The performance concern applies specifically to the side case of using the `simdutf`/`bytestring` C validator, since crossing chunk borders with continuations still uses the array lookup algorithm. This is something I have tested, and it's slower even than naive comparisons (mind you, my algorithm is actually very slow in this side case too, since I force it to allocate the data structure).
> a more fine-grained error reporting
My original point was that I wanted to share error handling with `text` for consistency, but now I know that `OnDecodeError` effectively returns a constant `String` and an entirely ambiguous `Word8`. As such this point is moot.
Right now the proposal grinds down to the following points:
- Both the current array lookup algorithm and the zero-cost datatype version could be moved into a separate package or a set of non-internal modules within `text`, which would also allow the removal of the `Data.Text.Internal.Encoding.Utf*` modules;
- `OnDecodeError` and `UnicodeException` do not provide any reliable error information and as such may be reduced to `Maybe Char` and a unit respectively;
- There is a minor performance improvement to be gained from using regular branches instead of array lookups when using `simdutf`/C validation code;
- It may be a good idea to move `StrictBuilder` out of internals as well.
As the main point of this issue is adding algorithms that are not immediately needed within the library and cannot be abstracted into a separate boot library for management reasons, this issue is indeed dead in the water. If no one else has any strong opinions on this topic, I will close the issue at the end of this week.
> Even `source-repository-package` over a `git clone` doesn't apply, so I'm out of relatively sane options here.
That's extremely strange, could you share a reproducer? It might be worth raising a bug against Cabal.
> The performance concern applies specifically to the side case of using the `simdutf`/`bytestring` C validator
My point above was that there are situations where the naive decoder is the only one available, and its performance matters. If one wants to make statements about performance, this case should be measured with `simdutf` / `bytestring` disabled.
> the current array lookup algorithm ... could be moved into ... a set of non-internal modules within `text`
Makes sense to me.
> `OnDecodeError` and `UnicodeException` do not provide any reliable error information and as such may be reduced to `Maybe Char` and a unit respectively;
That's largely true. Unfortunately, it's very difficult to iterate on a better interface without repeatedly breaking clients. There is not much demand, though: clients usually treat pretty much any UTF-8 decoding error as just "hey, this data is not UTF-8", and the precise offence reason matters less. I appreciate that JSON decoding is somewhat less forgiving.
Anyways thanks for your efforts and interest!
Okay, I was able to run the benchmarks without SIMD by `git clone`ing the package, renaming it, adding it to the packages section of the cabal.project, and then adding `PackageImports` clarifications everywhere.
The results are surprisingly bad for the array lookup algorithm.
| Variant | Correct (32KiB) | Correct (2MiB) | Early errors (32KiB) | Early errors (2MiB) | Late errors (32KiB) | Late errors (2MiB) | Garbage (32KiB) | Garbage (2MiB) |
|---|---|---|---|---|---|---|---|---|
| Hoehrmann (SIMD) | 13.69 μs | 1.183 ms | 22.45 μs | 1.790 ms | 163.8 μs | 12.04 ms | 7.104 ms | 459.3 ms |
| Lazy (SIMD) | 10.94 μs | 888.5 μs | 17.38 μs | 1.255 ms | 103.5 μs | 7.962 ms | 3.435 ms | 221.0 ms |
| Hoehrmann | 162.3 μs | 12.24 ms | 163.1 μs | 11.68 ms | 167.8 μs | 12.52 ms | 917.7 μs | 58.82 ms |
| Lazy | 93.24 μs | 7.756 ms | 119.0 μs | 8.576 ms | 121.6 μs | 8.614 ms | 611.7 μs | 41.46 ms |
I'm going to need someone to replicate this on their end and to check my findings for correctness, of course.
For the sake of benchmark reproducibility I incorporated the changes in a fork.
I have inlined everything as best I could; the only thing I did not touch is `Data.Text.Internal.StrictBuilder` (`appendR` zero-length checks may be the cause of the SIMD performance losses seen previously).
The following list includes every single library benchmark that matches the pattern `$0 ~ /ecode/`.
- `73620de` -- HEAD
- `ebb70b1` -- Naive algorithm (with `73620de` as baseline)
- `17bb010` -- HEAD without SIMD validation
- `67dea22` -- Naive algorithm without SIMD validation (with `17bb010` as baseline)
| Test case | 73620de | ebb70b1 | Δ vs 73620de | 17bb010 | 67dea22 | Δ vs 17bb010 |
|---|---|---|---|---|---|---|
| DecodeUtf8.html.Strict | 69.9 μs | 69.2 μs | | 1.41 ms | 252 μs | −82% |
| DecodeUtf8.html.Stream | 70.1 μs | 68.4 μs | | 823 μs | 264 μs | −67% |
| DecodeUtf8.html.StrictLength | 111 μs | 111 μs | | 1.45 ms | 291 μs | −80% |
| DecodeUtf8.html.StrictInitLength | 112 μs | 109 μs | | 1.45 ms | 291 μs | −79% |
| DecodeUtf8.html.Lazy | 67.5 μs | 68.6 μs | | 820 μs | 262 μs | −67% |
| DecodeUtf8.html.LazyLength | 112 μs | 111 μs | | 857 μs | 334 μs | −60% |
| DecodeUtf8.html.LazyInitLength | 111 μs | 110 μs | | 857 μs | 301 μs | −64% |
| DecodeUtf8.xml.Strict | 11.3 ms | 11.3 ms | | 245 ms | 71.7 ms | −70% |
| DecodeUtf8.xml.Stream | 15.1 ms | 14.9 ms | | 174 ms | 78.2 ms | −55% |
| DecodeUtf8.xml.StrictLength | 19.2 ms | 18.7 ms | | 252 ms | 79.4 ms | −68% |
| DecodeUtf8.xml.StrictInitLength | 19.3 ms | 19.1 ms | | 251 ms | 79.4 ms | −68% |
| DecodeUtf8.xml.Lazy | 13.4 ms | 13.3 ms | | 170 ms | 76.7 ms | −54% |
| DecodeUtf8.xml.LazyLength | 19.8 ms | 19.6 ms | | 176 ms | 83.4 ms | −52% |
| DecodeUtf8.xml.LazyInitLength | 19.7 ms | 19.6 ms | | 175 ms | 83.1 ms | −52% |
| DecodeUtf8.ascii.Strict | 7.52 ms | 7.46 ms | | 254 ms | 35.5 ms | −86% |
| DecodeUtf8.ascii.Stream | 11.3 ms | 11.1 ms | | 162 ms | 39.2 ms | −75% |
| DecodeUtf8.ascii.StrictLength | 17.2 ms | 16.1 ms | | 264 ms | 44.5 ms | −83% |
| DecodeUtf8.ascii.StrictInitLength | 15.8 ms | 15.6 ms | | 263 ms | 44.4 ms | −83% |
| DecodeUtf8.ascii.Lazy | 12.4 ms | 12.4 ms | | 161 ms | 36.7 ms | −77% |
| DecodeUtf8.ascii.LazyLength | 19.1 ms | 18.7 ms | | 170 ms | 44.8 ms | −73% |
| DecodeUtf8.ascii.LazyInitLength | 18.9 ms | 18.6 ms | | 168 ms | 44.2 ms | −73% |
| DecodeUtf8.russian.Strict | 1.17 ms | 1.17 ms | | 25.5 ms | 8.36 ms | −67% |
| DecodeUtf8.russian.Stream | 1.37 ms | 1.37 ms | | 16.4 ms | 8.58 ms | −47% |
| DecodeUtf8.russian.StrictLength | 1.88 ms | 1.89 ms | | 26.0 ms | 9.75 ms | −62% |
| DecodeUtf8.russian.StrictInitLength | 1.88 ms | 1.89 ms | | 26.0 ms | 9.28 ms | −64% |
| DecodeUtf8.russian.Lazy | 1.37 ms | 1.37 ms | | 16.5 ms | 8.57 ms | −48% |
| DecodeUtf8.russian.LazyLength | 2.05 ms | 2.03 ms | | 17.1 ms | 9.24 ms | −46% |
| DecodeUtf8.russian.LazyInitLength | 2.03 ms | 2.04 ms | | 16.8 ms | 9.24 ms | −45% |
| DecodeUtf8.japanese.Strict | 3.61 μs | 3.67 μs | | 59.0 μs | 14.5 μs | −75% |
| DecodeUtf8.japanese.Stream | 3.63 μs | 3.72 μs | | 31.5 μs | 14.5 μs | −53% |
| DecodeUtf8.japanese.StrictLength | 5.34 μs | 5.40 μs | | 60.9 μs | 16.4 μs | −73% |
| DecodeUtf8.japanese.StrictInitLength | 5.32 μs | 5.40 μs | | 60.3 μs | 16.1 μs | −73% |
| DecodeUtf8.japanese.Lazy | 3.63 μs | 3.62 μs | | 31.5 μs | 14.5 μs | −53% |
| DecodeUtf8.japanese.LazyLength | 5.44 μs | 5.42 μs | | 33.2 μs | 16.3 μs | −50% |
| DecodeUtf8.japanese.LazyInitLength | 5.46 μs | 5.43 μs | | 33.4 μs | 16.1 μs | −51% |
| DecodeUtf8.ascii.strict decodeUtf8 | 7.66 ms | 7.41 ms | | 256 ms | 35.4 ms | −86% |
| DecodeUtf8.ascii.strict decodeLatin1 | 8.12 ms | 8.02 ms | | 8.03 ms | 8.06 ms | |
| DecodeUtf8.ascii.strict decodeASCII | 8.06 ms | 8.05 ms | | 9.17 ms | 8.06 ms | −12% |
| DecodeUtf8.ascii.lazy decodeUtf8 | 11.4 ms | 11.0 ms | −3% | 168 ms | 37.3 ms | −77% |
| DecodeUtf8.ascii.lazy decodeLatin1 | 13.1 ms | 13.1 ms | | 14.0 ms | 13.0 ms | −7% |
| DecodeUtf8.ascii.lazy decodeASCII | 11.6 ms | 11.6 ms | | 13.0 ms | 11.6 ms | −11% |
| Pure.tiny.decode.Text | 35.6 ns | 59.7 ns | +67% | 27.7 ns | 50.0 ns | +80% |
| Pure.tiny.decode.LazyText | 116 ns | 87.8 ns | −24% | 133 ns | 75.5 ns | −43% |
| Pure.tiny.decode'.Text | 47.9 ns | 74.4 ns | +55% | 45.4 ns | 62.7 ns | +38% |
| Pure.tiny.decode'.LazyText | 150 ns | 115 ns | −23% | 159 ns | 109 ns | −31% |
| Pure.tiny.length.decode.Text | 45.5 ns | 73.7 ns | +61% | 38.5 ns | 64.7 ns | +68% |
| Pure.tiny.length.decode.LazyText | 130 ns | 90.5 ns | −30% | 146 ns | 89.7 ns | −38% |
| Pure.ascii-small.decode.Text | 9.48 μs | 9.57 μs | | 311 μs | 46.4 μs | −85% |
| Pure.ascii-small.decode.LazyText | 11.6 μs | 11.4 μs | | 237 μs | 46.7 μs | −80% |
| Pure.ascii-small.decode'.Text | 9.58 μs | 9.54 μs | | 310 μs | 45.4 μs | −85% |
| Pure.ascii-small.decode'.LazyText | 11.6 μs | 11.2 μs | | 237 μs | 47.0 μs | −80% |
| Pure.ascii-small.length.decode.Text | 18.9 μs | 18.9 μs | | 318 μs | 55.3 μs | −82% |
| Pure.ascii-small.length.decode.LazyText | 20.4 μs | 20.2 μs | | 244 μs | 56.9 μs | −76% |
| Pure.ascii.decode.Text | 7.45 ms | 7.42 ms | | 258 ms | 35.5 ms | −86% |
| Pure.ascii.decode.LazyText | 20.8 ms | 20.5 ms | | 205 ms | 46.2 ms | −77% |
| Pure.ascii.decode'.Text | 7.40 ms | 7.41 ms | | 254 ms | 35.6 ms | −86% |
| Pure.ascii.decode'.LazyText | 20.6 ms | 20.5 ms | | 205 ms | 37.2 ms | −81% |
| Pure.ascii.length.decode.Text | 15.6 ms | 15.5 ms | | 264 ms | 44.6 ms | −83% |
| Pure.ascii.length.decode.LazyText | 19.9 ms | 19.5 ms | | 213 ms | 44.7 ms | −78% |
| Pure.english.decode.Text | 245 μs | 201 μs | −17% | 17.8 ms | 2.30 ms | −87% |
| Pure.english.decode.LazyText | 807 μs | 800 μs | | 14.4 ms | 2.55 ms | −82% |
| Pure.english.decode'.Text | 242 μs | 201 μs | −16% | 17.3 ms | 2.30 ms | −86% |
| Pure.english.decode'.LazyText | 817 μs | 801 μs | | 14.2 ms | 2.53 ms | −82% |
| Pure.english.length.decode.Text | 916 μs | 911 μs | | 18.0 ms | 2.91 ms | −83% |
| Pure.english.length.decode.LazyText | 1.35 ms | 1.35 ms | | 14.1 ms | 3.01 ms | −78% |
| Pure.russian.decode.Text | 3.30 μs | 3.35 μs | | 59.5 μs | 20.4 μs | −65% |
| Pure.russian.decode.LazyText | 3.39 μs | 3.37 μs | | 41.7 μs | 20.4 μs | −51% |
| Pure.russian.decode'.Text | 3.31 μs | 3.37 μs | | 60.1 μs | 20.4 μs | −66% |
| Pure.russian.decode'.LazyText | 3.45 μs | 3.42 μs | | 41.8 μs | 20.4 μs | −51% |
| Pure.russian.length.decode.Text | 4.90 μs | 4.97 μs | | 61.6 μs | 21.9 μs | −64% |
| Pure.russian.length.decode.LazyText | 5.03 μs | 5.00 μs | | 43.2 μs | 22.0 μs | −49% |
| Pure.japanese.decode.Text | 3.53 μs | 3.58 μs | | 59.0 μs | 14.5 μs | −75% |
| Pure.japanese.decode.LazyText | 3.73 μs | 3.71 μs | | 34.4 μs | 14.2 μs | −58% |
| Pure.japanese.decode'.Text | 3.64 μs | 3.69 μs | | 59.0 μs | 14.5 μs | −75% |
| Pure.japanese.decode'.LazyText | 3.77 μs | 3.74 μs | | 34.4 μs | 14.6 μs | −57% |
| Pure.japanese.length.decode.Text | 5.32 μs | 5.41 μs | | 60.8 μs | 16.2 μs | −73% |
| Pure.japanese.length.decode.LazyText | 5.49 μs | 5.44 μs | | 36.1 μs | 16.2 μs | −55% |
Thanks for benchmarking @BurningWitness. Sorry, I'm extra busy this week, will take a look later.
@BurningWitness sorry again, I didn't forget about your work here, but still no time to dive in properly.