Add high order function for checking equality after transformation

Open eborden opened this issue 7 years ago • 1 comments

It is very common to transform two Text values and then compare them for equality.

toCaseFold x == toCaseFold y

This operation ends up being costly because it must materialize each newly transformed Text in order to check their equality. Instead we can provide a helper for such patterns transformEq, which relies on the underlying ustream to check equality and avoids materializing the transformed Text in the average case.

Benchmark results for this operation show a worst case that is close to the naive comparison and a best case which is much more agreeable.

benchmarking transformEq/Text ==: Eq
time                 629.6 ns   (625.7 ns .. 634.5 ns)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 630.0 ns   (626.6 ns .. 634.2 ns)
std dev              12.93 ns   (9.856 ns .. 16.21 ns)
variance introduced by outliers: 25% (moderately inflated)

benchmarking transformEq/Text transformEq: Eq
time                 578.4 ns   (575.6 ns .. 581.2 ns)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 577.9 ns   (575.1 ns .. 581.3 ns)
std dev              10.35 ns   (8.092 ns .. 13.02 ns)
variance introduced by outliers: 21% (moderately inflated)

benchmarking transformEq/Text ==: Not Eq
time                 663.1 ns   (659.3 ns .. 666.8 ns)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 660.4 ns   (657.9 ns .. 664.4 ns)
std dev              10.49 ns   (7.178 ns .. 16.44 ns)
variance introduced by outliers: 17% (moderately inflated)

benchmarking transformEq/Text transformEq: Not Eq
time                 126.9 ns   (126.1 ns .. 127.8 ns)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 126.7 ns   (126.2 ns .. 127.4 ns)
std dev              2.122 ns   (1.663 ns .. 2.636 ns)
variance introduced by outliers: 21% (moderately inflated)

benchmarking transformEq/Text ==: Not Length
time                 499.0 ns   (496.3 ns .. 502.1 ns)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 499.8 ns   (497.2 ns .. 502.9 ns)
std dev              9.655 ns   (7.545 ns .. 11.84 ns)
variance introduced by outliers: 23% (moderately inflated)

benchmarking transformEq/Text transformEq: Not Length
time                 17.53 ns   (17.45 ns .. 17.62 ns)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 17.65 ns   (17.55 ns .. 17.78 ns)
std dev              359.4 ps   (283.1 ps .. 500.4 ps)
variance introduced by outliers: 31% (moderately inflated)

Oct 24 '18 18:10 eborden

Note, we've implemented this function in Freckle's codebase on a newtyped Text type (NameComponent) and saw a large performance improvement in a codepath that was utilizing this pattern heavily.

Relevant profiling details:

Before

name              location                                             %time %alloc
caseConvert.next  Data/Text/Internal/Fusion/Common.hs:(405,5)-(410,45)  13.4   28.8
onNameComponent   library/FrontRow/Types/NameComponent.hs:38:1-55        7.4    9.8

After

name               location                                               %time %alloc
caseInsensitiveEq  library/FrontRow/Types/NameComponent.hs:(54,1)-(58,78)   3.2    5.0

Oct 24 '18 18:10 eborden