Revisit efficiency of `Integral` conversions in `stimes` implementation for `StrictByteString`
The goal is to avoid allocating intermediate Integers when using stimes @Int, @Word or with other primitive types.
There's some ongoing work in GHC that should help us: https://gitlab.haskell.org/ghc/ghc/-/issues/20361. Once it is merged, we should check whether the goal has been achieved.
It may be useful to replace checkedIntegerToInt with Data.Bits.toIntegralSized then.
https://github.com/haskell/bytestring/blob/12563781f257f132f4da866047abe720623f9990/Data/ByteString/Internal.hs#L737-L787
https://github.com/haskell/bytestring/blob/12563781f257f132f4da866047abe720623f9990/Data/ByteString/Internal.hs#L868-L879
Thank you. I think the part of the GHC Integer simplification changes that we need to avoid intermediate Integers in stimes @StrictByteString @Int without any rewrite rules was already merged via this upstream MR. It doesn't look like Word is quite so lucky just yet.
As expected, with Int, the Core GHC-9.4.1-alpha1 generates looks great.
-- RHS size: {terms: 20, types: 9, coercions: 0, joins: 0/0}
testSpecInt :: Int -> S.ByteString -> S.ByteString
[GblId,
Arity=2,
Str=<1!P(SL)><1!P(L,L,L)>,
Unf=Unf{Src=InlineStable, TopLvl=True, Value=True, ConLike=True,
WorkFree=True, Expandable=True,
Guidance=ALWAYS_IF(arity=2,unsat_ok=True,boring_ok=False)
Tmpl= \ (nRaw_i21nd [Occ=Once1!] :: Int)
(bs_i21nf [Occ=Once1!] :: S.ByteString) ->
case bs_i21nf of
{ Data.ByteString.Internal.BS ipv_i21nh [Occ=Once1]
ipv1_i21ni [Occ=Once1] ipv2_i21nj [Occ=Once1] ->
case nRaw_i21nd of { GHC.Types.I# v1_B2 ->
case GHC.Prim.>=# v1_B2 0# of {
__DEFAULT -> Data.ByteString.Internal.stimesNegativeErr;
1# ->
Data.ByteString.Internal.$wstimesNonNegativeInt
v1_B2 ipv_i21nh ipv1_i21ni ipv2_i21nj
}
}
}}]
testSpecInt
= \ (nRaw_i21nd :: Int) (bs_i21nf :: S.ByteString) ->
case bs_i21nf of
{ Data.ByteString.Internal.BS ipv_i21nh ipv1_i21ni ipv2_i21nj ->
case nRaw_i21nd of { GHC.Types.I# v1_B2 ->
case GHC.Prim.>=# v1_B2 0# of {
__DEFAULT -> Data.ByteString.Internal.stimesNegativeErr;
1# ->
Data.ByteString.Internal.$wstimesNonNegativeInt
v1_B2 ipv_i21nh ipv1_i21ni ipv2_i21nj
}
}
}
-- RHS size: {terms: 9, types: 5, coercions: 0, joins: 0/0}
testKnownInt :: S.ByteString -> S.ByteString
[GblId,
Arity=1,
Str=<1!P(L,L,L)>,
Unf=Unf{Src=InlineStable, TopLvl=True, Value=True, ConLike=True,
WorkFree=True, Expandable=True,
Guidance=ALWAYS_IF(arity=1,unsat_ok=True,boring_ok=False)
Tmpl= \ (bs_i21nf [Occ=Once1!] :: S.ByteString) ->
case bs_i21nf of
{ Data.ByteString.Internal.BS ipv_i21nh [Occ=Once1]
ipv1_i21ni [Occ=Once1] ipv2_i21nj [Occ=Once1] ->
Data.ByteString.Internal.$wstimesNonNegativeInt
123456# ipv_i21nh ipv1_i21ni ipv2_i21nj
}}]
testKnownInt
= \ (bs_i21nf :: S.ByteString) ->
case bs_i21nf of
{ Data.ByteString.Internal.BS ipv_i21nh ipv1_i21ni ipv2_i21nj ->
Data.ByteString.Internal.$wstimesNonNegativeInt
123456# ipv_i21nh ipv1_i21ni ipv2_i21nj
}
For other types, things are not so great. Using Word without eta-expanding looks particularly ugly:
-- RHS size: {terms: 73, types: 27, coercions: 0, joins: 1/3}
testSpecWord :: Word -> S.ByteString -> S.ByteString
[GblId,
Arity=1,
Str=<ML>,
Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True,
WorkFree=True, Expandable=True, Guidance=IF_ARGS [20] 339 60}]
testSpecWord
= \ (nRaw_i21nd :: Word) ->
let {
n_s21nP :: Integer
[LclId]
n_s21nP
= case nRaw_i21nd of { GHC.Types.W# x#_a4Jg ->
GHC.Num.Integer.integerFromWord# x#_a4Jg
} } in
let {
lvl_s21o5 :: Int
[LclId]
lvl_s21o5
= case GHC.Num.Integer.integerToInt# n_s21nP of v_B2 { __DEFAULT ->
GHC.Types.I# v_B2
} } in
\ (bs_i21nf :: S.ByteString) ->
case bs_i21nf of
{ Data.ByteString.Internal.BS ipv_i21nh ipv1_i21ni ipv2_i21nj ->
case lvl_s21o5 of { GHC.Types.I# v1_B2 ->
join {
$j_s21nO [Dmd=ML] :: S.ByteString
[LclId[JoinId(0)(Nothing)]]
$j_s21nO
= case n_s21nP of {
GHC.Num.Integer.IS x1_i21nq ->
case GHC.Prim.<# x1_i21nq 0# of {
__DEFAULT ->
case ipv2_i21nj of {
__DEFAULT -> Data.ByteString.Internal.stimesOverflowErr;
0# -> S.empty
};
1# -> Data.ByteString.Internal.stimesNegativeErr
};
GHC.Num.Integer.IP x1_i21nt ->
case ipv2_i21nj of {
__DEFAULT -> Data.ByteString.Internal.stimesOverflowErr;
0# -> S.empty
};
GHC.Num.Integer.IN x1_i21nH ->
Data.ByteString.Internal.stimesNegativeErr
} } in
case n_s21nP of {
GHC.Num.Integer.IS x1_i21nw ->
case GHC.Prim.==# x1_i21nw v1_B2 of {
__DEFAULT -> jump $j_s21nO;
1# ->
case GHC.Prim.>=# v1_B2 0# of {
__DEFAULT -> Data.ByteString.Internal.stimesNegativeErr;
1# ->
Data.ByteString.Internal.$wstimesNonNegativeInt
v1_B2 ipv_i21nh ipv1_i21ni ipv2_i21nj
}
};
GHC.Num.Integer.IP x1_i21nC -> jump $j_s21nO;
GHC.Num.Integer.IN x1_i21nE -> jump $j_s21nO
}
}
}
-- RHS size: {terms: 2, types: 0, coercions: 0, joins: 0/0}
StimesSpecTest.testKnownWord1 :: Word
[GblId,
Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True,
WorkFree=True, Expandable=True, Guidance=IF_ARGS [] 10 10}]
StimesSpecTest.testKnownWord1 = GHC.Types.W# 456789##
-- RHS size: {terms: 2, types: 0, coercions: 0, joins: 0/0}
testKnownWord :: S.ByteString -> S.ByteString
[GblId,
Unf=Unf{Src=<vanilla>, TopLvl=True, Value=False, ConLike=False,
WorkFree=False, Expandable=False, Guidance=IF_ARGS [] 20 0}]
testKnownWord = testSpecWord StimesSpecTest.testKnownWord1
There are several problems:
- The function actually has arity one, because it tries to share the intermediate
Integerbetween calls. I don't think I noticed this when reducing the syntactic arity ofstimesPolymorphicwhile working on #443, which is my bad. Although this can reduce allocations for something likemap (stimes k) li, I think the trade-off is clearly unfavorable here:- When the function is called with arity two, allocations are actually increased because the closure has to be allocated.
- This effectively prevents the Worker/Wrapper transformation from being applicable.
- If the conversion to
Integeractually is worth sharing for some use-case, a user can easily do so themselves. (Or, even better, convert toInt.)
- GHC generates
INbranches even thoughintegerFromWord#cannot return any such value. - The
Int#equality check in theISbranch is not optimized away, probably becauseintegerToInt#is still opaque. - The join point
$j_s21nOshould really be inlined at its use sites. - The function body is large enough that it isn't inlined into
testKnownWord, which prevents constant-folding from reducing the latter to something more liketestKnownInt.
The first bullet should be fixed. Most of the other bullets are relatively easy for GHC to improve at, but would require ugly workarounds at the library level, so I think they warrant no action here.
Thanks for investigating, @clyring! :)
I don't fully understand the tradeoffs of the arity issue, but I'd welcome a PR!