unboxed Vector Bool seems to be 40 times slower than unboxed Array Int Bool
unboxed Vector Bool seems to be 8 times slower than unboxed Array Int Bool. As far as I can tell it is due to the performance difference in accessing array/vector elements. The following is with ghc 9.8.2 on an Intel Mac but I get about the same results on an M2 MacBook with ghc 9.10.1
diff prob171a.hs prob171v.hs
1d0
<
8c7
< import qualified Data.Array.Unboxed as AU
--
> import qualified Data.Vector.Unboxed as VU
11a11
>
13,14c13,14
< psqs :: Int -> AU.Array Int Bool
< psqs n = AU.listArray (0,n) (map (psq . ssd) [0..n])
--
> psqs :: Int -> VU.Vector Bool
> psqs n = VU.fromList (map (psq . ssd) [0..n])
29a30,33
>
>
>
>
36c40
< | bv AU.! ssd i = f (i + 1) (res + i)
--
> | bv VU.! ssd i = f (i + 1) (res + i)
gcolpitts@Georges-Mini haskell % ghc -O prob171a.hs
ghc -O prob171a.hs
Loaded package environment from /Users/gcolpitts/.ghc/x86_64-darwin-9.8.2/environments/default
[1 of 2] Compiling Main ( prob171a.hs, prob171a.o ) [Source file changed]
[2 of 2] Linking prob171a [Objects changed]
ld: warning: ignoring duplicate libraries: '-lm'
ld: warning: search path '/opt/local/lib/' not found
gcolpitts@Georges-Mini haskell % ghc -O prob171v.hs
ghc -O prob171v.hs
Loaded package environment from /Users/gcolpitts/.ghc/x86_64-darwin-9.8.2/environments/default
[1 of 2] Compiling Main ( prob171v.hs, prob171v.o ) [Source file changed]
[2 of 2] Linking prob171v [Objects changed]
ld: warning: ignoring duplicate libraries: '-lm'
ld: warning: search path '/opt/local/lib/' not found
gcolpitts@Georges-Mini haskell % ./prob171a +RTS -s
./prob171a +RTS -s
367437474103403
103,112 bytes allocated in the heap
3,272 bytes copied during GC
44,328 bytes maximum residency (1 sample(s))
25,304 bytes maximum slop
6 MiB total memory in use (0 MiB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 0 colls, 0 par 0.000s 0.000s 0.0000s 0.0000s
Gen 1 1 colls, 0 par 0.000s 0.000s 0.0002s 0.0002s
INIT time 0.002s ( 0.002s elapsed)
MUT time 8.442s ( 8.444s elapsed)
GC time 0.000s ( 0.000s elapsed)
EXIT time 0.000s ( 0.010s elapsed)
Total time 8.444s ( 8.457s elapsed)
%GC time 0.0% (0.0% elapsed)
Alloc rate 12,214 bytes per MUT second
Productivity 100.0% of total user, 99.9% of total elapsed
gcolpitts@Georges-Mini haskell % ./prob171v +RTS -s
./prob171v +RTS -s
367437474103403
97,888 bytes allocated in the heap
3,272 bytes copied during GC
44,328 bytes maximum residency (1 sample(s))
25,304 bytes maximum slop
6 MiB total memory in use (0 MiB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 0 colls, 0 par 0.000s 0.000s 0.0000s 0.0000s
Gen 1 1 colls, 0 par 0.000s 0.000s 0.0002s 0.0002s
INIT time 0.003s ( 0.003s elapsed)
MUT time 61.850s ( 62.264s elapsed)
GC time 0.000s ( 0.000s elapsed)
EXIT time 0.000s ( 0.012s elapsed)
Total time 61.854s ( 62.280s elapsed)
%GC time 0.0% (0.0% elapsed)
Alloc rate 1,582 bytes per MUT second
Productivity 100.0% of total user, 100.0% of total elapsed
gcolpitts@Georges-Mini haskell %
The reason is that bv is inlined in the vector version, causing a new vector to be allocated each time bv is used. This can be fixed by adding a {-# NOINLINE bv #-} annotation.
Since you're using Vector Bool as bit vectors, take a look at https://hackage.haskell.org/package/bitvec. It provides both faster and more memory efficient bit vectors.
Thanks for the quick and informative response! It certainly explains the performance difference but I'm surprised that it doesn't show up in bytes allocated. In any case I think we're agreed that this is a bug. IMHO and my less than expert knowledge it seems like a serious one. Thanks again!
I verified that, as you said , {-# NOINLINE bv #-} is a workaround for the problem
I don't think this is a bug in vector. It could be a GHC bug though.
Thanks! Any ideas why the bug only happens with Vector and not with Array?
I reported a related bug against ghc: https://gitlab.haskell.org/ghc/ghc/-/issues/25055
Thanks! Any ideas why the bug only happens with Vector and not with Array?
It's probably because vector does more inlining.
Thanks, that makes sense.
I reported a related bug against ghc: https://gitlab.haskell.org/ghc/ghc/-/issues/25055
This ghc bug has been fixed in 9.12.1 but the vector bug reported here still exists in 9.12.1 but now the vector version is 40 times slower than the array version
(Drive-by comment, no idea) @GeorgeCo using criterion would probably be a more scientific measurement, also might help with accidental inlining via its (wh)nf helper.
mistakenly closed, the problem is still there in ghc 9.12.2. I don't think a consistent factor of 40 requires criterion to confirm there is a bug. I welcome any attempts to prove me wrong. The workaround above still works.
(base) gcolpitts@Mac haskell % ghc -O2 prob171a.hs
Loaded package environment from /Users/gcolpitts/.ghc/aarch64-darwin-9.12.2/environments/default
[1 of 2] Compiling Main ( prob171a.hs, prob171a.o ) [Missing object file]
[2 of 2] Linking prob171a [Objects changed]
(base) gcolpitts@Mac haskell % ./prob171a +RTS -s
367437474103403
103,128 bytes allocated in the heap
3,272 bytes copied during GC
44,328 bytes maximum residency (1 sample(s))
25,304 bytes maximum slop
6 MiB total memory in use (0 MiB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 0 colls, 0 par 0.000s 0.000s 0.0000s 0.0000s
Gen 1 1 colls, 0 par 0.000s 0.001s 0.0012s 0.0012s
INIT time 0.004s ( 0.004s elapsed)
MUT time 1.902s ( 1.901s elapsed)
GC time 0.000s ( 0.001s elapsed)
EXIT time 0.001s ( 0.011s elapsed)
Total time 1.907s ( 1.918s elapsed)
%GC time 0.0% (0.0% elapsed)
Alloc rate 54,230 bytes per MUT second
Productivity 99.7% of total user, 99.1% of total elapsed
(base) gcolpitts@Mac haskell % ghc -O2 prob171v.hs
Loaded package environment from /Users/gcolpitts/.ghc/aarch64-darwin-9.12.2/environments/default
[1 of 2] Compiling Main ( prob171v.hs, prob171v.o ) [Source file changed]
[2 of 2] Linking prob171v [Objects changed]
(base) gcolpitts@Mac haskell % ./prob171v +RTS -s
367437474103403
97,904 bytes allocated in the heap
3,272 bytes copied during GC
44,328 bytes maximum residency (1 sample(s))
25,304 bytes maximum slop
6 MiB total memory in use (0 MiB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 0 colls, 0 par 0.000s 0.000s 0.0000s 0.0000s
Gen 1 1 colls, 0 par 0.000s 0.001s 0.0011s 0.0011s
INIT time 0.004s ( 0.004s elapsed)
MUT time 85.816s ( 85.830s elapsed)
GC time 0.000s ( 0.001s elapsed)
EXIT time 0.001s ( 0.011s elapsed)
Total time 85.822s ( 85.847s elapsed)
%GC time 0.0% (0.0% elapsed)
Alloc rate 1,140 bytes per MUT second
Productivity 100.0% of total user, 100.0% of total elapsed
(base) gcolpitts@Mac haskell %
using bitvec has the same problem since it uses the vector package. I can upload the file if you like
mistakenly closed, the problem is still there in ghc 9.12.2
Reopening then.
@konsumlamm
The reason is that
bvis inlined in thevectorversion, causing a new vector to be allocated each timebvis used. This can be fixed by adding a{-# NOINLINE bv #-}annotation.Since you're using
Vector Boolas bit vectors, take a look at https://hackage.haskell.org/package/bitvec. It provides both faster and more memory efficient bit vectors.
aMybit vector version has the same problem as it uses the vector package. As you would expect, the workaround works for the bit vector version and it is then twice as fast as the array version. Surprisingly the bit vector version allocates more.