vector icon indicating copy to clipboard operation
vector copied to clipboard

unboxed Vector Bool seems to be 40 times slower than unboxed Array Int Bool

Open GeorgeCo opened this issue 1 year ago • 15 comments

unboxed Vector Bool seems to be 8 times slower than unboxed Array Int Bool. As far as I can tell it is due to the performance difference in accessing array/vector elements. The following is with ghc 9.8.2 on an Intel Mac but I get about the same results on an M2 MacBook with ghc 9.10.1

diff prob171a.hs prob171v.hs
1d0
< 
8c7
< import qualified Data.Array.Unboxed as AU
--
> import qualified Data.Vector.Unboxed as VU
11a11
> 
13,14c13,14
< psqs :: Int -> AU.Array Int Bool
< psqs n = AU.listArray (0,n) (map (psq . ssd) [0..n])
--
> psqs :: Int -> VU.Vector Bool
> psqs n = VU.fromList (map (psq . ssd) [0..n])
29a30,33
> 
> 
>              
> 
36c40
<       | bv AU.! ssd i = f (i + 1) (res + i)
--
>       | bv VU.! ssd i = f (i + 1) (res + i)
gcolpitts@Georges-Mini haskell % ghc -O prob171a.hs
ghc -O prob171a.hs
Loaded package environment from /Users/gcolpitts/.ghc/x86_64-darwin-9.8.2/environments/default
[1 of 2] Compiling Main             ( prob171a.hs, prob171a.o ) [Source file changed]
[2 of 2] Linking prob171a [Objects changed]
ld: warning: ignoring duplicate libraries: '-lm'
ld: warning: search path '/opt/local/lib/' not found
gcolpitts@Georges-Mini haskell % ghc -O prob171v.hs
ghc -O prob171v.hs
Loaded package environment from /Users/gcolpitts/.ghc/x86_64-darwin-9.8.2/environments/default
[1 of 2] Compiling Main             ( prob171v.hs, prob171v.o ) [Source file changed]
[2 of 2] Linking prob171v [Objects changed]
ld: warning: ignoring duplicate libraries: '-lm'
ld: warning: search path '/opt/local/lib/' not found
gcolpitts@Georges-Mini haskell % ./prob171a +RTS -s
./prob171a +RTS -s
367437474103403
         103,112 bytes allocated in the heap
           3,272 bytes copied during GC
          44,328 bytes maximum residency (1 sample(s))
          25,304 bytes maximum slop
               6 MiB total memory in use (0 MiB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0         0 colls,     0 par    0.000s   0.000s     0.0000s    0.0000s
  Gen  1         1 colls,     0 par    0.000s   0.000s     0.0002s    0.0002s

  INIT    time    0.002s  (  0.002s elapsed)
  MUT     time    8.442s  (  8.444s elapsed)
  GC      time    0.000s  (  0.000s elapsed)
  EXIT    time    0.000s  (  0.010s elapsed)
  Total   time    8.444s  (  8.457s elapsed)

  %GC     time       0.0%  (0.0% elapsed)

  Alloc rate    12,214 bytes per MUT second

  Productivity 100.0% of total user, 99.9% of total elapsed



gcolpitts@Georges-Mini haskell % ./prob171v +RTS -s
./prob171v +RTS -s
367437474103403
          97,888 bytes allocated in the heap
           3,272 bytes copied during GC
          44,328 bytes maximum residency (1 sample(s))
          25,304 bytes maximum slop
               6 MiB total memory in use (0 MiB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0         0 colls,     0 par    0.000s   0.000s     0.0000s    0.0000s
  Gen  1         1 colls,     0 par    0.000s   0.000s     0.0002s    0.0002s

  INIT    time    0.003s  (  0.003s elapsed)
  MUT     time   61.850s  ( 62.264s elapsed)
  GC      time    0.000s  (  0.000s elapsed)
  EXIT    time    0.000s  (  0.012s elapsed)
  Total   time   61.854s  ( 62.280s elapsed)

  %GC     time       0.0%  (0.0% elapsed)

  Alloc rate    1,582 bytes per MUT second

  Productivity 100.0% of total user, 100.0% of total elapsed

gcolpitts@Georges-Mini haskell % 

prob171v.hs.txt prob171a.hs.txt

GeorgeCo avatar Jun 26 '24 20:06 GeorgeCo

The reason is that bv is inlined in the vector version, causing a new vector to be allocated each time bv is used. This can be fixed by adding a {-# NOINLINE bv #-} annotation.

Since you're using Vector Bool as bit vectors, take a look at https://hackage.haskell.org/package/bitvec. It provides both faster and more memory efficient bit vectors.

konsumlamm avatar Jun 27 '24 01:06 konsumlamm

Thanks for the quick and informative response! It certainly explains the performance difference but I'm surprised that it doesn't show up in bytes allocated. In any case I think we're agreed that this is a bug. IMHO and my less than expert knowledge it seems like a serious one. Thanks again!

GeorgeCo avatar Jun 27 '24 13:06 GeorgeCo

I verified that, as you said , {-# NOINLINE bv #-} is a workaround for the problem

GeorgeCo avatar Jun 28 '24 15:06 GeorgeCo

I don't think this is a bug in vector. It could be a GHC bug though.

konsumlamm avatar Jul 02 '24 11:07 konsumlamm

Thanks! Any ideas why the bug only happens with Vector and not with Array?

GeorgeCo avatar Jul 02 '24 11:07 GeorgeCo

I reported a related bug against ghc: https://gitlab.haskell.org/ghc/ghc/-/issues/25055

GeorgeCo avatar Jul 09 '24 16:07 GeorgeCo

Thanks! Any ideas why the bug only happens with Vector and not with Array?

It's probably because vector does more inlining.

konsumlamm avatar Jul 09 '24 18:07 konsumlamm

Thanks, that makes sense.

GeorgeCo avatar Jul 09 '24 18:07 GeorgeCo

I reported a related bug against ghc: https://gitlab.haskell.org/ghc/ghc/-/issues/25055

This ghc bug has been fixed in 9.12.1 but the vector bug reported here still exists in 9.12.1 but now the vector version is 40 times slower than the array version

GeorgeCo avatar Dec 22 '24 15:12 GeorgeCo

(Drive-by comment, no idea) @GeorgeCo using criterion would probably be a more scientific measurement, also might help with accidental inlining via its (wh)nf helper.

robinp avatar Oct 04 '25 20:10 robinp

mistakenly closed, the problem is still there in ghc 9.12.2. I don't think a consistent factor of 40 requires criterion to confirm there is a bug. I welcome any attempts to prove me wrong. The workaround above still works.

(base) gcolpitts@Mac haskell % ghc -O2 prob171a.hs
Loaded package environment from /Users/gcolpitts/.ghc/aarch64-darwin-9.12.2/environments/default
[1 of 2] Compiling Main             ( prob171a.hs, prob171a.o ) [Missing object file]
[2 of 2] Linking prob171a [Objects changed]
(base) gcolpitts@Mac haskell % ./prob171a +RTS -s
367437474103403
         103,128 bytes allocated in the heap
           3,272 bytes copied during GC
          44,328 bytes maximum residency (1 sample(s))
          25,304 bytes maximum slop
               6 MiB total memory in use (0 MiB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0         0 colls,     0 par    0.000s   0.000s     0.0000s    0.0000s
  Gen  1         1 colls,     0 par    0.000s   0.001s     0.0012s    0.0012s

  INIT    time    0.004s  (  0.004s elapsed)
  MUT     time    1.902s  (  1.901s elapsed)
  GC      time    0.000s  (  0.001s elapsed)
  EXIT    time    0.001s  (  0.011s elapsed)
  Total   time    1.907s  (  1.918s elapsed)

  %GC     time       0.0%  (0.0% elapsed)

  Alloc rate    54,230 bytes per MUT second

  Productivity  99.7% of total user, 99.1% of total elapsed

(base) gcolpitts@Mac haskell % ghc -O2 prob171v.hs
Loaded package environment from /Users/gcolpitts/.ghc/aarch64-darwin-9.12.2/environments/default
[1 of 2] Compiling Main             ( prob171v.hs, prob171v.o ) [Source file changed]
[2 of 2] Linking prob171v [Objects changed]
(base) gcolpitts@Mac haskell % ./prob171v +RTS -s
367437474103403
          97,904 bytes allocated in the heap
           3,272 bytes copied during GC
          44,328 bytes maximum residency (1 sample(s))
          25,304 bytes maximum slop
               6 MiB total memory in use (0 MiB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0         0 colls,     0 par    0.000s   0.000s     0.0000s    0.0000s
  Gen  1         1 colls,     0 par    0.000s   0.001s     0.0011s    0.0011s

  INIT    time    0.004s  (  0.004s elapsed)
  MUT     time   85.816s  ( 85.830s elapsed)
  GC      time    0.000s  (  0.001s elapsed)
  EXIT    time    0.001s  (  0.011s elapsed)
  Total   time   85.822s  ( 85.847s elapsed)

  %GC     time       0.0%  (0.0% elapsed)

  Alloc rate    1,140 bytes per MUT second

  Productivity 100.0% of total user, 100.0% of total elapsed

(base) gcolpitts@Mac haskell % 

GeorgeCo avatar Oct 06 '25 20:10 GeorgeCo

using bitvec has the same problem since it uses the vector package. I can upload the file if you like

GeorgeCo avatar Oct 06 '25 20:10 GeorgeCo

mistakenly closed, the problem is still there in ghc 9.12.2

Reopening then.

Shimuuar avatar Oct 06 '25 21:10 Shimuuar

@konsumlamm

The reason is that bv is inlined in the vector version, causing a new vector to be allocated each time bv is used. This can be fixed by adding a {-# NOINLINE bv #-} annotation.

Since you're using Vector Bool as bit vectors, take a look at https://hackage.haskell.org/package/bitvec. It provides both faster and more memory efficient bit vectors.

aMybit vector version has the same problem as it uses the vector package. As you would expect, the workaround works for the bit vector version and it is then twice as fast as the array version. Surprisingly the bit vector version allocates more.

GeorgeCo avatar Oct 07 '25 15:10 GeorgeCo