Support for vectorized count_and()?
Thanks for the great contribution to this efficient library.
I was wondering if it is possible to do an operation like numpy's bitwise_and(x, Y),
where the function applies the bitwise operation between a vector x and multiple vectors (stored as rows of Y) simultaneously.
What I observe is that numpy scales up very well in the matrix case, but it is actually about 10x slower than bitarray when Y is only a single vector.
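For concreteness, here is a small sketch of the numpy behaviour I mean (the sizes and random data below are only illustrative):

import numpy as np

M, N = 8192, 500                                           # illustrative sizes
x = np.random.randint(0, 2, size=M, dtype=np.uint8)        # one M-bit vector
Y = np.random.randint(0, 2, size=(N, M), dtype=np.uint8)   # N rows of M bits

out = np.bitwise_and(x, Y)   # x is broadcast against every row of Y, shape (N, M)
counts = out.sum(axis=1)     # per-row popcount, shape (N,)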
Thanks for your interest in bitarray! Unlike numpy arrays, bitarrays are always one-dimensional arrays without broadcasting (which is what makes the numpy example you mention work). I assume you have a bitarray x and a bitmatrix Y, and for each row in Y you want the count of x & (row in Y), so the output is an N-vector of integers (where N is the number of rows in Y). Right? How do you store the bitmatrix Y (N rows and M columns)? As a single bitarray? If that is the case, what you want is something like:
[count_and(x, Y[i*M:(i+1)*M]) for i in range(N)]
If this operation is too slow, you can calculate elements (in the result vector/list) in parallel using threads.
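For concreteness, a minimal runnable sketch of that approach, assuming Y is stored as one flat bitarray of N*M bits (the sizes and the urandom setup are only illustrative):

from bitarray.util import urandom, count_and

N, M = 500, 8192        # illustrative sizes
x = urandom(M)          # random M-bit query vector
Y = urandom(N * M)      # bitmatrix: N rows of M bits, stored as one flat bitarray

# number of set bits in x & row, for every row of Y
counts = [count_and(x, Y[i * M:(i + 1) * M]) for i in range(N)]

Each element of counts is independent of the others, so if this is still too slow the list can be split into chunks and the chunks computed on separate threads, as mentioned above.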
Hi Ilan,
Thanks for the fast reply!
First, to clarify my expected output: out = x & (row in Y) would be an array of the same size as Y, namely N rows and M columns, where each row is a bit vector and each column is a bit.
I ran some experiments to show the speed difference.
Here are my timings for 1000 repeats with M = 8192 and N = 500:
# x, y are numpy M-bit vectors; Y is a numpy bit matrix with N rows of M bits each.
# a, b are the same M-bit vectors converted (via list) to bitarray objects.
# B is a list of N elements, each an M-bit bitarray; Bvec is a single bitarray of N*M bits.
numpy.bitwise_and(x, y)
--- numpy bitwise 0.004984617233276367 seconds ---
a&b
--- bitarray bitwise and 0.0003001689910888672 seconds ---
numpy.bitwise_and(x, Y)
--- numpy matrix bitwise and 0.0505518913269043 seconds ---
for i in range(len(B)):
    a & B[i]
--- bitarray matrix by loop bitwise and 0.3004026412963867 seconds ---
[a & Bvec[j * M:(j + 1) * M] for j in range(N)]
--- bitarray matrix comprehension and 0.4826831817626953 seconds ---
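For reference, a self-contained sketch of the setup above (the timeit driver below is only an approximation of my script; absolute numbers will vary by machine):

import timeit
import numpy as np
from bitarray import bitarray

M, N, REPEATS = 8192, 500, 1000

# numpy setup: bit vectors x, y and bit matrix Y (N rows of M bits)
x = np.random.randint(0, 2, size=M, dtype=np.uint8)
y = np.random.randint(0, 2, size=M, dtype=np.uint8)
Y = np.random.randint(0, 2, size=(N, M), dtype=np.uint8)

# bitarray setup: a, b mirror x, y; B is a list of row bitarrays; Bvec is one flat bitarray
a = bitarray(x.tolist())
b = bitarray(y.tolist())
B = [bitarray(row.tolist()) for row in Y]
Bvec = bitarray(Y.reshape(-1).tolist())

print(timeit.timeit(lambda: np.bitwise_and(x, y), number=REPEATS))        # numpy vector
print(timeit.timeit(lambda: a & b, number=REPEATS))                       # bitarray vector
print(timeit.timeit(lambda: np.bitwise_and(x, Y), number=REPEATS))        # numpy matrix
print(timeit.timeit(lambda: [a & row for row in B], number=REPEATS))      # bitarray loop
print(timeit.timeit(lambda: [a & Bvec[j * M:(j + 1) * M] for j in range(N)],
                    number=REPEATS))                                      # bitarray slicing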
A turnaround from 10 times faster to 10 times slower, which confused me a lot.
My guess is that numpy uses SIMD features supported by the processor. I have not verified this in the source code, since it is not written in Python and has complex C/C++ dependencies, but the speed comparison above is for your information.