Support for vectorized count_and()?
Thanks for the great contribution to this efficient library.
I was wondering if it is possible to do an operation like numpy's bitwise_and(x, Y),
where the function applies the bitwise operation between a vector x and multiple vectors (stored as rows of Y) simultaneously.
What I observe is that numpy scales up very well in the matrix case, but it is actually about 10x slower than bitarray when Y is only a single vector.
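For concreteness, here is a small sketch of the numpy behaviour I mean (the sizes and random data below are only illustrative):

import numpy as np

M, N = 8192, 500                                           # illustrative sizes
x = np.random.randint(0, 2, size=M, dtype=np.uint8)        # one M-bit vector
Y = np.random.randint(0, 2, size=(N, M), dtype=np.uint8)   # N rows of M bits

out = np.bitwise_and(x, Y)   # x is broadcast against every row of Y, shape (N, M)
counts = out.sum(axis=1)     # per-row popcount, shape (N,)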
Thanks for your interest in bitarray! Unlike numpy arrays, bitarrays are always one-dimensional arrays without broadcasting (which is what makes the numpy example you mention work). I assume you have a bitarray x and a bitmatrix Y, and for each row in Y you want the count of x & (row in Y), so the output is an N-vector of integers (where N is the number of rows in Y). Right? How do you store the bitmatrix Y (N rows and M columns)? As a single bitarray? If that is the case, what you want is something like:
[count_and(x, Y[i*M:(i+1)*M]) for i in range(N)]
If this operation is too slow, you can calculate elements (in the result vector/list) in parallel using threads.
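For concreteness, a minimal runnable sketch of that approach, assuming Y is stored as one flat bitarray of N*M bits (the sizes and the urandom setup are only illustrative):

from bitarray.util import urandom, count_and

N, M = 500, 8192        # illustrative sizes
x = urandom(M)          # random M-bit query vector
Y = urandom(N * M)      # bitmatrix: N rows of M bits, stored as one flat bitarray

# number of set bits in x & row, for every row of Y
counts = [count_and(x, Y[i * M:(i + 1) * M]) for i in range(N)]

Each element of counts is independent of the others, so if this is still too slow the list can be split into chunks and the chunks computed on separate threads, as mentioned above.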
Hi Ilan,
Thanks for the fast reply!
First, to clarify my expected output: out = x & (row in Y) would be an array of the same size as Y, namely N rows and M columns, where each row is a bit vector and each column is a bit.
I ran some experiments to show the speed difference.
Here are my timings for 1000 repeats with M = 8192 and N = 500:
# x, y are numpy M-bit vectors; Y is a numpy bit matrix with N rows of M bits each.
# a, b are the same M-bit vectors converted (via list) to bitarray objects.
# B is a list of N elements, each an M-bit bitarray; Bvec is a single bitarray of N*M bits.
numpy.bitwise_and(x, y)
--- numpy bitwise 0.004984617233276367 seconds ---
a&b
--- bitarray bitwise and 0.0003001689910888672 seconds ---
numpy.bitwise_and(x, Y)
--- numpy matrix bitwise and 0.0505518913269043 seconds ---
for i in range(len(B)):
    a & B[i]
--- bitarray matrix by loop bitwise and 0.3004026412963867 seconds ---
[a & Bvec[j * M:(j + 1) * M] for j in range(N)]
--- bitarray matrix comprehension and 0.4826831817626953 seconds ---
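For reference, a self-contained sketch of the setup above (the timeit driver below is only an approximation of my script; absolute numbers will vary by machine):

import timeit
import numpy as np
from bitarray import bitarray

M, N, REPEATS = 8192, 500, 1000

# numpy setup: bit vectors x, y and bit matrix Y (N rows of M bits)
x = np.random.randint(0, 2, size=M, dtype=np.uint8)
y = np.random.randint(0, 2, size=M, dtype=np.uint8)
Y = np.random.randint(0, 2, size=(N, M), dtype=np.uint8)

# bitarray setup: a, b mirror x, y; B is a list of row bitarrays; Bvec is one flat bitarray
a = bitarray(x.tolist())
b = bitarray(y.tolist())
B = [bitarray(row.tolist()) for row in Y]
Bvec = bitarray(Y.reshape(-1).tolist())

print(timeit.timeit(lambda: np.bitwise_and(x, y), number=REPEATS))        # numpy vector
print(timeit.timeit(lambda: a & b, number=REPEATS))                       # bitarray vector
print(timeit.timeit(lambda: np.bitwise_and(x, Y), number=REPEATS))        # numpy matrix
print(timeit.timeit(lambda: [a & row for row in B], number=REPEATS))      # bitarray loop
print(timeit.timeit(lambda: [a & Bvec[j * M:(j + 1) * M] for j in range(N)],
                    number=REPEATS))                                      # bitarray slicing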
A turnaround from 10 times faster to 10 times slower, which confused me a lot.
My guess is that numpy uses SIMD features supported by the processor. I have not verified this in the source code, since it is not written in Python and has complex C/C++ dependencies, but the speed comparison above is for your information.