bitshuffle
Decompression slowdowns with "too many" threads
It seems that the OpenMP locks and the (dynamic,1) scheduling overhead can become significant on machines with large numbers of cores. For decompression, I saw some improvement when using static scheduling.
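As a rough illustration of the difference, here is a minimal sketch that assumes the work is split into fixed-size blocks handled by a parallel loop. The names `process_block`, `process_dynamic` and `process_static` are hypothetical and are not bitshuffle's actual functions; the per-block work is a trivial stand-in for the real decompression kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-block work: copies one block while
 * flipping every bit. This is NOT bitshuffle's real kernel. */
static void process_block(const uint8_t *in, uint8_t *out, size_t len) {
    for (size_t i = 0; i < len; i++) {
        out[i] = in[i] ^ 0xFF;
    }
}

/* schedule(dynamic, 1): each thread fetches one block at a time from a
 * shared queue, so every block costs one synchronized scheduling step. */
void process_dynamic(const uint8_t *in, uint8_t *out,
                     size_t size, size_t block_size) {
    assert(block_size > 0);
    long n_blocks = (long)((size + block_size - 1) / block_size);
    long b;
    #pragma omp parallel for schedule(dynamic, 1)
    for (b = 0; b < n_blocks; b++) {
        size_t off = (size_t)b * block_size;
        size_t len = (size - off < block_size) ? size - off : block_size;
        process_block(in + off, out + off, len);
    }
}

/* schedule(static): blocks are divided among the threads once, up front,
 * so there is no per-block scheduling step inside the loop. */
void process_static(const uint8_t *in, uint8_t *out,
                    size_t size, size_t block_size) {
    assert(block_size > 0);
    long n_blocks = (long)((size + block_size - 1) / block_size);
    long b;
    #pragma omp parallel for schedule(static)
    for (b = 0; b < n_blocks; b++) {
        size_t off = (size_t)b * block_size;
        size_t len = (size - off < block_size) ? size - off : block_size;
        process_block(in + off, out + off, len);
    }
}
```

Both functions produce the same output; only the way iterations are handed to threads differs. Static scheduling gives up load balancing, so it is likely to help mainly when blocks take roughly equal time, which may or may not hold for a given data set. Compile with OpenMP enabled (for example `-fopenmp` with GCC) to get the parallel behaviour; without it the pragmas are ignored and the loops run serially.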

Perhaps there is a better way to overcome this problem? In any case, I will try to send you a pull request for discussion.