This PR addresses #187 by replacing the loop-based implementation of bit_ceil with a constant-time version.
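For reviewers who want a reference point, here is a minimal sketch of one common loop-free approach, the bit-smearing trick for 32-bit unsigned values. It may not match the code in this PR. The name bit_ceil_u32, the 32-bit width, and the use of C++ are assumptions made for illustration only.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not taken from this PR): rounds x up to the next
// power of two using a fixed number of shifts and ORs, with no loop.
constexpr std::uint32_t bit_ceil_u32(std::uint32_t x) {
    if (x <= 1) {
        return 1;  // 0 and 1 both round up to 1
    }
    x -= 1;        // so that exact powers of two map to themselves
    x |= x >> 1;   // copy the highest set bit into every lower position
    x |= x >> 2;
    x |= x >> 4;
    x |= x >> 8;
    x |= x >> 16;
    return x + 1;  // all-ones below the top bit, plus one, is a power of two
}

// Small self-check covering the edge cases a review would likely ask about.
inline void bit_ceil_u32_selfcheck() {
    assert(bit_ceil_u32(0) == 1u);
    assert(bit_ceil_u32(1) == 1u);
    assert(bit_ceil_u32(5) == 8u);
    assert(bit_ceil_u32(8) == 8u);
    assert(bit_ceil_u32(0x80000000u) == 0x80000000u);
}
```

In this sketch, inputs above 2^31 wrap around to 0 because the result is not representable in 32 bits; whether the PR should handle that case differently is worth confirming during review.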
Let me know if everything looks okay.