succinct
succinct copied to clipboard
optimize select_in_word
Today I decided to reinvent this algorithm. Then I remembered this project, and decided to see if this repository had a better idea.
Turns out, my version had a few slight improvements over the implementation here.
- We can do
(k | 0x80) * ones_step_8rather than(k * ones_step_8) | msbs_step_8. This is helpful because we want to avoid loadingmsbs_step_8when not using the popcount implementation. - We do not need
& ~uint64_t(0x7). This is easy to prove, since the maximum value it can operate on is00001000_00000111_00000110_00000101_00000100_00000011_00000010_00000001. Since the byte under the most significant byte can be a 7 at most, the upper 3 bits of that second byte are always 0. Therefore we do not need to zero them out. - Rather than doing
(x & msbs_step_8) >> 7we can do(x >> 7) & ones_step_8, which means we do not have to loadmsbs_step_8.