succinct icon indicating copy to clipboard operation
succinct copied to clipboard

optimize select_in_word

Open Validark opened this issue 2 years ago • 0 comments

Today I decided to reinvent this algorithm. Then I remembered this project, and decided to see if this repository had a better idea.

Turns out, my version had a few slight improvements over the implementation here.

  1. We can do (k | 0x80) * ones_step_8 rather than (k * ones_step_8) | msbs_step_8. This is helpful because we want to avoid loading msbs_step_8 when not using the popcount implementation.
  2. We do not need & ~uint64_t(0x7). This is easy to prove, since the maximum value it can operate on is 00001000_00000111_00000110_00000101_00000100_00000011_00000010_00000001. Since the byte under the most significant byte can be a 7 at most, the upper 3 bits of that second byte are always 0. Therefore we do not need to zero them out.
  3. Rather than doing (x & msbs_step_8) >> 7 we can do (x >> 7) & ones_step_8, which means we do not have to load msbs_step_8.

Validark avatar Jan 03 '24 06:01 Validark