zig icon indicating copy to clipboard operation
zig copied to clipboard

std.io.BitReader: add a countBits method to find runs of 1 or 0 bits

Open bens opened this issue 2 years ago • 3 comments

The bit_buffer was also changed to hold 8 bits rather than 7, which is necessary for when you use countBits(0) on .{0xFF}, for example. There are no bits processed in the byte yet so all 8 bits have to be retained for the next read. bit_count can stay as a u3 but now represents the number of bits still unprocessed, minus 1, since if there are no unprocessed bytes we can set bit_buffer to null.

bens avatar Dec 01 '23 08:12 bens

I wanted to add the countBits feature because I'm implementing Golomb coding which uses a unary encoded prefix. A simple, fast way of counting consecutive zero or one bits seemed the best way to implement decoding that. Using readBits to get one bit at a time would work, but with builtins like @ctz available it seemed a shame to not use them.

I believe some of the Elias codes use unary encoded prefixes as well, and likely other variable length codes as well, so I imagine in data compression at least it will be useful for other people.

bens avatar Dec 01 '23 08:12 bens

Using readBits to get one bit at a time would work, but with builtins like @ctz available it seemed a shame to not use them.

Actually, readBits isn't really enough since if you're counting 1s and you reach a 0, you won't get the 0 as part of the next readBits. You need to push that back somewhere and if the buffer is already full with 7 bits, there's again nowhere to put it unless you change to an 8 bit buffer.

bens avatar Dec 11 '23 04:12 bens

One other thing is that the idiom in the unit tests of setting the bit_count field to 0 instead of calling alignToByte() won't work any more, which is a silent breakage. This might be enough reason to make bit_count a u4 after all. I can rework it that way if that's the better idea.

bens avatar Jan 28 '24 13:01 bens