snzip icon indicating copy to clipboard operation
snzip copied to clipboard

Invalid CRC for framing modes

Open monoxgas opened this issue 6 years ago • 3 comments

https://github.com/kubo/snzip/blob/809c6f2db74bdd2ccfc34f4dc30adf0338161af3/crc32.h#L15

It would appear that while calculating the masked CRC in the function above, you perform a bitwise NOT operator on the result before masking. I'm unsure what formats require this, if any (possibly snappy-in-java), but this creates a mismatched CRC for most other public implementations of framing2. Specifically:

  • Firefox's IndexedDB implementation : https://github.com/mozilla/gecko-dev/blob/265e6721798a455604328ed5262f430cfcc37c2f/xpcom/io/SnappyFrameUtils.cpp#L119
  • The python-snappy library : https://github.com/andrix/python-snappy/blob/3c9cad5e82c48416c930e70ae21cc7a0ee693a6f/snappy/snappy.py#L67

I was hesitant to make a PR until I verified which formats might require it, but it should be an easy fix (I'd be happy to do it). Possibly two different functions, one which inverts and one which does not.

monoxgas avatar Oct 15 '19 19:10 monoxgas

Also, for quick reference, the CRC-32 section of the current framing2 spec: https://github.com/google/snappy/blob/e9e11b84e629c3e06fbaa4f0a86de02ceb9d6992/framing_format.txt#L39

monoxgas avatar Oct 15 '19 19:10 monoxgas

The plot thickens.

Upon further inspection, I believe the implementation here is indeed correct and it's the Mozilla repo which is at fault. I also erroneously claimed that python-snappy disagrees with the CRC checksums currently being computed, however they do indeed match.

Specifically, the formal spec of CRC-32/ISCSI (CRC-32C) states that the initialization value should be 0xffffffff/~0 and that the final value should be complemented : x ^ 0xffffffff or x = ~x. This is exactly what is happening here:

https://github.com/kubo/snzip/blob/809c6f2db74bdd2ccfc34f4dc30adf0338161af3/crc32.h#L15

References:

  • https://tools.ietf.org/html/rfc3720#section-12.1

  • https://github.com/madler/brotli/blob/501e6a9d03bcc15f0bc8015f4f36054c30f699ca/crc32c.c#L407

  • http://reveng.sourceforge.net/crc-catalogue/all.htm

    ... init=0xffffffff ... xorout=0xffffffff

However this still creates a small pickle, as there is now technically another "flavour" for this single compression format. Given that the difference is so small, I would hesitate to create an entirely new format such as mozilla-framing, but whatever fits the project best. An alternative solution might be to compare both the correct checksum and it's complement. It's technically inaccurate, but might save some headache and should never cause any real issues.

monoxgas avatar Oct 15 '19 21:10 monoxgas

I made a small program to compare the results of the implementation with CRC examples.

$ git clone https://github.com/kubo/snzip.git
$ cd snzip
$ ./autogen.sh
$ ./configure
$ wget https://gist.githubusercontent.com/kubo/c6bc9f038e3dd66bfb37da4f8596d119/raw/b97e2a7d325fc15edbc149a036f0605f79a2e22d/crc32_test.c
  ...omitted...
$ gcc -o crc32_test crc32_test.c crc32.c crc32_sse4_2.c -msse4.2
$ ./crc32_test 
32 bytes of zeroes: OK
32 bytes of ones: OK
32 bytes of incrementing 00..1f: OK
32 bytes of decrementing 1f..00: OK
An iSCSI - SCSI Read (10) Command PDU: OK

The results are same with the CRC examples.

As for your two ideas, I prefer the former. However I postpone my decision for a while.

kubo avatar Oct 16 '19 12:10 kubo