safety-dance icon indicating copy to clipboard operation
safety-dance copied to clipboard

Audit seahash

Open chills42 opened this issue 5 years ago • 2 comments

Seahash is an implementation of a fast non-cryptographic hashing algorithm, comparable to xxHash or MetroHash.

https://gitlab.redox-os.org/redox-os/seahash

chills42 avatar Feb 25 '20 13:02 chills42

From a cursory look, it seems to have 5 unsafe blocks:

  1. unaligned reads to turn a &[u8] into a u64 This looks fine to me Maybe the big unsafe block could be turned into smaller blocks inside each match arm, and maybe the byte-sized reads can be turned into slice indexing operations instead of raw pointer reads (Since rustc might be able to elide the bounds checks, but I'm not sure) It correctly uses as_ptr to get a pointer to the entire slice instead of a pointer to the first element, and it returns 0 for anything that does not satisfy the buf.len() < 8 assumption without triggering UB. It also correctly uses unaligned reads
  2. Read a u64 from a *const u8 This also looks sound to me. It's an unsafe fn, although pherhaps the requirement that the pointer points to 8 bytes of data could be specified more explicitly in the doc comment for it It correctly uses unaligned reads
  3. A test case for (2) Not much to say about this, looks fine
  4. A 121 line unsafe block in State::hash This has two parts: First it iterates through the buffer using end and current raw pointers and reading 4 u64's using the function from (2). It obtains the 32-byte ending chunk pointer by doing buf.as_ptr().offset(buf.len() as isize & !0x1F) which looks fine to me but I would like someone else to double check. The u64 reads seem fine since (2) uses unaligned reads After that it handles the remainder It calculates the remaining bytes by doing let mut excessive = buf.len() as usize + buf.as_ptr() as usize - end_ptr as usize;. I'm assuming this is fine but I think it would be more clear if it has parenthesis around the addition to make it a bit more clear that this never underflows Then it just hashes the remaining bytes using the functions from (1) and (2). As far as I can tell, all the match arms are sound.
  5. A 97 line unsafe block in SeaHasher::push_bytes I did not look into this. At a glance it seems to mostly consist of a copy_to_nonoverlapping call and then something similar to what's done in (4)

It does not have any dependencies, so there is nothing to audit with regards to deps

I also ran the tests under miri and they all passed

In summary, as far as I can tell the crate is sound (Outside of (5) which I did not look into), but can probably be improved a bit with comments and smaller unsafe blocks. Another thing we might wanna do is check if we can replace unsafe code with safe alternatives without performance regressions. The crate has criterion benchmarks which should make that easier to attempt

nico-abram avatar Feb 16 '21 02:02 nico-abram

I'm pretty sure point 1 can be refactored using u32::from_ne_bytes(buf[..4].try_into().unwrap()) instead of unaligned reads. Since the length is known in advance, the optimizer should elide bounds checks and the unreachable panic in .unwrap().

Shnatsel avatar Feb 16 '21 15:02 Shnatsel