R-Array-Hash

Consider different hashing algorithms?

Open · jimhester opened this issue 10 years ago · 2 comments

While MurmurHash3 is a very fast hashing algorithm with good collision characteristics, it is vulnerable to hash collision attacks (1). Because of this, many languages with built-in hash tables use SipHash instead, including Ruby, Perl, Python and Rust.
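What makes SipHash resistant to these attacks is that it is keyed: the output depends on a secret 128-bit key, so an attacker who cannot see the key cannot precompute a batch of inputs that all land in the same bucket. A minimal, unoptimized C sketch of SipHash-2-4 (2 compression rounds, 4 finalization rounds) follows. It is written from the published SipHash specification and is not taken from R-Array-Hash or from any of the languages above; the function name `siphash24` is hypothetical, and a real integration would presumably fill the key from a random source once at startup.

```c
#include <stddef.h>
#include <stdint.h>

/* Rotate left; b is always 13, 16, 17, 21 or 32 here, so no undefined shift. */
static inline uint64_t rotl(uint64_t x, int b) {
    return (x << b) | (x >> (64 - b));
}

/* Read 8 bytes as a little-endian 64-bit integer, independent of host byte order. */
static uint64_t load_le64(const uint8_t *p) {
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--) v = (v << 8) | p[i];
    return v;
}

/* One SipRound over the four-word internal state. */
static void sip_round(uint64_t v[4]) {
    v[0] += v[1]; v[1] = rotl(v[1], 13); v[1] ^= v[0]; v[0] = rotl(v[0], 32);
    v[2] += v[3]; v[3] = rotl(v[3], 16); v[3] ^= v[2];
    v[0] += v[3]; v[3] = rotl(v[3], 21); v[3] ^= v[0];
    v[2] += v[1]; v[1] = rotl(v[1], 17); v[1] ^= v[2]; v[2] = rotl(v[2], 32);
}

/* Hypothetical helper: SipHash-2-4 of `len` bytes under a secret 16-byte key. */
uint64_t siphash24(const uint8_t key[16], const uint8_t *data, size_t len) {
    uint64_t k0 = load_le64(key), k1 = load_le64(key + 8);
    /* The key is mixed into every state word, so outputs (and collisions) depend on it. */
    uint64_t v[4] = {
        0x736f6d6570736575ULL ^ k0,
        0x646f72616e646f6dULL ^ k1,
        0x6c7967656e657261ULL ^ k0,
        0x7465646279746573ULL ^ k1,
    };

    /* Compress each full 8-byte block with 2 rounds. */
    size_t full = len - (len % 8);
    for (size_t i = 0; i < full; i += 8) {
        uint64_t m = load_le64(data + i);
        v[3] ^= m;
        sip_round(v); sip_round(v);
        v[0] ^= m;
    }

    /* Final block: leftover bytes, with the message length (mod 256) in the top byte. */
    uint64_t b = ((uint64_t)len) << 56;
    for (size_t i = 0; i < len % 8; i++)
        b |= ((uint64_t)data[full + i]) << (8 * i);
    v[3] ^= b;
    sip_round(v); sip_round(v);
    v[0] ^= b;

    /* Finalization: 4 rounds, then fold the state into 64 bits. */
    v[2] ^= 0xff;
    for (int i = 0; i < 4; i++) sip_round(v);
    return v[0] ^ v[1] ^ v[2] ^ v[3];
}
```

With the reference key bytes 00 01 … 0f and the 15-byte message 00 01 … 0e, this should return 0xa129ca6149be45e5, the test vector published with SipHash. The extra rounds are why SipHash is usually somewhat slower than MurmurHash3 or xxHash, which is the trade-off discussed in the next paragraph.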

If security is not a major concern, since R is not often exposed to untrusted input, xxHash is a more recent hashing algorithm than MurmurHash3, with better performance characteristics while still passing the SMHasher test suite.

I believe all three have a similar API, so switching to either of the above should not be terribly time-consuming to implement.

jimhester, Mar 09 '15 18:03

Yes, that is a concern, and there may be some speed gains from switching. But the bigger bottlenecks are the garbage collector and the string table.

jeffreyhorner, Mar 09 '15 21:03

I don't know how safe it is to assume that R users don't work with untrusted inputs. Lots of data sets are downloaded from the web and/or include user-submitted contents from web forms, for instance.

davharris, May 19 '17 17:05