R-Array-Hash: Consider different hashing algorithms?

While murmurhash3 is a very fast hashing algorithm with good collision characteristics, it is vulnerable to hash collision attacks (1). Because of this, many languages with built-in hashing use SipHash, including Ruby, Perl, Python, and Rust.
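To make the concern concrete, here is a rough sketch of why a hash without a secret key is a problem. It does not use murmurhash3 or SipHash themselves: `djb2` is a deliberately simple unkeyed hash standing in for the former, and keyed BLAKE2b from Python's `hashlib` stands in for the latter, since the standard library does not expose SipHash directly. The helper names (`colliding_keys`, `distinct_hashes`) are made up for the illustration.

```python
import hashlib
import itertools
import secrets


def djb2(key):
    """Unkeyed 32-bit multiplicative hash (a simple stand-in, NOT murmurhash3)."""
    h = 5381
    for byte in key:
        h = (h * 33 + byte) & 0xFFFFFFFF
    return h


def keyed_hash(key, secret):
    """Keyed 64-bit hash; BLAKE2b is used here as a stand-in for SipHash."""
    digest = hashlib.blake2b(key, digest_size=8, key=secret).digest()
    return int.from_bytes(digest, "little")


def colliding_keys(blocks):
    """Build 2**blocks distinct keys that all share one djb2 value.

    b"Ez" and b"FY" collide under djb2 (69*33 + 122 == 70*33 + 89), so any
    concatenation of those two blocks of the same length collides as well.
    """
    return [b"".join(parts)
            for parts in itertools.product((b"Ez", b"FY"), repeat=blocks)]


def distinct_hashes(keys, hash_fn):
    """Count how many different hash values a set of keys produces."""
    return len({hash_fn(k) for k in keys})


keys = colliding_keys(10)            # 1024 attacker-chosen keys
secret = secrets.token_bytes(16)     # per-process secret, unknown to the attacker

unkeyed = distinct_hashes(keys, djb2)
keyed = distinct_hashes(keys, lambda k: keyed_hash(k, secret))
print(len(keys), unkeyed, keyed)
```

All 1024 keys land in a single bucket under the unkeyed hash, which is what turns hash table inserts and lookups into linear scans. With the secret key, the same inputs should spread across 1024 distinct values, barring an astronomically unlikely collision, because the attacker cannot precompute collisions without knowing the key.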

If the security issue is not a major concern, as R is not often exposed to untrusted input, xxHash is a more recent hashing algorithm than murmurhash3 with better performance characteristics that still passes the SMHasher test suite.

I believe all three have a similar API, so switching to either of the above should not be terribly time-consuming to implement.

Mar 09 '15 18:03 jimhester

Yes, that is a concern and there may be some speed gains in switching. But the bigger bottlenecks are the garbage collector and the string table.

Mar 09 '15 21:03 jeffreyhorner

I don't know how safe it is to assume that R users don't work with untrusted inputs. Lots of data sets are downloaded from the web and/or include user-submitted content from web forms, for instance.

May 19 '17 17:05 davharris