Vsevolod Stakhov
Well, after that experiment, I tried copying all my data to an intermediate aligned buffer whose length is a multiple of 8 (filling the rest with zeroes). In that case,...
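To make the padding step concrete, here is a minimal sketch of such a copy. The helper name `copy_padded8` is hypothetical and is not part of mum-hash or xxhash; the sketch also assumes that the alignment guaranteed by `calloc` (suitable for any fundamental type) is enough for 8-byte reads, and it gives an empty input one zeroed 8-byte block.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not from mum.h or xxhash): copy `len` bytes of `src`
 * into a newly allocated buffer whose size is rounded up to a multiple of 8.
 * The padding bytes are zero. The caller must free() the result. */
static void *copy_padded8(const void *src, size_t len, size_t *padded_len)
{
    size_t rounded;
    void *buf;

    if (len > SIZE_MAX - 7) {
        return NULL;                      /* rounding up would overflow */
    }
    rounded = (len + 7) & ~(size_t)7;     /* round up to a multiple of 8 */
    if (rounded == 0) {
        rounded = 8;                      /* assumption: empty input gets one zero block */
    }

    buf = calloc(1, rounded);             /* zero-filled allocation */
    if (buf == NULL) {
        return NULL;
    }
    assert(rounded >= len);
    if (len > 0) {
        memcpy(buf, src, len);            /* the tail stays zero */
    }
    *padded_len = rounded;
    return buf;
}
```

The padded buffer and `padded_len` would then be passed to the hash function instead of the original pointer and length. Note that the extra copy has a cost of its own, which has to be included when comparing timings.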
I used `g++ -DSPEED2 -O3 -w -fpermissive -DMUM -I../ bench.c` for mum-hash and `g++ -DSPEED2 -O3 -w -fpermissive -DxxHash -g -I../ bench.c xxhash.c` for xxhash BTW.
Well, I completely agree that inlining is important, since a function call has its own overhead even when the frame pointer is omitted. However, it is not always possible, especially when you use...
By the way, after this change I have many warnings like this one: ``` /Users/vstakhov/rspamd/src/libcryptobox/../../contrib/mumhash/mum.h:223:29: warning: shift count >= width of type [-Wshift-count-overflow] u64 = *(uint32_t *) str = width...
Hum, and hashing results are now different as well.
> It does not matter for hash tables but as I am providing MUM_TARGET_INDEPENDENT_HASH I should process the tail in a consistent way. I define this macro before including `mum.h`,...
By the way, haswell target is only defined for gcc >= 4.9 as far as I see. Hence, I've modified the `avx2` guard to: ``` #if defined(__x86_64__) && defined(__GNUC__) &&...
JFYI, @bapt has tested mumhash in https://github.com/vstakhov/libucl on ARMv6 with FreeBSD and found that it is still faster than xxhash32. So I can say that mumhash is now...
That helped, thank you. ``` bin/blake2b-util bench time granularity: 24 cycles, 2195297384 cycles/second 1 byte(s): avx2, 396.00 cycles per call, 396.0000 cycles/byte avx, 333.00 cycles per call, 333.0000 cycles/byte x86,...
I have also found that `chacha_final` is completely broken, since `memset(state, 0, sizeof(state))` occurs _prior_ to returning `state->leftover`. An optimizing compiler might fix that, but it is incorrect anyway.
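The ordering problem can be illustrated with a small sketch. The struct layout and the names `sketch_state` and `sketch_final` are hypothetical and do not reproduce the real `chacha_final`; the only point is that the value to be returned is read before the state is wiped.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical state layout for illustration only; not the real chacha state. */
struct sketch_state {
    unsigned char buffer[64];
    size_t leftover;
};

/* Return the number of leftover bytes, then leave the state zeroed. */
static size_t sketch_final(struct sketch_state *state)
{
    size_t leftover;

    assert(state != NULL);
    leftover = state->leftover;          /* read the result first */
    memset(state, 0, sizeof(*state));    /* only then wipe the whole struct */
    return leftover;
}
```

With the two statements swapped, the function would always return 0. Separately, a plain `memset` used for wiping secrets may be removed by the compiler when the object is not used afterwards, so real code often relies on a dedicated secure-zeroing routine instead.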