zlib

Add optimization for Adler32 checksum for Power processors

Open racardoso opened this issue 6 years ago • 2 comments

Hi,

This PR introduces an optimization of the Adler32 checksum for POWER8+ processors that uses VSX (vector) instructions.

Suppose adler32 processes 1 byte at a time, and let _n denote the value after iteration n. Then s1_0 is the initial value of s1 taken from adler; it is 1 unless the initial adler value is different from 1. After the first iteration s1_1 = s1_0 + c[1], after the next s1_2 = s1_1 + c[2], and so on. Hence, after iteration N, s1_N = s1_(N-1) + c[N]. Since s2 accumulates s1 on every iteration, s2_N = s2_0 + s1_1 + s1_2 + ... + s1_N = s2_0 + N*s1_0 + N*c[1] + (N-1)*c[2] + ... + c[N]. In a more general way:

s1_N = s1_0 + sum(i=1 to N) c[i]

s2_N = s2_0 + N*s1_0 + sum(i=1 to N) (N-i+1)*c[i]

Here s1_N and s2_N are the values of s1 and s2 after N iterations, and s1_0 and s2_0 are their values before the first one. So if we can process N bytes at a time, we can update the adler32 checksum for N bytes at once. Since VSX supports 16-byte vector instructions, we can process 16 bytes at a time. Using N = 16 we have:

s1 = s1_16 = s1_0 + sum(i=1 to 16) c[i]

s2 = s2_16 = s2_0 + 16*s1_0 + sum(i=1 to 16) (16-i+1)*c[i]
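The N = 16 update can be checked against the byte-at-a-time loop with a small scalar sketch. This is not the VSX code from the PR: the function names `adler32_bytewise` and `adler32_block16` are hypothetical, the inner loop stands in for what the vector instructions would compute, and the modulo is applied after every block for simplicity rather than deferred as an optimized version would do.

```c
#include <stddef.h>
#include <stdint.h>

#define ADLER_BASE 65521u /* Adler-32 modulus: largest prime below 2^16 */

/* Reference: one byte per iteration (s1 += c; s2 += s1). */
static uint32_t adler32_bytewise(uint32_t adler, const unsigned char *buf, size_t len)
{
    uint32_t s1 = adler & 0xffffu;
    uint32_t s2 = (adler >> 16) & 0xffffu;
    for (size_t i = 0; i < len; i++) {
        s1 = (s1 + buf[i]) % ADLER_BASE;
        s2 = (s2 + s1) % ADLER_BASE;
    }
    return (s2 << 16) | s1;
}

/* Block form: apply the N = 16 closed form to every full 16-byte block. */
static uint32_t adler32_block16(uint32_t adler, const unsigned char *buf, size_t len)
{
    uint32_t s1 = adler & 0xffffu;
    uint32_t s2 = (adler >> 16) & 0xffffu;
    while (len >= 16) {
        uint32_t sum = 0;      /* sum(i=1 to 16) c[i] */
        uint32_t weighted = 0; /* sum(i=1 to 16) (16-i+1)*c[i] */
        for (unsigned i = 1; i <= 16; i++) {
            sum += buf[i - 1];
            weighted += (16u - i + 1u) * buf[i - 1];
        }
        /* s2 must use s1 from before the block (s1_0), so update s2 first. */
        s2 = (s2 + 16u * s1 + weighted) % ADLER_BASE;
        s1 = (s1 + sum) % ADLER_BASE;
        buf += 16;
        len -= 16;
    }
    /* Fewer than 16 bytes left: finish with the byte-wise loop. */
    return adler32_bytewise((s2 << 16) | s1, buf, len);
}
```

For any buffer and any starting value, `adler32_block16` should return the same result as `adler32_bytewise`; the order of the two updates inside the block loop is the part that is easy to get wrong.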

The VSX version starts to improve performance for buffers of size >= 64. It is up to about 10x faster than the non-vectorized adler32 version (average CPU time in ns over 100000 iterations):

| buffer size | adler32 baseline (ns) | adler32 power (ns) | speedup |
|---|---|---|---|
| 64 | 44.921875 | 41.015625 | - |
| 1024 | 943.359375 | 130.859375 | 7.2 |
| 10*5552 | 42519.531250 | 3974.609375 | 10.7 |

For buffers with length <= 64, the performance is almost the same as the non-vectorized implementation (with a small performance degradation in some cases):

| buffer size | adler32 baseline (ns) | adler32 power (ns) |
|---|---|---|
| NULL | 5.859375 | 6.812500 |
| 1 | 3.906250 | 4.859375 |
| 15 | 11.718750 | 12.625000 |
| 48 | 35.156250 | 33.203125 |
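The harness used for these timings is not shown in this thread. As a rough sketch of how an average CPU time per call could be measured, assuming the standard C `clock()` function and a hypothetical helper named `average_cpu_ns` (the byte-wise `adler32_simple` is included only so there is something to time):

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

#define MOD_ADLER 65521u /* Adler-32 modulus */

typedef uint32_t (*checksum_fn)(uint32_t, const unsigned char *, size_t);

/* Plain byte-at-a-time Adler-32, used here only as the function under test. */
static uint32_t adler32_simple(uint32_t adler, const unsigned char *buf, size_t len)
{
    uint32_t s1 = adler & 0xffffu;
    uint32_t s2 = (adler >> 16) & 0xffffu;
    for (size_t i = 0; i < len; i++) {
        s1 = (s1 + buf[i]) % MOD_ADLER;
        s2 = (s2 + s1) % MOD_ADLER;
    }
    return (s2 << 16) | s1;
}

/* Average CPU time per call, in nanoseconds, over `iterations` calls. */
static double average_cpu_ns(checksum_fn fn, const unsigned char *buf,
                             size_t len, unsigned long iterations)
{
    volatile uint32_t sink = 0; /* keeps the calls from being optimized away */
    clock_t start = clock();
    for (unsigned long i = 0; i < iterations; i++)
        sink = fn(1u, buf, len);
    clock_t end = clock();
    (void)sink;
    return (double)(end - start) * 1e9 / (double)CLOCKS_PER_SEC / (double)iterations;
}
```

Calling `average_cpu_ns(adler32_simple, buf, 1024, 100000)` would give one cell of such a table. The resolution of `clock()` is platform dependent, so results for very short buffers are likely to be noisy.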

racardoso Dec 10 '19 19:12

FYI this PR uses the same base commit as #457 to add base code for Power optimizations. When either one gets accepted, the other can be rebased to remove the first commit from the PR.

mscastanho Dec 10 '19 20:12

A long time ago, I opened this ticket:

  • https://github.com/madler/zlib/issues/847

Neustradamus May 23 '25 00:05