Inflate fast NEON optimization

Open Adenilson opened this issue 7 years ago • 7 comments

This patch uses SIMD to perform wide loads and stores in inflate_fast, which should improve decompression performance on ARM by 18% to 30%, depending on the data.

It also includes the fix for the InflateBack() corner case (details in: https://bugs.chromium.org/p/chromium/issues/detail?id=769880).

This optimization has been shipping in Chromium since M62 (it landed in the repository around September/October 2017).
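To illustrate the idea of wide loads/stores described above, here is a minimal sketch in C. It is not the code from the patch: the names `CHUNK_SIZE`, `copy_chunk`, and `wide_copy` are hypothetical, and the sketch assumes the caller guarantees at least 15 bytes of readable and writable slack past the end of the copy and that source and destination are at least 16 bytes apart. On targets without NEON it falls back to a fixed-size `memcpy`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical chunk width: one 128-bit NEON register. */
enum { CHUNK_SIZE = 16 };

/* Move exactly CHUNK_SIZE bytes with a single wide load and store. */
static void copy_chunk(unsigned char *dst, const unsigned char *src) {
#if defined(__ARM_NEON)
    vst1q_u8(dst, vld1q_u8(src));   /* 16-byte load, then 16-byte store */
#else
    memcpy(dst, src, CHUNK_SIZE);   /* portable fallback for other targets */
#endif
}

/*
 * Copy len bytes from src to dst in whole chunks and return dst + len.
 * Assumptions (illustrative, enforced by the caller):
 *   - up to CHUNK_SIZE - 1 bytes past src + len are readable and
 *     up to CHUNK_SIZE - 1 bytes past dst + len are writable;
 *   - src and dst are at least CHUNK_SIZE bytes apart, so one chunk
 *     never overlaps itself.
 */
static unsigned char *wide_copy(unsigned char *dst, const unsigned char *src,
                                size_t len) {
    unsigned char *end = dst + len;
    size_t chunks = (len + CHUNK_SIZE - 1) / CHUNK_SIZE;  /* round up */

    assert(dst != NULL && src != NULL);
    while (chunks-- > 0) {
        copy_chunk(dst, src);
        dst += CHUNK_SIZE;
        src += CHUNK_SIZE;
    }
    return end;
}
```

The trade-off in this sketch is that the last chunk may write past the requested length, which is why the slack requirement matters. A match whose distance is smaller than the chunk width would need separate handling that is not shown here.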

Adenilson • Apr 05 '18 18:04

Ideally this should be applied first, followed by updated (WIP) versions of the checksum patches (i.e. optimized crc32 and adler32).

Adenilson • Apr 05 '18 18:04

@madler any suggestions?

Adenilson • Apr 05 '18 18:04

For further details concerning the optimization, please see: https://bugs.chromium.org/p/chromium/issues/detail?id=697280

Adenilson • Apr 05 '18 18:04

Some benchmarking data from a run on an ARM CPU (big core A72, snappy data set) shows an average performance improvement of 31%:

a) Vanilla

(xenial)adenilson@localhost:~/canonical-fork/build$ time taskset -c 3 ./zlib_bench gzip ~/corpora/snappy/testdata/*
/home/adenilson/corpora/snappy/testdata/alice29.txt : GZIP: [b 1M] bytes 152089 -> 54426 35.8% comp 7.1 ( 7.2) MB/s uncomp 127.7 (127.9) MB/s
/home/adenilson/corpora/snappy/testdata/asyoulik.txt : GZIP: [b 1M] bytes 125179 -> 48949 39.1% comp 6.5 ( 6.5) MB/s uncomp 120.5 (120.6) MB/s
/home/adenilson/corpora/snappy/testdata/baddata1.snappy : GZIP: [b 1M] bytes 27512 -> 22920 83.3% comp 18.6 ( 18.7) MB/s uncomp 88.2 ( 88.3) MB/s
/home/adenilson/corpora/snappy/testdata/baddata2.snappy : GZIP: [b 1M] bytes 27483 -> 23000 83.7% comp 18.6 ( 18.6) MB/s uncomp 88.4 ( 88.4) MB/s
/home/adenilson/corpora/snappy/testdata/baddata3.snappy : GZIP: [b 1M] bytes 28384 -> 23705 83.5% comp 18.5 ( 18.5) MB/s uncomp 87.9 ( 87.9) MB/s
/home/adenilson/corpora/snappy/testdata/fireworks.jpeg : GZIP: [b 1M] bytes 123093 -> 122927 99.9% comp 21.8 ( 21.8) MB/s uncomp 314.5 (314.8) MB/s
/home/adenilson/corpora/snappy/testdata/geo.protodata : GZIP: [b 1M] bytes 118588 -> 15143 12.8% comp 34.4 ( 34.7) MB/s uncomp 237.2 (237.3) MB/s
/home/adenilson/corpora/snappy/testdata/html : GZIP: [b 1M] bytes 102400 -> 13711 13.4% comp 27.3 ( 27.5) MB/s uncomp 220.2 (220.4) MB/s
/home/adenilson/corpora/snappy/testdata/html_x_4 : GZIP: [b 1M] bytes 409600 -> 53299 13.0% comp 24.3 ( 24.5) MB/s uncomp 220.7 (221.1) MB/s
/home/adenilson/corpora/snappy/testdata/kppkn.gtb : GZIP: [b 1M] bytes 184320 -> 38789 21.0% comp 5.2 ( 5.3) MB/s uncomp 162.3 (162.5) MB/s
/home/adenilson/corpora/snappy/testdata/lcet10.txt : GZIP: [b 1M] bytes 426754 -> 144904 34.0% comp 7.2 ( 7.2) MB/s uncomp 129.4 (129.6) MB/s
/home/adenilson/corpora/snappy/testdata/paper-100k.pdf : GZIP: [b 1M] bytes 102400 -> 81276 79.4% comp 22.1 ( 22.1) MB/s uncomp 146.2 (146.4) MB/s
/home/adenilson/corpora/snappy/testdata/plrabn12.txt : GZIP: [b 1M] bytes 481861 -> 195220 40.5% comp 5.3 ( 5.3) MB/s uncomp 117.1 (117.4) MB/s
/home/adenilson/corpora/snappy/testdata/urls.10K : GZIP: [b 1M] bytes 702087 -> 222381 31.7% comp 14.0 ( 14.0) MB/s uncomp 141.4 (141.5) MB/s

b) inflate_fast

(xenial)adenilson@localhost:~/canonical-fork/build$ time taskset -c 3 ./zlib_bench gzip ~/corpora/snappy/testdata/*
/home/adenilson/corpora/snappy/testdata/alice29.txt : GZIP: [b 1M] bytes 152089 -> 54426 35.8% comp 7.2 ( 7.2) MB/s uncomp 177.1 (177.2) MB/s
/home/adenilson/corpora/snappy/testdata/asyoulik.txt : GZIP: [b 1M] bytes 125179 -> 48949 39.1% comp 6.5 ( 6.5) MB/s uncomp 164.5 (164.6) MB/s
/home/adenilson/corpora/snappy/testdata/baddata1.snappy : GZIP: [b 1M] bytes 27512 -> 22920 83.3% comp 18.8 ( 18.8) MB/s uncomp 90.8 ( 91.0) MB/s
/home/adenilson/corpora/snappy/testdata/baddata2.snappy : GZIP: [b 1M] bytes 27483 -> 23000 83.7% comp 18.8 ( 18.8) MB/s uncomp 90.7 ( 90.7) MB/s
/home/adenilson/corpora/snappy/testdata/baddata3.snappy : GZIP: [b 1M] bytes 28384 -> 23705 83.5% comp 18.7 ( 18.7) MB/s uncomp 90.4 ( 90.5) MB/s
/home/adenilson/corpora/snappy/testdata/fireworks.jpeg : GZIP: [b 1M] bytes 123093 -> 122927 99.9% comp 21.8 ( 21.9) MB/s uncomp 311.1 (311.3) MB/s
/home/adenilson/corpora/snappy/testdata/geo.protodata : GZIP: [b 1M] bytes 118588 -> 15143 12.8% comp 34.9 ( 35.1) MB/s uncomp 299.1 (299.1) MB/s
/home/adenilson/corpora/snappy/testdata/html : GZIP: [b 1M] bytes 102400 -> 13711 13.4% comp 27.7 ( 27.7) MB/s uncomp 284.6 (284.9) MB/s
/home/adenilson/corpora/snappy/testdata/html_x_4 : GZIP: [b 1M] bytes 409600 -> 53299 13.0% comp 24.7 ( 24.8) MB/s uncomp 284.9 (285.5) MB/s
/home/adenilson/corpora/snappy/testdata/kppkn.gtb : GZIP: [b 1M] bytes 184320 -> 38789 21.0% comp 5.3 ( 5.3) MB/s uncomp 222.0 (222.1) MB/s
/home/adenilson/corpora/snappy/testdata/lcet10.txt : GZIP: [b 1M] bytes 426754 -> 144904 34.0% comp 7.2 ( 7.3) MB/s uncomp 180.0 (180.1) MB/s
/home/adenilson/corpora/snappy/testdata/paper-100k.pdf : GZIP: [b 1M] bytes 102400 -> 81276 79.4% comp 20.2 ( 21.8) MB/s uncomp 147.9 (149.5) MB/s
/home/adenilson/corpora/snappy/testdata/plrabn12.txt : GZIP: [b 1M] bytes 481861 -> 195220 40.5% comp 5.3 ( 5.3) MB/s uncomp 163.4 (163.7) MB/s
/home/adenilson/corpora/snappy/testdata/urls.10K : GZIP: [b 1M] bytes 702087 -> 222381 31.7% comp 14.0 ( 14.0) MB/s uncomp 175.1 (175.2) MB/s
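For readers who want to compare individual rows, a per-file gain can be derived from the first `uncomp` figure of each run as (after / before - 1) × 100. The helper below is a small sketch of that arithmetic; the function name `speedup_percent` is hypothetical and not part of zlib_bench.

```c
#include <assert.h>

/* Percentage change in throughput going from `before` to `after`.
 * Both arguments are in the same unit (MB/s in the output above). */
static double speedup_percent(double before, double after) {
    assert(before > 0.0);
    return (after / before - 1.0) * 100.0;
}
```

With the numbers above, `speedup_percent(127.7, 177.1)` for alice29.txt comes to roughly 38.7, while fireworks.jpeg (314.5 to 311.1) comes to about -1.1, so the already-compressed JPEG is essentially unchanged.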

Adenilson • Apr 25 '18 17:04

@madler any comment?

Adenilson • Jul 10 '18 10:07

@madler ping?

Adenilson • Aug 15 '18 07:08

Can you rebase on the latest master? :)

PolynomialDivision • Oct 29 '22 15:10