blazingmq-sdk-java
Eliminate array copies
Issue number of the reported bug or feature request: #30
Describe your changes
- BrokerSession passes Collection<PutMessageImpl> instead of PutMessageImpl[] to avoid making copies.
- ApplicationData calculates crc32c on the fly to avoid having to make an expensive copy of the whole payload.
- ByteBufferOutputStream:
  - add a peek() call that returns an independent view of the underlying data.
  - don't copy and repack ByteBuffers written to the ByteBufferOutputStream.
- NettyTcpConnection still makes a copy of the data read, but lets Netty make the copy instead of us.
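
As a rough illustration of the on-the-fly checksum and the peek()-style view described above, here is a minimal sketch. It is not the SDK's code: the class `ZeroCopySketch` and both helper methods are hypothetical, and it assumes the payload is held as a list of `ByteBuffer` chunks. It relies only on `java.util.zip.CRC32C` (Java 9+) and on `ByteBuffer.duplicate()` / `asReadOnlyBuffer()`, which share the underlying bytes while keeping their own position and limit.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.zip.CRC32C;

// Hypothetical helper, not part of blazingmq-sdk-java.
final class ZeroCopySketch {

    // Checksum a payload stored as several chunks without merging them first.
    static long crc32c(List<ByteBuffer> payload) {
        CRC32C crc = new CRC32C();
        for (ByteBuffer chunk : payload) {
            // duplicate() shares the bytes but has its own position and limit,
            // so consuming the view leaves the caller's buffer untouched.
            crc.update(chunk.duplicate());
        }
        return crc.getValue();
    }

    // Assumed shape of a peek(): read-only views over the written chunks,
    // created without copying any payload bytes.
    static ByteBuffer[] peek(List<ByteBuffer> written) {
        ByteBuffer[] views = new ByteBuffer[written.size()];
        for (int i = 0; i < views.length; i++) {
            views[i] = written.get(i).asReadOnlyBuffer();
        }
        return views;
    }

    public static void main(String[] args) {
        List<ByteBuffer> payload = List.of(
                ByteBuffer.wrap("1234".getBytes(StandardCharsets.US_ASCII)),
                ByteBuffer.wrap("56789".getBytes(StandardCharsets.US_ASCII)));

        System.out.printf("crc32c = 0x%08X%n", crc32c(payload));

        ByteBuffer[] views = peek(payload);
        views[0].get(); // advance the view only
        System.out.println("original remaining = " + payload.get(0).remaining());
    }
}
```

The trade-off in this sketch is that views alias the original bytes, so a writer that later mutates the backing array would also change what a reader sees through the view.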
Testing performed
Unit and integration tests have been updated to reflect the changes.
Additional context
As discussed with @sgalichkin
I was able to run the JMH benchmarks with this latest commit. Here are the results:
Before:
| Benchmark | Mode | Cnt | Score | Error | Units |
| --- | --- | --- | --- | --- | --- |
| ApplicationDataBenchmark.testZlibStreamInOut | thrpt | 10 | 8.664 | ± 0.379 | ops/s |
| SessionBenchmark.sendReceive512B | thrpt | 5 | 1690.186 | ± 76.546 | ops/s |
| SessionBenchmark.sendReceive512B_Zlib | thrpt | 5 | 1686.011 | ± 115.191 | ops/s |
| SessionBenchmark.sendReceive512KiB | thrpt | 5 | 6.356 | ± 0.110 | ops/s |
| SessionBenchmark.sendReceive512KiB_Zlib | thrpt | 5 | 6.360 | ± 0.184 | ops/s |
| SessionBenchmark.sendReceive5MiB | thrpt | 5 | 12.180 | ± 0.367 | ops/s |
| SessionBenchmark.sendReceive5MiB_Zlib | thrpt | 5 | 15.087 | ± 0.116 | ops/s |
| SessionBenchmark.sendReceive60MiB | thrpt | 5 | 1.013 | ± 0.037 | ops/s |
| SessionBenchmark.sendReceive60MiB_Zlib | thrpt | 5 | 1.358 | ± 0.021 | ops/s |
| SessionBenchmark.sendReceiveBatch1000ZlibConfirmLater | thrpt | 5 | 0.169 | ± 0.003 | ops/s |
| SessionBenchmark.sendReceiveBatch1000ZlibConfirmNow | thrpt | 5 | 0.225 | ± 0.014 | ops/s |
| SessionBenchmark.sendReceiveBatch100ConfirmLater | thrpt | 5 | 1.539 | ± 0.093 | ops/s |
| SessionBenchmark.sendReceiveBatch100ConfirmNow | thrpt | 5 | 2.155 | ± 0.056 | ops/s |
| SessionBenchmark.sendReceiveBatch100ZlibConfirmNow | thrpt | 5 | 2.120 | ± 0.088 | ops/s |
| SessionBenchmark.sendReceiveBatch800ZlibConfirmLater | thrpt | 5 | 0.197 | ± 0.003 | ops/s |
After:
| Benchmark | Mode | Cnt | Score | Error | Units |
| --- | --- | --- | --- | --- | --- |
| ApplicationDataBenchmark.testZlibStreamInOut | thrpt | 10 | 6.088 | ± 0.569 | ops/s |
| SessionBenchmark.sendReceive512B | thrpt | 5 | 1881.041 | ± 178.831 | ops/s |
| SessionBenchmark.sendReceive512B_Zlib | thrpt | 5 | 1868.423 | ± 186.251 | ops/s |
| SessionBenchmark.sendReceive512KiB | thrpt | 5 | 18.013 | ± 0.627 | ops/s |
| SessionBenchmark.sendReceive512KiB_Zlib | thrpt | 5 | 16.877 | ± 0.652 | ops/s |
| SessionBenchmark.sendReceive5MiB | thrpt | 5 | 34.717 | ± 0.714 | ops/s |
| SessionBenchmark.sendReceive5MiB_Zlib | thrpt | 5 | 16.922 | ± 0.130 | ops/s |
| SessionBenchmark.sendReceive60MiB | thrpt | 5 | 3.089 | ± 0.121 | ops/s |
| SessionBenchmark.sendReceive60MiB_Zlib | thrpt | 5 | 1.503 | ± 0.031 | ops/s |
| SessionBenchmark.sendReceiveBatch1000ZlibConfirmLater | thrpt | 5 | 0.181 | ± 0.003 | ops/s |
| SessionBenchmark.sendReceiveBatch1000ZlibConfirmNow | thrpt | 5 | 0.238 | ± 0.009 | ops/s |
| SessionBenchmark.sendReceiveBatch100ConfirmLater | thrpt | 5 | 8.246 | ± 0.714 | ops/s |
| SessionBenchmark.sendReceiveBatch100ConfirmNow | thrpt | 5 | 8.279 | ± 0.657 | ops/s |
| SessionBenchmark.sendReceiveBatch100ZlibConfirmNow | thrpt | 5 | 2.242 | ± 0.074 | ops/s |
| SessionBenchmark.sendReceiveBatch800ZlibConfirmLater | thrpt | 5 | 0.226 | ± 0.005 | ops/s |
Except for the individual 512B messages, throughput looks 2x - 4x higher for uncompressed payloads. For Zlib, I think there is still some bottleneck, as the increase isn't as pronounced.
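
To check the range quoted above, the before/after scores for the uncompressed benchmarks other than 512B can be divided directly. The snippet below is only a convenience sketch: the class name `SpeedupSketch` is made up for illustration, the scores are copied from the two result tables, and the error margins are ignored.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for reading the JMH results; not part of the SDK.
final class SpeedupSketch {

    // Throughput ratio: values above 1.0 mean the new commit is faster.
    static double speedup(double before, double after) {
        return after / before;
    }

    public static void main(String[] args) {
        // {before, after} scores in ops/s, taken from the tables above.
        Map<String, double[]> scores = new LinkedHashMap<>();
        scores.put("sendReceive512KiB", new double[] {6.356, 18.013});
        scores.put("sendReceive5MiB", new double[] {12.180, 34.717});
        scores.put("sendReceive60MiB", new double[] {1.013, 3.089});
        scores.put("sendReceiveBatch100ConfirmLater", new double[] {1.539, 8.246});
        scores.put("sendReceiveBatch100ConfirmNow", new double[] {2.155, 8.279});

        for (Map.Entry<String, double[]> e : scores.entrySet()) {
            double[] s = e.getValue();
            System.out.printf("%-32s %.2fx%n", e.getKey(), speedup(s[0], s[1]));
        }
    }
}
```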