clickhouse-java
Performance [WIP]
This is an umbrella issue to track performance problems, collect ideas, and document improvements and the latest updates.
Where are we now
Below is an elapsed-time comparison of the ClickHouse native command-line client, curl, scp, the Java client, and the JDBC driver. Lower is better.
READ
| Client | Int8 | Int8(lz4) | UInt64 | UInt64(lz4) | String | String(lz4) | Mixed | Mixed(lz4) |
|---|---|---|---|---|---|---|---|---|
| CLI(Native) | 2.21 | 8.89 | 17.37 | 17.02 | | | | |
| CLI(RowBinary) | 4.88 | 11.39 | 14.41 | 15.83 | | | | |
| curl(Native) [^nodeser] | 2.67 | 11.39 | 21.00 | 18.18 | | | | |
| curl(RowBinary) [^nodeser] | 6.13 | 9.45 | 14.79 | 14.73 | | | | |
| SCP(Native) [^nodeser] | 4.96 | - | 36.79 | - | 44.28 | - | 52.15 | - |
| SCP(RowBinary) [^nodeser] | 5.19 | - | 37.22 | - | 46.10 | - | 51.18 | - |
| Java(http) [^single] | 6.99 | 13.66 | 31.06 | 25.86 | - | | | |
| Java(http) | 14.15 | 25.46 | 36.89 | 51.18 | | | | |
| JDBC(Native) | 18.52 | 38.22 | 76.06 | 79.10 | | | | |
| JDBC(RowBinary) | 22.59 | 40.11 | 67.61 | 53.01 | | | | |
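
For anyone who wants to reproduce a "no deserialization" style measurement on the Java side, here is a minimal sketch. It only drains an `InputStream` and times it. The class name `ReadTimer` and the 8 KB buffer size are illustrative assumptions, and this is not necessarily how the numbers above were collected.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical helper, not part of clickhouse-java.
public class ReadTimer {
    /** Reads the stream to the end without decoding it and returns the byte count. */
    public static long drain(InputStream in) {
        byte[] buffer = new byte[8192]; // assumed buffer size, tune as needed
        long total = 0;
        try {
            int n;
            while ((n = in.read(buffer)) != -1) {
                total += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }

    public static void main(String[] args) {
        // Stand-in for a server response; a real run would pass the
        // HTTP response body of the query under test instead.
        InputStream fake = new ByteArrayInputStream(new byte[1_000_000]);

        long start = System.nanoTime();
        long bytes = drain(fake);
        double seconds = (System.nanoTime() - start) / 1e9;

        System.out.println(bytes + " bytes in " + seconds + " s");
    }
}
```

Timing only the drain loop keeps client-side decoding out of the picture, so the result is a rough lower bound for what a deserializing client could achieve over the same transport.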
WRITE
[^nodeser]: No compression/decompression and serialization/deserialization at client side.
[^single]: Custom serialization/deserialization, one row at a time.
[^batch]: Custom serialization/deserialization, multiple rows at a time.
Queries used for testing:
| Case | Query | Native | RowBinary |
|---|---|---|---|
| Int8 | select (number % 255)::Int8 v from numbers(500000000) | 477MB | 477MB |
| UInt64 | select * from numbers(500000000) | 3.8GB | 3.8GB |
| String | select toString(number) from numbers(500000000) | 4.6GB | 7.8GB |
| Mixed | select (number % 255)::Int8 a, number b, toString(number)c from numbers(300000000) | 5.3GB | 7.9GB |
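
As a sanity check on the sizes in the table, the fixed-width cases can be estimated as rows times bytes per value. The sketch below assumes binary units (1 MB = 1024 * 1024 bytes) and ignores any per-block framing the format may add. The class and method names are hypothetical.

```java
// Hypothetical helper for back-of-the-envelope payload sizes.
public class ExpectedSize {
    /** Raw payload size in bytes for a column of fixed-width values. */
    public static long fixedWidthBytes(long rows, int bytesPerValue) {
        return rows * bytesPerValue;
    }

    /** Converts bytes to binary megabytes (1 MB = 1024 * 1024 bytes). */
    public static double toMegabytes(long bytes) {
        return bytes / (1024.0 * 1024.0);
    }

    public static void main(String[] args) {
        long rows = 500_000_000L;

        long int8Bytes = fixedWidthBytes(rows, 1);   // Int8 is 1 byte per value
        long uint64Bytes = fixedWidthBytes(rows, 8); // UInt64 is 8 bytes per value

        System.out.println("Int8   MB: " + toMegabytes(int8Bytes));
        System.out.println("UInt64 GB: " + toMegabytes(uint64Bytes) / 1024.0);
    }
}
```

Under these assumptions the Int8 case comes to just under 477 MB and the UInt64 case to a little over 3.7 GB, which is in line with the 477MB and 3.8GB listed above if the reported sizes were rounded up. The String and Mixed cases depend on variable-length encoding and are not covered by this estimate.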
Issues
Pitfalls
- Protocol matters the most
- Large SQL statement
- Batch insert
Improvements
- 0.3.2-patch9
- 0.3.2
TODOs
- [ ] 1
- [ ] 2