Reduce SQL sanitizer allocations
https://github.com/jackc/pgx/issues/2124
Result:
Main optimizations:
- extensive usage of
sync.Poolfor byte buffers, lexers and parsed query structs - append-style string formatters for
int64,float64andtime.Time+bytes.Buffer.AvailableBuffer - rework of
QuoteStringandQuoteBytesto append-style (with tests for backwards compatibility)
Misc changes:
- benchmarks for
Query.SanitizeandSanitizeSQLfunctions - a tiny script for generation of benchmark reports for selected commits and diff (using benchstat)
- fuzzing of
QuoteStringandQuoteBytes(I did'n find any problems for 1h of fuzzing, but you can never be sure for 100%)
Since optimization is an extremely hard problem, I think it's worth checking some more benchmarks.
I would be very grateful for your opinion on this and recommendations/advice, @jackc @vtolstov
benchmark diffs for concrete optimisations
goos: darwin
goarch: arm64
pkg: github.com/jackc/pgx/v5/internal/sanitize
cpu: Apple M1
│ benchmarks/0_base_case.bench │ benchmarks/1_buf_pool.bench │ benchmarks/2_append_AvailableBuffer.bench │ benchmarks/3_quoteBytes.bench │ benchmarks/4_quoteString.bench │ benchmarks/5_add_lexer_and_query_pools.bench │ benchmarks/6_drop_too_large_values_from_memory_pools.bench │
│ sec/op │ sec/op vs base │ sec/op vs base │ sec/op vs base │ sec/op vs base │ sec/op vs base │ sec/op vs base │
Sanitize-8 718.2n ± 1% 578.8n ± 1% -19.41% (p=0.000 n=10) 439.9n ± 0% -38.74% (p=0.000 n=10) 413.6n ± 4% -42.42% (p=0.000 n=10) 397.1n ± 1% -44.72% (p=0.000 n=10) 403.6n ± 1% -43.81% (p=0.000 n=10) 400.8n ± 2% -44.20% (p=0.000 n=10)
SanitizeSQL-8 2.089µ ± 0% 1.956µ ± 0% -6.37% (p=0.000 n=10) 1.828µ ± 0% -12.49% (p=0.000 n=10) 1.812µ ± 1% -13.28% (p=0.000 n=10) 1.789µ ± 1% -14.36% (p=0.000 n=10) 1.670µ ± 0% -20.06% (p=0.000 n=10) 1.673µ ± 0% -19.91% (p=0.000 n=10)
geomean 1.225µ 1.064µ -13.13% 896.8n -26.79% 865.5n -29.34% 842.8n -31.19% 820.9n -32.98% 818.8n -33.15%
│ benchmarks/0_base_case.bench │ benchmarks/1_buf_pool.bench │ benchmarks/2_append_AvailableBuffer.bench │ benchmarks/3_quoteBytes.bench │ benchmarks/4_quoteString.bench │ benchmarks/5_add_lexer_and_query_pools.bench │ benchmarks/6_drop_too_large_values_from_memory_pools.bench │
│ B/op │ B/op vs base │ B/op vs base │ B/op vs base │ B/op vs base │ B/op vs base │ B/op vs base │
Sanitize-8 1488.0 ± 0% 528.0 ± 0% -64.52% (p=0.000 n=10) 472.0 ± 0% -68.28% (p=0.000 n=10) 456.0 ± 0% -69.35% (p=0.000 n=10) 424.0 ± 0% -71.51% (p=0.000 n=10) 424.0 ± 0% -71.51% (p=0.000 n=10) 424.0 ± 0% -71.51% (p=0.000 n=10)
SanitizeSQL-8 2216.0 ± 0% 1256.0 ± 0% -43.32% (p=0.000 n=10) 1200.0 ± 0% -45.85% (p=0.000 n=10) 1184.0 ± 0% -46.57% (p=0.000 n=10) 1152.0 ± 0% -48.01% (p=0.000 n=10) 552.0 ± 0% -75.09% (p=0.000 n=10) 552.0 ± 0% -75.09% (p=0.000 n=10)
geomean 1.773Ki 814.4 -55.15% 752.6 -58.55% 734.8 -59.54% 698.9 -61.51% 483.8 -73.36% 483.8 -73.36%
│ benchmarks/0_base_case.bench │ benchmarks/1_buf_pool.bench │ benchmarks/2_append_AvailableBuffer.bench │ benchmarks/3_quoteBytes.bench │ benchmarks/4_quoteString.bench │ benchmarks/5_add_lexer_and_query_pools.bench │ benchmarks/6_drop_too_large_values_from_memory_pools.bench │
│ allocs/op │ allocs/op vs base │ allocs/op vs base │ allocs/op vs base │ allocs/op vs base │ allocs/op vs base │ allocs/op vs base │
Sanitize-8 11.000 ± 0% 7.000 ± 0% -36.36% (p=0.000 n=10) 4.000 ± 0% -63.64% (p=0.000 n=10) 3.000 ± 0% -72.73% (p=0.000 n=10) 2.000 ± 0% -81.82% (p=0.000 n=10) 2.000 ± 0% -81.82% (p=0.000 n=10) 2.000 ± 0% -81.82% (p=0.000 n=10)
SanitizeSQL-8 26.00 ± 0% 22.00 ± 0% -15.38% (p=0.000 n=10) 19.00 ± 0% -26.92% (p=0.000 n=10) 18.00 ± 0% -30.77% (p=0.000 n=10) 17.00 ± 0% -34.62% (p=0.000 n=10) 10.00 ± 0% -61.54% (p=0.000 n=10) 10.00 ± 0% -61.54% (p=0.000 n=10)
geomean 16.91 12.41 -26.62% 8.718 -48.45% 7.348 -56.55% 5.831 -65.52% 4.472 -73.56% 4.472 -73.56%
LGTM. But this is obviously a very security critical part of the code, so I'd like if we can get some more eyes on this before merging.
lgtm, i'm try to check on my hot path in next few days.
In my tests i don't saw any issues.
@jackc
But this is obviously a very security critical part of the code, so I'd like if we can get some more eyes on this before merging.
It would be very much appreciated if you could suggest someone I can tag on this issue. I'm also in the process of writing more tests for SQL injection + more fuzzing
It would be very much appreciated if you could suggest someone I can tag on this issue.
I wish I could. Unfortunately, I don't know of anyone.
I'm also in the process of writing more tests for SQL injection + more fuzzing
👍 It's been a couple weeks since I reviewed the code, so I can review it again with fresh eyes now. It's not quite as good as multiple reviewers, but at least it will be multiple reviews.
I'll wait until you add the additional tests.
@sean I can incorporate your suggestions into this PR if you don't mind
may be we can move forward with this pr?
LGTM. Thanks!