pgx Reduce SQL sanitizer allocations

https://github.com/jackc/pgx/issues/2124

Result: Pasted Graphic 1

Main optimizations:

extensive usage of sync.Pool for byte buffers, lexers and parsed query structs
append-style string formatters for int64, float64 and time.Time + bytes.Buffer.AvailableBuffer
rework of QuoteString and QuoteBytes to append-style (with tests for backwards compatibility)

Misc changes:

benchmarks for Query.Sanitize and SanitizeSQL functions
a tiny script for generation of benchmark reports for selected commits and diff (using benchstat)
fuzzing of QuoteString and QuoteBytes (I did'n find any problems for 1h of fuzzing, but you can never be sure for 100%)

Since optimization is an extremely hard problem, I think it's worth checking some more benchmarks.

I would be very grateful for your opinion on this and recommendations/advice, @jackc @vtolstov

Oct 01 '24 14:10 ninedraft

benchmark diffs for concrete optimisations

goos: darwin
goarch: arm64
pkg: github.com/jackc/pgx/v5/internal/sanitize
cpu: Apple M1
              │ benchmarks/0_base_case.bench │     benchmarks/1_buf_pool.bench     │ benchmarks/2_append_AvailableBuffer.bench │    benchmarks/3_quoteBytes.bench    │   benchmarks/4_quoteString.bench    │ benchmarks/5_add_lexer_and_query_pools.bench │ benchmarks/6_drop_too_large_values_from_memory_pools.bench │
              │            sec/op            │   sec/op     vs base                │      sec/op        vs base                │   sec/op     vs base                │   sec/op     vs base                │        sec/op         vs base                │               sec/op                vs base                │
Sanitize-8                       718.2n ± 1%   578.8n ± 1%  -19.41% (p=0.000 n=10)         439.9n ± 0%  -38.74% (p=0.000 n=10)   413.6n ± 4%  -42.42% (p=0.000 n=10)   397.1n ± 1%  -44.72% (p=0.000 n=10)            403.6n ± 1%  -43.81% (p=0.000 n=10)                          400.8n ± 2%  -44.20% (p=0.000 n=10)
SanitizeSQL-8                    2.089µ ± 0%   1.956µ ± 0%   -6.37% (p=0.000 n=10)         1.828µ ± 0%  -12.49% (p=0.000 n=10)   1.812µ ± 1%  -13.28% (p=0.000 n=10)   1.789µ ± 1%  -14.36% (p=0.000 n=10)            1.670µ ± 0%  -20.06% (p=0.000 n=10)                          1.673µ ± 0%  -19.91% (p=0.000 n=10)
geomean                          1.225µ        1.064µ       -13.13%                        896.8n       -26.79%                  865.5n       -29.34%                  842.8n       -31.19%                           820.9n       -32.98%                                         818.8n       -33.15%

              │ benchmarks/0_base_case.bench │     benchmarks/1_buf_pool.bench     │ benchmarks/2_append_AvailableBuffer.bench │    benchmarks/3_quoteBytes.bench    │   benchmarks/4_quoteString.bench    │ benchmarks/5_add_lexer_and_query_pools.bench │ benchmarks/6_drop_too_large_values_from_memory_pools.bench │
              │             B/op             │    B/op      vs base                │       B/op         vs base                │    B/op      vs base                │    B/op      vs base                │         B/op          vs base                │                B/op                 vs base                │
Sanitize-8                       1488.0 ± 0%    528.0 ± 0%  -64.52% (p=0.000 n=10)          472.0 ± 0%  -68.28% (p=0.000 n=10)    456.0 ± 0%  -69.35% (p=0.000 n=10)    424.0 ± 0%  -71.51% (p=0.000 n=10)             424.0 ± 0%  -71.51% (p=0.000 n=10)                           424.0 ± 0%  -71.51% (p=0.000 n=10)
SanitizeSQL-8                    2216.0 ± 0%   1256.0 ± 0%  -43.32% (p=0.000 n=10)         1200.0 ± 0%  -45.85% (p=0.000 n=10)   1184.0 ± 0%  -46.57% (p=0.000 n=10)   1152.0 ± 0%  -48.01% (p=0.000 n=10)             552.0 ± 0%  -75.09% (p=0.000 n=10)                           552.0 ± 0%  -75.09% (p=0.000 n=10)
geomean                         1.773Ki         814.4       -55.15%                         752.6       -58.55%                   734.8       -59.54%                   698.9       -61.51%                            483.8       -73.36%                                          483.8       -73.36%

              │ benchmarks/0_base_case.bench │    benchmarks/1_buf_pool.bench     │ benchmarks/2_append_AvailableBuffer.bench │   benchmarks/3_quoteBytes.bench    │   benchmarks/4_quoteString.bench   │ benchmarks/5_add_lexer_and_query_pools.bench │ benchmarks/6_drop_too_large_values_from_memory_pools.bench │
              │          allocs/op           │ allocs/op   vs base                │     allocs/op      vs base                │ allocs/op   vs base                │ allocs/op   vs base                │      allocs/op        vs base                │             allocs/op               vs base                │
Sanitize-8                       11.000 ± 0%   7.000 ± 0%  -36.36% (p=0.000 n=10)          4.000 ± 0%  -63.64% (p=0.000 n=10)   3.000 ± 0%  -72.73% (p=0.000 n=10)   2.000 ± 0%  -81.82% (p=0.000 n=10)             2.000 ± 0%  -81.82% (p=0.000 n=10)                           2.000 ± 0%  -81.82% (p=0.000 n=10)
SanitizeSQL-8                     26.00 ± 0%   22.00 ± 0%  -15.38% (p=0.000 n=10)          19.00 ± 0%  -26.92% (p=0.000 n=10)   18.00 ± 0%  -30.77% (p=0.000 n=10)   17.00 ± 0%  -34.62% (p=0.000 n=10)             10.00 ± 0%  -61.54% (p=0.000 n=10)                           10.00 ± 0%  -61.54% (p=0.000 n=10)
geomean                           16.91        12.41       -26.62%                         8.718       -48.45%                  7.348       -56.55%                  5.831       -65.52%                            4.472       -73.56%                                          4.472       -73.56%

Oct 01 '24 15:10 ninedraft

LGTM. But this is obviously a very security critical part of the code, so I'd like if we can get some more eyes on this before merging.

Oct 05 '24 14:10 jackc

lgtm, i'm try to check on my hot path in next few days.

Oct 06 '24 13:10 vtolstov

In my tests i don't saw any issues.

Oct 10 '24 21:10 vtolstov

@jackc

But this is obviously a very security critical part of the code, so I'd like if we can get some more eyes on this before merging.

It would be very much appreciated if you could suggest someone I can tag on this issue. I'm also in the process of writing more tests for SQL injection + more fuzzing

Oct 15 '24 17:10 ninedraft

It would be very much appreciated if you could suggest someone I can tag on this issue.

I wish I could. Unfortunately, I don't know of anyone.

I'm also in the process of writing more tests for SQL injection + more fuzzing

👍 It's been a couple weeks since I reviewed the code, so I can review it again with fresh eyes now. It's not quite as good as multiple reviewers, but at least it will be multiple reviews.

I'll wait until you add the additional tests.

Oct 18 '24 21:10 jackc

@sean I can incorporate your suggestions into this PR if you don't mind

Dec 09 '24 14:12 ninedraft

may be we can move forward with this pr?

Dec 25 '24 23:12 vtolstov

LGTM. Thanks!

Dec 31 '24 02:12 jackc