Activate OpenSSL KTLS for SSL Bump
KTLS is activated by the TLS configuration option "options=ENABLE_KTLS" on supported platforms.
This code has been tested with Squid v6.6 on FreeBSD 13.2, using both OpenSSL 1.1.1t-freebsd and OpenSSL 3.1.4.
Can one of the admins verify this patch?
Hi all,
I just wanted to let you know that I have no further commits planned for now.
Unfortunately, my holiday is ending soon, so I may not be able to respond quickly from now on. If you have questions, for example about my original intentions, I will try to answer them as best I can, but I cannot promise to act soon on other things such as refactoring requests. (If anyone is able to make such changes, please go ahead and change the code.)
Updated the commit description to clarify the latest change and to satisfy Anubis requirements.
Testing details dropped from the commit description:
Requirements (test environment):
FreeBSD kernel with KERN_TLS enabled
OpenSSL built with KTLS enabled
% kldload ktls_ocf.ko
% sysctl kern.ipc.tls.enable=1
Results after some SSL Bump traffic:
% sysctl -a | grep ipc.tls.stats.ocf
kern.ipc.tls.stats.ocf.retries: 0
kern.ipc.tls.stats.ocf.separate_output: 0
kern.ipc.tls.stats.ocf.inplace: 238607
kern.ipc.tls.stats.ocf.tls13_chacha20_encrypts: 32
kern.ipc.tls.stats.ocf.tls13_chacha20_decrypts: 162
kern.ipc.tls.stats.ocf.tls13_gcm_encrypts: 222917
kern.ipc.tls.stats.ocf.tls13_gcm_decrypts: 284010
kern.ipc.tls.stats.ocf.tls12_chacha20_encrypts: 265
kern.ipc.tls.stats.ocf.tls12_chacha20_decrypts: 1021
kern.ipc.tls.stats.ocf.tls12_gcm_encrypts: 15064
kern.ipc.tls.stats.ocf.tls12_gcm_decrypts: 14736
kern.ipc.tls.stats.ocf.tls11_cbc_encrypts: 329
kern.ipc.tls.stats.ocf.tls10_cbc_encrypts: 0
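One rough way to confirm that the kernel (via ktls_ocf) actually did the record crypto is to total these OCF counters. The C++ sketch below assumes the "name: value" line format shown above; sumOcfCounters is a hypothetical helper, not part of Squid or FreeBSD. For the output above, the *_encrypts counters sum to 238607 (the same value as inplace) and the *_decrypts counters sum to 299929.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical helper: sums the counters in `sysctl -a | grep ipc.tls.stats.ocf`
// style output ("name: value", one per line) whose names end with `suffix`.
// Lines without a colon are skipped; std::stoull throws on a malformed value.
std::uint64_t sumOcfCounters(const std::string &sysctlOutput, const std::string &suffix)
{
    std::istringstream in(sysctlOutput);
    std::string line;
    std::uint64_t total = 0;
    while (std::getline(in, line)) {
        const std::string::size_type colon = line.find(':');
        if (colon == std::string::npos)
            continue;
        const std::string name = line.substr(0, colon);
        const bool matches = name.size() >= suffix.size() &&
            name.compare(name.size() - suffix.size(), suffix.size(), suffix) == 0;
        if (matches)
            total += std::stoull(line.substr(colon + 1)); // leading space is skipped
    }
    return total;
}
```

For example, sumOcfCounters(output, "_encrypts") compares total kernel encryptions before and after a test run.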
@rousskov, please re-review the latest code with my changes.
It looks like it works on Linux too.
$ uname -a
Linux ubuntu2310 6.5.0-9-generic #9-Ubuntu SMP PREEMPT_DYNAMIC Sat Oct 7 01:35:40 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
$ lsmod | grep tls
tls 151552 0
$ cat /proc/net/tls_stat
TlsCurrTxSw 3
TlsCurrRxSw 3
TlsCurrTxDevice 0
TlsCurrRxDevice 0
TlsTxSw 2406
TlsRxSw 1496
TlsTxDevice 0
TlsRxDevice 0
TlsDecryptError 0
TlsRxDeviceResync 0
TlsDecryptRetry 0
TlsRxNoPadViolation 0
$ /usr/local/squid/sbin/squid -v
Squid Cache: Version 6.6
Service Name: squid
This binary uses OpenSSL 3.0.10 1 Aug 2023.
configure options: '--with-large-files' '--enable-ssl-crtd' '--with-openssl=/usr/local/ssl' '--enable-security-cert-generators' '--enable-security-cert-validators--enable-linux-netfilter' '--with-default-user=daemon' --enable-ltdl-convenience
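On Linux, the kernel TLS documentation describes TlsTxSw/TlsRxSw as the number of TX/RX sessions opened with host (software) cryptography, and the *Device counters as sessions where the NIC handles cryptography, so the /proc/net/tls_stat output above suggests software KTLS without NIC offload. A minimal C++ sketch of that check follows; parseTlsStat, tlsCounter and sawSoftwareKtls are hypothetical names, and the parser assumes the "Name value" line format shown above.

```cpp
#include <cstdint>
#include <map>
#include <sstream>
#include <string>

using TlsStats = std::map<std::string, std::uint64_t>;

// Hypothetical parser for /proc/net/tls_stat text ("Name value" per line).
// Parsing stops at the first pair that does not fit that shape.
TlsStats parseTlsStat(const std::string &text)
{
    TlsStats stats;
    std::istringstream in(text);
    std::string name;
    std::uint64_t value = 0;
    while (in >> name >> value)
        stats[name] = value;
    return stats;
}

// Returns the named counter, or 0 if it was not reported.
std::uint64_t tlsCounter(const TlsStats &stats, const std::string &name)
{
    const auto it = stats.find(name);
    return it == stats.end() ? 0 : it->second;
}

// True if any TX or RX session was opened with host (software) cryptography.
bool sawSoftwareKtls(const TlsStats &stats)
{
    return tlsCounter(stats, "TlsTxSw") + tlsCounter(stats, "TlsRxSw") > 0;
}
```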
Thank you for making excellent progress with this PR! If you have time, please fix the current GitHub CI test failures, but that can wait until the PR code settles. On my side, I need to review the new/current code and (hopefully) clear both the old red flags and those raised by the new comments, but that will take time; I do not expect much progress until February 12, 2024.
FYI, I have run a very rough benchmark of this PR.
<environment>
- squid @ freebsd 13.2, running on an Intel Atom C3708 equipped with Intel QuickAssist Tech. (QAT)
- apache @ windows, connected to the squid host over 10GbEther
- wget @ freebsd @ vmware, running on the same Windows host as apache
<benchmark tool>
https_proxy = squid:bump_port
wget -O /dev/null https://apache/some_large_file.bin
(only 1 session)
<squid builds>
(a)4k
Equivalent to plain Squid v6.6, or to v6.6 with this PR.
HTTP_REQBUF_SZ (defined in src/http/forward.h) = 4096
(b)16k
Buffer expanded.
HTTP_REQBUF_SZ = 16384
(c)64k_patched
Buffer expanded, with the extra patch to src/security/Session.cc shown below.
HTTP_REQBUF_SZ = 65536
@@ tls_read_method(int fd, char *buf, int len)
         }
     }
     int i = SSL_read(session, buf, len);
+    while (i > 0 && len - i > 0) {
+        int i_t = SSL_read(session, buf + i, len - i);
+
+        if (i_t <= 0) {
+            break;
+        }
+
+        i += i_t;
+    }
 #elif USE_GNUTLS
     int i = gnutls_record_recv(session, buf, len);
 #endif
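Why the extra loop matters: OpenSSL's SSL_read() returns the contents of at most one TLS record per call, and a TLS record carries at most 16384 bytes of plaintext, so without the loop a 64k buffer would likely receive no more than 16 KiB per tls_read_method() call. The standalone C++ sketch below models that loop; fillBuffer, RecordReader and MockSession are hypothetical names, and MockSession only imitates SSL_read() returning one full record per call and -1 when no data is pending.

```cpp
#include <algorithm>
#include <functional>

// Hypothetical stand-in for SSL_read(session, buf, len): returns the number
// of bytes read (> 0), or <= 0 on EOF, error, or "would block".
using RecordReader = std::function<int(char *buf, int len)>;

// Same idea as the extra loop in the (c)64k_patched build: keep reading
// until `buf` is full or a read returns <= 0. A non-positive first result is
// returned as is; a later non-positive result just ends the loop early.
int fillBuffer(const RecordReader &reader, char *buf, int len)
{
    int total = reader(buf, len);
    while (total > 0 && total < len) {
        const int more = reader(buf + total, len - total);
        if (more <= 0)
            break;
        total += more;
    }
    return total;
}

// Hypothetical session holding `pending` plaintext bytes. Each call hands out
// at most one 16 KiB TLS record and returns -1 once nothing is pending,
// mimicking a non-blocking SSL_read() that would block.
struct MockSession {
    int pending = 0;
    int calls = 0;

    int read(char *buf, int len)
    {
        ++calls;
        if (pending == 0)
            return -1;
        const int n = std::min({len, 16384, pending});
        std::fill(buf, buf + n, 'x'); // pretend to decrypt n bytes into buf
        pending -= n;
        return n;
    }
};
```

Like the patch, fillBuffer() does not report an error or EOF that follows already-read data; the caller simply sees a shorter successful read.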
<benchmark results>
TLS_AES_256_GCM_SHA384
throughput (Squid CPU usage)
1. no KTLS
(a)4k : 450 Mbps (100%)
(b)16k : 800 Mbps (100%)
(c)64k_patched : 1100 Mbps (100%)
2. KTLS without QAT
(a)4k : 700 Mbps (100%)
(b)16k : 1500 Mbps (100%)
(c)64k_patched : 2200 Mbps ( 90%)
3. KTLS with QAT
(a)4k : 350 Mbps ( 50%)
(b)16k : 1100 Mbps ( 75%)
(c)64k_patched : 1100 Mbps ( 45%)
TLS_AES_128_GCM_SHA256
throughput (Squid CPU usage)
1. no KTLS
(a)4k : 450 Mbps (100%)
(b)16k : 850 Mbps (100%)
(c)64k_patched : 1100 Mbps (100%)
2. KTLS without QAT
(a)4k : 730 Mbps (100%)
(b)16k : 1500 Mbps (100%)
(c)64k_patched : 2300 Mbps ( 85%)
3. KTLS with QAT
(a)4k : 360 Mbps ( 50%)
(b)16k : 1200 Mbps ( 80%)
(c)64k_patched : 1150 Mbps ( 45%)
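To compare cells in the results above, two ratios are handy: throughput relative to the no-KTLS run of the same build, and throughput per percent of Squid CPU. The C++ sketch below hard-codes the TLS_AES_256_GCM_SHA384 numbers; Result, speedup and mbpsPerCpuPercent are illustrative names. For build (c), KTLS without QAT gives 2.0x the no-KTLS throughput (2200 vs 1100 Mbps), and KTLS with QAT gives about 24.4 Mbps per CPU percent versus 11 without KTLS. As the notes point out, when Squid is below 100% CPU the throughput limit lies elsewhere, so the per-CPU figure only reflects Squid's own cost.

```cpp
// One benchmark cell: throughput in Mbps and Squid CPU usage in percent.
struct Result {
    int mbps;
    int cpuPercent;
};

// Throughput relative to a baseline cell (e.g. the same build without KTLS).
double speedup(const Result &r, const Result &baseline)
{
    return static_cast<double>(r.mbps) / baseline.mbps;
}

// Throughput per percent of Squid CPU usage; higher means less Squid CPU
// spent per unit of throughput.
double mbpsPerCpuPercent(const Result &r)
{
    return static_cast<double>(r.mbps) / r.cpuPercent;
}

// TLS_AES_256_GCM_SHA384 numbers from the results, builds (a), (b), (c).
const Result noKtls[3] = {{450, 100}, {800, 100}, {1100, 100}};
const Result ktlsNoQat[3] = {{700, 100}, {1500, 100}, {2200, 90}};
const Result ktlsQat[3] = {{350, 50}, {1100, 75}, {1100, 45}};
```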
Note:
- Squid is the bottleneck when its CPU usage is 100%.
- In the other cases, the bottleneck may be in the kernel, QAT, or the benchmark environment.
- The 4k buffer seems to be too small for QAT in this benchmark (the overhead is relatively larger).
This limited test suggests that the hard-coded HTTP_REQBUF_SZ may be too small to get the full benefit of KTLS. (Of course, I am not going to change it in this PR.)