Activate OpenSSL KTLS for SSL Bump

Open ss3git opened this issue 2 years ago • 13 comments

KTLS activation is enabled by TLS config "options=ENABLE_KTLS" on supported platforms.

This code has been tested with Squid v6.6 on FreeBSD 13.2, with OpenSSL 1.1.1t-freebsd or OpenSSL 3.1.4.

ss3git avatar Jan 05 '24 14:01 ss3git

Can one of the admins verify this patch?

squid-prbot avatar Jan 05 '24 14:01 squid-prbot

Hi all,

Just letting you know that I have no further commits planned at the moment.

Unfortunately, my holiday will be over soon, so I may not be able to respond quickly from now on. If you have questions, for example about my original intent, I will try to answer them as best I can, but for other things such as refactoring requests, I cannot promise to get to them soon. (If anyone is able to make those code changes, please go ahead.)

ss3git avatar Jan 07 '24 15:01 ss3git

Updated commit description to clarify latest change and pass Anubis requirements.

Testing details dropped:

Requirements (test environment):
 FreeBSD kernel with KERN_TLS enabled
 OpenSSL built with KTLS enabled

% kldload ktls_ocf.ko
% sysctl kern.ipc.tls.enable=1

Results after some ssl bump traffic:

% sysctl -a | grep ipc.tls.stats.ocf
kern.ipc.tls.stats.ocf.retries: 0
kern.ipc.tls.stats.ocf.separate_output: 0
kern.ipc.tls.stats.ocf.inplace: 238607
kern.ipc.tls.stats.ocf.tls13_chacha20_encrypts: 32
kern.ipc.tls.stats.ocf.tls13_chacha20_decrypts: 162
kern.ipc.tls.stats.ocf.tls13_gcm_encrypts: 222917
kern.ipc.tls.stats.ocf.tls13_gcm_decrypts: 284010
kern.ipc.tls.stats.ocf.tls12_chacha20_encrypts: 265
kern.ipc.tls.stats.ocf.tls12_chacha20_decrypts: 1021
kern.ipc.tls.stats.ocf.tls12_gcm_encrypts: 15064
kern.ipc.tls.stats.ocf.tls12_gcm_decrypts: 14736
kern.ipc.tls.stats.ocf.tls11_cbc_encrypts: 329
kern.ipc.tls.stats.ocf.tls10_cbc_encrypts: 0

yadij avatar Jan 08 '24 13:01 yadij

@rousskov, please re-review latest code after my changes.

yadij avatar Jan 08 '24 13:01 yadij

Looks like it is working on Linux too.

$ uname -a
Linux ubuntu2310 6.5.0-9-generic #9-Ubuntu SMP PREEMPT_DYNAMIC Sat Oct  7 01:35:40 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

$ lsmod | grep tls
tls                   151552  0

$ cat /proc/net/tls_stat
TlsCurrTxSw                             3
TlsCurrRxSw                             3
TlsCurrTxDevice                         0
TlsCurrRxDevice                         0
TlsTxSw                                 2406
TlsRxSw                                 1496
TlsTxDevice                             0
TlsRxDevice                             0
TlsDecryptError                         0
TlsRxDeviceResync                       0
TlsDecryptRetry                         0
TlsRxNoPadViolation                     0

$ /usr/local/squid/sbin/squid -v
Squid Cache: Version 6.6
Service Name: squid

This binary uses OpenSSL 3.0.10 1 Aug 2023. configure options:  '--with-large-files' '--enable-ssl-crtd' '--with-openssl=/usr/local/ssl' '--enable-security-cert-generators' '--enable-security-cert-validators--enable-linux-netfilter' '--with-default-user=daemon' --enable-ltdl-convenience
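The /proc/net/tls_stat check above can also be scripted. The following is only a minimal sketch, assuming the file keeps the "name, whitespace, value" layout shown in the output above. The helper names parseTlsStat and ktlsWasUsed are hypothetical and are not part of Squid or of this PR.

```cpp
#include <initializer_list>
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Parse "Name <whitespace> Value" lines, as printed by /proc/net/tls_stat.
// Lines that do not match that shape are skipped.
std::map<std::string, unsigned long long> parseTlsStat(std::istream &in)
{
    std::map<std::string, unsigned long long> counters;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::string name;
        unsigned long long value = 0;
        if (fields >> name >> value)
            counters[name] = value;
    }
    return counters;
}

// Hypothetical helper: true if any of the cumulative "sessions opened"
// counters (software or device offload, TX or RX) is non-zero.
bool ktlsWasUsed(const std::map<std::string, unsigned long long> &counters)
{
    for (const char *key : {"TlsTxSw", "TlsRxSw", "TlsTxDevice", "TlsRxDevice"}) {
        const auto it = counters.find(key);
        if (it != counters.end() && it->second > 0)
            return true;
    }
    return false;
}
```

To use it on a live system, open /proc/net/tls_stat with std::ifstream and pass the stream to parseTlsStat. Comparing the counters before and after sending some bumped traffic shows whether the new sessions actually went through the kernel TLS module.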

ss3git avatar Jan 17 '24 09:01 ss3git

Thank you for making excellent progress with this PR! If you have time, please fix current GitHub CI test failures, but that can wait until PR code settles. On my side, I need to review the new/current code and (hopefully) clear the old red flags and red flags raised by new comments, but that will take time; I do not expect much progress until February 12, 2024.

rousskov avatar Jan 17 '24 15:01 rousskov

FYI, I have run a very rough benchmark of this PR.

<environment>

   ----------------------     10GbEther    ------------------------- 
  | squid @ freebsd 13.2 | -------------- | apache @ windows        |
   ----------------------                 |                         |
    Intel Atom C3708                      | wget @ freebsd @ vmware |
     equipped with                         ------------------------- 
    Intel QuickAssist Tech.
    (QAT)
  
  
<benchmark tool>
  
    https_proxy = squid:bump_port
    wget -O /dev/null https://apache/some_large_file.bin
    (only 1 session)
  
  
<squid builds>
  
  (a)4k
    equivalent to the plain squid (v6.6) or v6.6 with this PR.
    HTTP_REQBUF_SZ (defined in src/http/forward.h) = 4096
  
  (b)16k
    buffer expanded.
    HTTP_REQBUF_SZ = 16384
  
  (c)64k_patched
    buffer expanded, with an extra patch for src/security/Session.cc below.
    HTTP_REQBUF_SZ = 65536
  
      @@ tls_read_method(int fd, char *buf, int len)
               }
           }
           int i = SSL_read(session, buf, len);
      +    while( i > 0 && len - i > 0 ){
      +        int i_t = SSL_read(session, buf+i, len-i);
      +
      +        if ( i_t <= 0 ){
      +            break;
      +        }
      +
      +        i += i_t;
      +    };
       #elif USE_GNUTLS
           int i = gnutls_record_recv(session, buf, len);
       #endif
  
  
<benchmark results>

 TLS_AES_256_GCM_SHA384
                    throughput (squid cpu usage)
  1. no KTLS
   (a)4k          :  450 Mbps (100%)
   (b)16k         :  800 Mbps (100%)
   (c)64k_patched : 1100 Mbps (100%)
  
  2. KTLS without QAT
   (a)4k          :  700 Mbps (100%)
   (b)16k         : 1500 Mbps (100%)
   (c)64k_patched : 2200 Mbps ( 90%)
  
  3. KTLS with QAT
   (a)4k          :  350 Mbps ( 50%)
   (b)16k         : 1100 Mbps ( 75%)
   (c)64k_patched : 1100 Mbps ( 45%)
  
 TLS_AES_128_GCM_SHA256
                    throughput (squid cpu usage)
  1. no KTLS
   (a)4k          :  450 Mbps (100%)
   (b)16k         :  850 Mbps (100%)
   (c)64k_patched : 1100 Mbps (100%)
                                    
  2. KTLS without QAT               
   (a)4k          :  730 Mbps (100%)
   (b)16k         : 1500 Mbps (100%)
   (c)64k_patched : 2300 Mbps ( 85%)
                                    
  3. KTLS with QAT                  
   (a)4k          :  360 Mbps ( 50%)
   (b)16k         : 1200 Mbps ( 80%)
   (c)64k_patched : 1150 Mbps ( 45%)

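 Note:
  - Where squid cpu usage is 100%, Squid itself is the bottleneck.
  - In the other cases, the bottleneck may be in the kernel, QAT, or the benchmark environment.
  - A 4k buffer seems to be too small for QAT in this benchmark: with QAT, the 4k build (350-360 Mbps) is slower than the 4k build without KTLS (450 Mbps), so the offload overhead appears to outweigh the gain.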

From this limited test, the hard-coded HTTP_REQBUF_SZ might be insufficient to maximize the benefits of KTLS. (I am not going to touch it in this PR of course.)
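For readers who want to see why build (c) pairs the larger buffer with an extra read loop, here is a standalone sketch of that loop. It assumes a read call that returns at most one record's worth of data per call, as OpenSSL documents for SSL_read() (a TLS record carries at most 16384 bytes of plaintext). ReadFn, readUntilFull and FakeRecordSource are hypothetical names used only for illustration; this is not the Squid code itself.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <functional>

// Stand-in for a record-oriented read call such as SSL_read():
// returns >0 for bytes read, or <=0 when nothing more is available.
using ReadFn = std::function<int(char *buf, int len)>;

// Keep reading until the buffer is full or the source stops returning data.
// Same structure as the loop in the patch: if the first read succeeds, a
// failure of a follow-up read only ends the loop and is not reported.
int readUntilFull(const ReadFn &readOnce, char *buf, int len)
{
    int total = readOnce(buf, len);
    while (total > 0 && len - total > 0) {
        const int n = readOnce(buf + total, len - total);
        if (n <= 0)
            break;
        total += n;
    }
    return total;
}

// Hypothetical fake source: hands out at most `recordSize` bytes per call
// from a fixed amount of pending data, and counts how often it is called.
struct FakeRecordSource {
    int pending;
    int recordSize;
    int calls;

    int operator()(char *buf, int len)
    {
        ++calls;
        const int n = std::min({pending, recordSize, len});
        if (n <= 0)
            return -1;
        std::memset(buf, 'x', static_cast<std::size_t>(n));
        pending -= n;
        return n;
    }
};
```

With a 64 KB buffer and 16 KB records, a single call to the fake source fills only a quarter of the buffer, while readUntilFull needs four calls to fill it. Under the stated assumption, that matches the pattern in the results, where raising HTTP_REQBUF_SZ to 65536 was combined with the patch. Note that the loop can hide an error from a follow-up read until the next call, so real code would need to decide how to surface it.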

ss3git avatar Jan 18 '24 13:01 ss3git