No TLS Session re-use when using `Netty4ClientEngineFactory` (default)
Expected behavior
When establishing a second connection between the same client and server instance, the TLS session should be re-used.
Actual behavior
Each connection gets its own TLS session when the client is configured with an `SslClientConfiguration`. Using the default `SslClientEngineFactory` leads to the `Netty4ClientEngineFactory`, which has this behaviour. Configuring the `SSLContext` directly with `.withTransport.tls(SSLContext)` leads to the `SslContextClientEngineFactory`, which does re-use sessions.
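For concreteness, the two configuration paths look roughly like this (a sketch assuming `finagle-http` on the classpath; not tied to a specific Finagle version):

```scala
import javax.net.ssl.SSLContext
import com.twitter.finagle.Http
import com.twitter.finagle.ssl.client.SslClientConfiguration

// Path 1: SslClientConfiguration uses the default engine factory
// (Netty4ClientEngineFactory), which does not re-use TLS sessions:
val viaConfiguration = Http.client
  .withTransport.tls(SslClientConfiguration())

// Path 2: passing a pre-built javax SSLContext selects the
// SslContextClientEngineFactory, which does re-use sessions:
val sslContext: SSLContext = SSLContext.getDefault
val viaContext = Http.client
  .withTransport.tls(sslContext)
```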
Steps to reproduce the behavior
See the test cases at https://github.com/spockz/finagle-tls-session-reuse/blob/master/src/test/scala/com/github/spockz/finagle/it/tls/TlsTest.scala. Output:
```
[info] TlsTest:
[info] Tls on server and client
[info] - should support session resumption with ssl context
[info] + ###
[info] + ### First call
[info] + Return(Response("HTTP/1.1 Status(201)"))
[info] + ###
[info] + ###Second call
[info] + Return(Response("HTTP/1.1 Status(201)"))
[info] - should support session resumption with SslClientConfiguration *** FAILED ***
[info] Set("95-106-2055-9833-123101-3168-65-93-93-24-491537-3922-7-81906-11856-24-46789086113-50", "95-106-2055-11-1116-99-80-323326-2274-11950127-12765-29289-128-28-32-4825-73-101-33-8343") had size 2 instead of expected size 1 (TlsTest.scala:95)
[info] + ###
[info] + ### First call
[info] + Return(Response("HTTP/1.1 Status(201)"))
[info] + ###
[info] + ###Second call
[info] + Return(Response("HTTP/1.1 Status(201)"))
[info] - should support session resumption with SslClientConfiguration read from PEMS *** FAILED ***
[info] Set("95-106-205563-679748-3372574941-19-1081-10398-709-417954-39-123-11647-4-10-5120", "95-106-2055-110-85446125371-8667-1041063569-5110-124119-184487-54-1118641-43-2792104") had size 2 instead of expected size 1 (TlsTest.scala:95)
[info] + ###
[info] + ### First call
[info] + Return(Response("HTTP/1.1 Status(201)"))
[info] + ###
[info] + ###Second call
[info] + Return(Response("HTTP/1.1 Status(201)"))
[info] - should support session resumption with SslClientConfiguration and a cached Netty4ClientEngineFactory *** FAILED ***
[info] Throw(Failure(renegotiation unsupported at remote address: localhost/127.0.0.1:62739. Remote Info: Not Available, flags=0x08) with RemoteInfo -> Upstream Address: Not Available, Upstream id: Not Available, Downstream Address: localhost/127.0.0.1:62739, Downstream label: somelabel, Trace Id: e188359a301a902a.6fad2b7b5ed57202<:9511f135ae05febd with Service -> somelabel) was not an instance of com.twitter.util.Return, but an instance of com.twitter.util.Throw (TlsTest.scala:91)
[info] + ###
[info] + ### First call
[info] + Return(Response("HTTP/1.1 Status(201)"))
[info] + ###
[info] + ###Second call
[info] + Throw(Failure(renegotiation unsupported at remote address: localhost/127.0.0.1:62739. Remote Info: Not Available, flags=0x08) with RemoteInfo -> Upstream Address: Not Available, Upstream id: Not Available, Downstream Address: localhost/127.0.0.1:62739, Downstream label: somelabel, Trace Id: e188359a301a902a.6fad2b7b5ed57202<:9511f135ae05febd with Service -> somelabel)
[info] - should support session resumption with SslClientConfiguration and a SslContextEngineFactory
[info] + ###
[info] + ### First call
[info] + Return(Response("HTTP/1.1 Status(201)"))
[info] + ###
[info] + ###Second call
[info] + Return(Response("HTTP/1.1 Status(201)"))
```
Possible fixes
- I tried to cache the created `Engine` in https://github.com/spockz/finagle-tls-session-reuse/blob/master/src/test/scala/com/github/spockz/finagle/it/tls/TlsTest.scala#L147, which leads to the error `Failure(renegotiation unsupported...`. This appears to be because the engine is already destroyed.
- Using the `SslContextClientEngineFactory` mitigates the session issue, but it drops back to using JSSE for TLS, which costs performance on JDK 8 and loses H2.
So, I don't believe you need to cache the `Engine` but rather the `Context`. My understanding is that if the context is cached, then it will work. We provide the `Netty4ClientEngineFactory` and `Netty4ServerEngineFactory` as the defaults because they're the safest in terms of correctness, but they aren't the best in terms of performance.
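The "cache the context, not the engine" idea can be sketched in plain JSSE terms. Everything here (the object name, the string cache key, the null key/trust managers) is illustrative and is not Finagle's actual `SslClientEngineFactory` API:

```scala
import java.util.concurrent.ConcurrentHashMap
import javax.net.ssl.{SSLContext, SSLEngine}

object ContextCachingEngineFactory {
  // One SSLContext -- and therefore one client session cache -- per
  // distinct configuration key.
  private val contexts = new ConcurrentHashMap[String, SSLContext]()

  private def contextFor(key: String): SSLContext =
    contexts.computeIfAbsent(key, _ => {
      val ctx = SSLContext.getInstance("TLSv1.2")
      ctx.init(null, null, null) // default managers; real code would load key material
      ctx
    })

  // A fresh engine per connection, but created from the cached context,
  // so all engines for the same key share one session cache and can
  // resume TLS sessions.
  def newEngine(host: String, port: Int): SSLEngine = {
    val engine = contextFor(s"$host:$port").createSSLEngine(host, port)
    engine.setUseClientMode(true)
    engine
  }
}
```

Note that the engine stays per-connection here and only the context is shared; re-using a single destroyed `SSLEngine` across connections is what produces the renegotiation error seen above.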
As I mentioned in Gitter, we mostly use factories that are tied to our internal security infrastructure. They create a single context and re-use it, because the parameters are known ahead of time. We also have one that pre-creates (some) contexts and then chooses the appropriate one based on the situation. I believe you would want something similar. I'm talking to the team now about whether we can/should open-source that.
Hey @spockz, yesterday I moved our `ExternalClientEngineFactory` into OSS Finagle. It's here. I recommend taking a look at it, and perhaps using it in your tests to see whether it exhibits the behavior you're expecting. If so, we can talk about how to write one that completely suits your use case.
Thank you for the example @ryanoneill. Caching the Netty `SslContext` indeed makes more sense, as the context creates the engine. Based on that input I created my own `ExternalContextClientEngineFactory`, which gets a pre-constructed Netty `SslContext` passed in. I also configured a client with the `SslContextClientEngineFactory`, which takes a javax `SSLContext`. The results are astonishing to me. The `SslContextClientEngineFactory` client is roughly 2.5x faster than the Netty-based client engine factory. Moreover, when making many connections the `SslContextClientEngineFactory` is 10x faster per connection, indicating re-use. The `ExternalContextClientEngineFactory` is as slow as the standard Netty4-based one.
```
[info] Benchmark                                                                   Mode  Cnt         Score         Error   Units
[info] SslConnectionPerformanceBench.externalSslContextClientEngineFactoryClient  thrpt   10        ≈ 10⁻⁷                ops/ns
[info] SslConnectionPerformanceBench.netty4ClientEngineFactoryClient              thrpt   10        ≈ 10⁻⁷                ops/ns
[info] SslConnectionPerformanceBench.sslContextClientEngineFactoryClient          thrpt   10        ≈ 10⁻⁶                ops/ns
[info] SslConnectionPerformanceBench.externalSslContextClientEngineFactoryClient   avgt   10  14935413.067 ±  607647.575   ns/op
[info] SslConnectionPerformanceBench.netty4ClientEngineFactoryClient               avgt   10  14478829.727 ± 1718539.667   ns/op
[info] SslConnectionPerformanceBench.sslContextClientEngineFactoryClient           avgt   10   1259970.414 ±  163441.278   ns/op
[info] SslConnectionPerformanceBench.externalSslContextClientEngineFactoryClient     ss   10  23296255.900 ± 2266746.677   ns/op
[info] SslConnectionPerformanceBench.netty4ClientEngineFactoryClient                 ss   10  23388622.700 ± 1750611.302   ns/op
[info] SslConnectionPerformanceBench.sslContextClientEngineFactoryClient             ss   10   8867196.300 ± 2869120.745   ns/op
```
Do you have an explanation for this? By the way, I had to copy some Finagle-private classes.
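One hedged, structural explanation (not a confirmed diagnosis of the benchmark numbers): an engine factory that builds a fresh `SSLContext` per engine also gets a fresh, empty session cache each time, whereas a single shared context hands every engine the same cache. Plain JSSE shows the difference:

```scala
import javax.net.ssl.SSLContext

// What a per-connection engine factory effectively does: a fresh
// SSLContext, and therefore a fresh empty session cache, every time.
def freshContext(): SSLContext = {
  val ctx = SSLContext.getInstance("TLSv1.2")
  ctx.init(null, null, null) // default key/trust managers, for illustration
  ctx
}

val perConnection1 = freshContext()
val perConnection2 = freshContext()
// Distinct contexts -> distinct client session caches -> no resumption:
assert(perConnection1.getClientSessionContext ne perConnection2.getClientSessionContext)

// One shared context gives every engine the same session cache, which
// is the precondition for TLS session resumption:
val shared = freshContext()
val engineA = shared.createSSLEngine("localhost", 443)
val engineB = shared.createSSLEngine("localhost", 443)
assert(engineA ne engineB) // still one engine per connection
```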
> We provide the `Netty4ClientEngineFactory` and `Netty4ServerEngineFactory` as the defaults because they're the safest in terms of correctness, but they aren't the best in terms of performance.
Can you share a bit more on why it is more "correct" and "safe" to use a context per connection instead of re-using one? Is it because settings might have changed on either side since the last connection was established?