dnsproxy icon indicating copy to clipboard operation
dnsproxy copied to clipboard

Enable DoT connection reuse while requests are in flight

Open kosekmi opened this issue 3 years ago • 3 comments

Currently, a DoT connection is put back into the pool when the DNS response is received (or an error occurs). Hence, a new DoT connection is established if a second request should be served while a first request is in flight. While this behaviour effectively counters TCP HoLB, it is more beneficial to reuse the already established connection for the second request, since the overhead of the DoT connection establishment is pretty high (1 RTT for TCP, 0-2 RTTs for TLS (depending on TLS version, session resumption, and 0-RTT)).

This PR puts the connection right back into the pool to allow it to be reused while requests are in flight. The behaviour is now identical to DNS over HTTPS/2 (DoH2), where a single DoH2 connection is also used when multiple requests are in flight.

Attached you will find 2 pcaps showcasing the behaviour before and after the change: dot.inflight.reuse.pcaps.zip

kosekmi avatar Sep 17 '22 10:09 kosekmi

This change won't work properly when you process multiple queries through a single upstream. The responses will be received out of order so the risk of having a DNS ID mismatch is very high.

The idea is correct, though. We should allow processing multiple queries in parallel through a single TCP/TLS connection. But the implementation should be much more complicated and I doubt we can/should simply reuse the current TLSPool.

ameshkov avatar Sep 19 '22 10:09 ameshkov

This change won't work properly when you process multiple queries through a single upstream.

Good point. We thought that this was already handled since it's required by the DoT RFC (and the implementation requirements RFC):

Since pipelined responses can arrive out of order, clients MUST match responses to outstanding queries on the same TLS connection using the Message ID. If the response contains a Question Section, the client MUST match the QNAME, QCLASS, and QTYPE fields. Failure by clients to properly match responses to outstanding queries can have serious consequences for interoperability

We did some additional testing, and can confirm that out of order responses result in an id mismatch, hence closing the connection. We will develop a proper fix and update this PR.

kosekmi avatar Sep 20 '22 08:09 kosekmi

We just updated this PR with a fix for out-of-order responses, following the requirements the DoT RFC (and the implementation requirements RFC):

Since pipelined responses can arrive out of order, clients MUST match responses to outstanding queries on the same TLS connection using the Message ID. If the response contains a Question Section, the client MUST match the QNAME, QCLASS, and QTYPE fields. Failure by clients to properly match responses to outstanding queries can have serious consequences for interoperability

We now store all responses with unknown IDs in a map from which they can be retrieved later on, and we explicitly check the QNAME, QCLASS, and QTYPE fields in the responses.

kosekmi avatar Sep 30 '22 14:09 kosekmi