connection to "localhost" fails with IPv4-only web server if machine has IPv4+IPv6
- Start an Envoy load balancer listening on 0.0.0.0:12345.
- Try to run a load test to http://localhost:12345.
You may get connection failures on some machines.
Sending the load to http://127.0.0.1:12345 does work.
This may only reproduce on a machine that supports both IPv4 and IPv6.
Tentative theory:
- Because IPv6 is available, Nighthawk resolves localhost to the IPv6 address ::1.
- The Envoy is explicitly listening only on IPv4 (0.0.0.0) (if I understand correctly).
- Connecting to ::1 on port 12345 does not reach the Envoy, which is only listening on all IPv4 addresses.
That is, the issue could be with how Nighthawk resolves "localhost" on a dual IPv4/IPv6 machine.
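A rough way to check the theory outside of Nighthawk is sketched below in Python. It is not Nighthawk code: the helper names (`resolve_candidates`, `can_connect`, `ipv4_only_listener`) are made up for illustration, and a plain TCP socket bound to 0.0.0.0 on an ephemeral port stands in for the Envoy listener. It prints the order in which the system resolver offers addresses for "localhost" and shows that an IPv4-only listener is reachable via 127.0.0.1 but not via ::1.

```python
import socket


def resolve_candidates(host, port):
    """Return (family_name, ip) pairs in the order the system resolver offers them."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    return [
        ("IPv6" if family == socket.AF_INET6 else "IPv4", sockaddr[0])
        for family, _type, _proto, _canon, sockaddr in infos
    ]


def can_connect(family, ip, port, timeout=1.0):
    """Try a plain TCP connect; return False on any socket error
    (connection refused, timeout, or address family not supported)."""
    try:
        with socket.socket(family, socket.SOCK_STREAM) as s:
            s.settimeout(timeout)
            s.connect((ip, port))
        return True
    except OSError:
        return False


def ipv4_only_listener():
    """Stand-in for the Envoy listener: bound to 0.0.0.0 (IPv4 only), ephemeral port."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("0.0.0.0", 0))
    srv.listen(1)
    return srv


if __name__ == "__main__":
    srv = ipv4_only_listener()
    port = srv.getsockname()[1]
    # On a dual-stack machine the first entry is often ::1, but the order is system-dependent.
    print("localhost resolves to:", resolve_candidates("localhost", port))
    print("connect via 127.0.0.1:", can_connect(socket.AF_INET, "127.0.0.1", port))
    print("connect via ::1:      ", can_connect(socket.AF_INET6, "::1", port))
    srv.close()
```

If "localhost" resolves to ::1 first on the affected machine and the ::1 connect fails while 127.0.0.1 succeeds, that would be consistent with the theory.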
Trace-level logging should reveal which address gets used [1] and so validate this, but I think your theory will turn out to be correct. Extending the theory a bit, if
- IPv6 gets prioritised over IPv4 at the DNS resolution level, and
- Nighthawk is configured to use either of them,
then one could say the DNS resolution behaved as expected. Even if the extended theory holds, I feel this isn't great UX, and it may not be that uncommon to run into. If, after the preferred address family fails, we could probe the other one and notice that there's an L7 service running there, maybe we could write a helpful warning message before exiting.
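To make the fallback idea concrete, here is a minimal Python sketch. It is an illustration only, not how Nighthawk (which is C++) does or would implement this. `try_connect` and `probe_with_fallback` are hypothetical names, and the probe only checks that a TCP connect succeeds; it does not verify that an L7 service is actually answering.

```python
import socket


def try_connect(family, sockaddr, timeout=1.0):
    """Attempt a TCP connect to one resolved address; False on any socket error."""
    try:
        with socket.socket(family, socket.SOCK_STREAM) as s:
            s.settimeout(timeout)
            s.connect(sockaddr)
        return True
    except OSError:
        return False


def probe_with_fallback(host, port):
    """Try the resolver's preferred address first. If that fails, probe addresses
    from the other family and build a hint for the user. Returns (ok, message)."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    preferred_family, _, _, _, preferred_addr = infos[0]
    if try_connect(preferred_family, preferred_addr):
        return True, f"connected to {preferred_addr[0]}"
    for family, _, _, _, sockaddr in infos[1:]:
        if family != preferred_family and try_connect(family, sockaddr):
            return False, (
                f"warning: {host} resolved to {preferred_addr[0]} first, which did not "
                f"accept the connection, but {sockaddr[0]} does; consider targeting "
                f"that address directly or forcing the address family"
            )
    return False, f"could not connect to {host}:{port} on any resolved address"


if __name__ == "__main__":
    ok, message = probe_with_fallback("localhost", 12345)
    print(ok, message)
```

The probe deliberately reports failure even when the fallback address works, so the preferred-family behaviour stays unchanged and the user only gets an extra hint.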
[1] https://github.com/envoyproxy/nighthawk/blob/main/source/common/uri_impl.cc#L81
Added some advice about localhost, ::1, 127.0.0.1, and --address-family v4 / v6 to the troubleshooting tips in the "error 13" message. This should be enough to unblock most situations.