Title: Listener tcp keepalive on socket_options not sending keepalive messages
Description: With the listener configuration below, I do not see TCP keepalive messages from Envoy to the downstream during a long-lived connection.
Repro steps: Deploy a listener with TCP keepalive socket_options, then send a request to that listener via a route with a long fault delay. Capture network traffic (e.g. with ksniff) and note that there are no downstream TCP keepalive messages. Note that configuring TCP keepalive on the cluster works as designed (i.e. the messages are clearly visible in Wireshark).
Config:
"socket_options": [
{
"description": "enable keep-alive",
"level": "1",
"name": "9",
"int_value": "1"
},
{
"description": "idle time before first keep-alive probe is sent",
"level": "6",
"name": "4",
"int_value": "1"
},
{
"description": "keep-alive interval",
"level": "6",
"name": "5",
"int_value": "1"
},
{
"description": "keep-alive probes count",
"level": "6",
"name": "6",
"int_value": "1"
}
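For reference, the numeric level/name pairs above map to the standard Linux socket option constants (SOL_SOCKET=1, SO_KEEPALIVE=9, IPPROTO_TCP=6, TCP_KEEPIDLE=4, TCP_KEEPINTVL=5, TCP_KEEPCNT=6). A minimal sketch, assuming Linux, that sets the same options directly on a socket:

    import socket

    # Same options the listener config requests, set directly on a Linux socket.
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)    # level 1, name 9
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 1)   # level 6, name 4
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 1)  # level 6, name 5
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, 1)    # level 6, name 6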
Route with the fault delay:
{
  "match": {
    "prefix": "/faults/"
  },
  "route": {
    "cluster": "default-echo-service-8080_gloo-system",
    "prefix_rewrite": "/"
  },
  "typed_per_filter_config": {
    "envoy.filters.http.ext_authz": {
      "@type": "type.googleapis.com/envoy.extensions.filters.http.ext_authz.v3.ExtAuthzPerRoute",
      "disabled": true
    },
    "envoy.filters.http.fault": {
      "@type": "type.googleapis.com/envoy.extensions.filters.http.fault.v3.HTTPFault",
      "delay": {
        "fixed_delay": "60s",
        "percentage": {
          "numerator": 1000000,
          "denominator": "MILLION"
        }
      }
    }
  }
}
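To reproduce, the connection has to sit idle through the delay while traffic is being captured. A minimal client sketch, assuming a hypothetical gateway address of 127.0.0.1:8080:

    import socket

    # Hypothetical gateway address; replace with the listener under test.
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.connect(("127.0.0.1", 8080))
    # The fault filter holds this request for 60s, so the connection stays idle
    # long enough that downstream keepalive probes (if any) appear in the capture.
    s.sendall(b"GET /faults/test HTTP/1.1\r\nHost: localhost\r\n\r\n")
    print(s.recv(4096))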
@bdecoste I did a simple test on the main branch; it works as expected:
static_resources:
  listeners:
  - name: my_listener
    address:
      socket_address:
        protocol: TCP
        address: 0.0.0.0
        port_value: 13333
    socket_options:
    - level: 1
      name: 9
      int_value: 1
    - level: 6
      name: 4
      int_value: 1
    - level: 6
      name: 5
      int_value: 1
    - level: 6
      name: 6
      int_value: 1
    filter_chains:
    - filters:
      - name: envoy.filters.network.tcp_proxy
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.tcp_proxy.v3.TcpProxy
          stat_prefix: ingress_tcp
          cluster: my_cluster
  clusters:
  - name: my_cluster
    connect_timeout: 0.25s
    type: STATIC
    lb_policy: ROUND_ROBIN
    # protocol_selection: USE_DOWNSTREAM_PROTOCOL
    load_assignment:
      cluster_name: my_cluster
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: 0.0.0.0
                port_value: 33333
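To exercise this config end to end, something has to accept on 33333 and a downstream client has to sit idle on 13333. A minimal sketch of both sides, assuming Envoy runs on the same host:

    import socket
    import threading
    import time

    def upstream():
        # Minimal stand-in for my_cluster's endpoint: accept one connection on
        # 33333 and hold it open so the proxied session stays established.
        srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        srv.bind(("0.0.0.0", 33333))
        srv.listen(1)
        conn, _ = srv.accept()
        conn.recv(4096)  # blocks; the client never sends anything

    threading.Thread(target=upstream, daemon=True).start()
    time.sleep(0.5)  # give the upstream a moment to start listening

    # Idle downstream client: with a 1s keepalive idle time on the listener,
    # probes from port 13333 should appear almost immediately in a capture.
    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.connect(("127.0.0.1", 13333))
    time.sleep(30)  # meanwhile, watch traffic on port 13333 with tcpdump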
The output I got from tcpdump shows the probes: the zero-length segment from 127.0.0.1.13333 to the downstream client is Envoy's keepalive probe, and the reply is the client's ACK.
03:04:25.998124 IP (tos 0x0, ttl 64, id 55450, offset 0, flags [DF], proto TCP (6), length 52)
127.0.0.1.13333 > 127.0.0.1.33120: Flags [.], cksum 0xfe28 (incorrect -> 0x4c2e), seq 0, ack 1, win 512, options [nop,nop,TS val 53934642 ecr 53933618], length 0
03:04:25.998172 IP (tos 0x0, ttl 64, id 32938, offset 0, flags [DF], proto TCP (6), length 52)
127.0.0.1.33120 > 127.0.0.1.13333: Flags [.], cksum 0xfe28 (incorrect -> 0x1023), seq 1, ack 1, win 512, options [nop,nop,TS val 53934642 ecr 53883453], length 0
Which version are you using? Would you be able to try my config and see if it works for you?
$ envoy --version
envoy version: 74e717456bb4f43a4c9ad5ab8274451f9c3f6ad6/1.17.0-dev/Distribution/RELEASE/BoringSSL
I am also testing with type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager as well as type.googleapis.com/envoy.extensions.filters.network.tcp_proxy.v3.TcpProxy.
@bdecoste can you test the version you are using with the config that @soulxu provided in this comment?
That config works with my envoy version.
As does the config to an HttpConnectionManager. Closing while I figure out what's wrong with my other config. Thanks!