HTTP/1.1 server suspected flow control bug
**Describe the bug**
If zilla is set up as an http proxy between an nginx h2 server and a curl http/1.1 client, and response validation is enabled and the response is invalid, zilla displays erratic behavior as described below.
**To Reproduce**
Steps to reproduce the behavior:
Prerequisites:
- we need to enable http response validation (as of now this is a PR in review that's not merged to `develop` yet)
- we need to fix a minor bug in `IntegerValidator` (this is being fixed on a feature branch, as of now that's not merged to `develop` yet): https://github.com/aklivity/zilla/commit/30a7de0b45f841020d81489603b8c0408b89816d
- return length == 4;
+ return length - index == 4;
- we need to build a zilla instance with the above two changes
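For completeness, a minimal sketch of producing a patched build, assuming the standard Maven wrapper that ships in the zilla repository (how the two changes above are applied, and the exact build flags, may differ in your setup):

```bash
# clone zilla, apply the two changes above (validation PR + IntegerValidator fix), then build
git clone https://github.com/aklivity/zilla.git
cd zilla
./mvnw clean install -DskipTests
```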
Setup:
- create a file named `42.integer` with a content length != 4 characters in `<some-directory>` (to simulate the integer validation invalid case)
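For example, one way to create such a file (the 5-character content is an arbitrary choice that simply violates the expected 4-character length):

```bash
# 5 bytes instead of 4, so integer validation of the response is expected to fail
printf '12345' > <some-directory>/42.integer
```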
- create an `nginx.conf` in `<some-directory>` with h2 enabled (this is a default `nginx.conf`, only with the `server { listen 80 http2; }` block inserted):
user nginx;
worker_processes auto;
error_log /var/log/nginx/error.log notice;
pid /var/run/nginx.pid;
events {
    worker_connections 1024;
}
http {
    server {
        listen 80 http2;
    }
    include /etc/nginx/mime.types;
    default_type application/octet-stream;
    log_format main '$remote_addr - $remote_user [$time_local] "$request" '
                    '$status $body_bytes_sent "$http_referer" '
                    '"$http_user_agent" "$http_x_forwarded_for"';
    access_log /var/log/nginx/access.log main;
    sendfile on;
    #tcp_nopush on;
    #proxy_read_timeout 3600;
    #proxy_connect_timeout 3600;
    #proxy_send_timeout 3600;
    client_body_timeout 3600;
    client_header_timeout 3600;
    keepalive_timeout 65;
    #gzip on;
    include /etc/nginx/conf.d/*.conf;
}
- start up nginx
docker run -p 8080:80 -v <some-directory>:/usr/share/nginx/html:ro -v <some-directory>/nginx.conf:/etc/nginx/nginx.conf --rm --name nginx nginx
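Optionally, the nginx side can be sanity-checked on its own before putting zilla in front of it (this assumes a curl build with HTTP/2 support, since the request uses h2 over cleartext with prior knowledge):

```bash
# fetch the test file directly from nginx over h2c; it should return the 42.integer content
curl -v --http2-prior-knowledge http://localhost:8080/42.integer
```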
- create this zilla configuration file for http proxy mode with validation enabled:
---
name: example
bindings:
  north_tcp_server:
    type: tcp
    kind: server
    options:
      host: 0.0.0.0
      port: 7114
    exit: north_http_server
  north_http_server:
    type: http
    kind: server
    exit: south_http_client
  south_http_client:
    type: http
    kind: client
    options:
      versions:
        - h2
      requests:
        - path: /42.integer
          method: GET
          responses:
            - status: 200
              content: integer
    exit: south_tcp_client
  south_tcp_client:
    type: tcp
    kind: client
    options:
      host: localhost
      port: 8080
- start a zilla instance with assertions enabled
JAVA_OPTIONS="-ea -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=*:5005" ./zilla start -v -e -w 1
- execute this curl command with http/1.1 (multiple times):
curl -v http://localhost:7114/42.integer
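Since the observed behavior varies between attempts, a simple loop can be used to repeat the request; the iteration count here is arbitrary:

```bash
# repeat the request to cycle through the different outcomes listed below
for i in $(seq 1 10); do curl -v http://localhost:7114/42.integer; done
```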
One of the following cases happens, usually in this order:
- the expected case, curl prints:
curl: (18) transfer closed with 5 bytes remaining to read
- curl hangs and must be stopped with ^C
- curl displays an error message:
* Excess found in a read ...
* Closing connection
- zilla crashes with an `AssertionError`, with a stack trace like this:
Caused by: java.lang.AssertionError
at io.aklivity.zilla.runtime.binding.http/io.aklivity.zilla.runtime.binding.http.internal.stream.HttpClientFactory$HttpExchange.onResponseWindow(HttpClientFactory.java:4941)
at io.aklivity.zilla.runtime.binding.http/io.aklivity.zilla.runtime.binding.http.internal.stream.HttpClientFactory$HttpExchange.onApplication(HttpClientFactory.java:4569)
at io.aklivity.zilla.runtime.binding.http/io.aklivity.zilla.runtime.binding.http.internal.stream.HttpClientFactory$HttpClientPool.lambda$newStream$2(HttpClientFactory.java:2086)
at io.aklivity.zilla.runtime.engine/io.aklivity.zilla.runtime.engine.internal.registry.EngineWorker.handleReadReply(EngineWorker.java:1276)
**Expected behavior**
The curl command above should consistently result in the expected case (the first one listed above).
**Additional context**
We have investigated this issue with @akrambek and concluded that this is most likely a flow control bug in the http/1.1 server.
The issue only appears when the zilla client -> nginx connection uses h2 and the curl -> zilla server connection uses http/1.1.
This may or may not be related: I also bumped into another issue where `onResponseBegin` in the exchange crashed on the highlighted assert, where `acknowledge` was 702 and `responseAck` was 0.