http2: Move all socket operations to the session thread
This is a port of a change made by @mbgrydeland, which landed first in Varnish Enterprise for planning reasons.
It's a major change of the architecture:
- all socket operations happen in the `h2_sess` thread
- the session's file descriptor switches to non-blocking
- coordination with `h2_req` threads is done with a file descriptor
  - based on #4348, originally submitted independently
  - this enables polling together with the socket
- control frames are sent from a fixed-length buffer living in workspace
I was the main reviewer of the change, and I personally extracted several commits (@mbgrydeland did too based on my reviews) in order to reduce the size of the main commit as much as possible:
http2: Rework HTTP/2 to be using non-blocking sockets
After porting the change to this branch, I wish I had more aggressively extracted small changes into self-contained commits, because the number of conflicts to resolve made porting this commit alone several days of work.
There were a lot of explicit (line) conflicts, and a fair number of implicit conflicts, because at this point trunk and Varnish Enterprise (based on 6.0) have diverged in many small ways:
- Varnish Enterprise notably has
  - a waiting list "walk-away" feature (see #3835)
  - listen socket management (#3959)
  - task tracking with a `varnishscoreboard` utility
- and Varnish Cache has
  - a workspace emulator and hardened workspace semantics (#3644)
    - I defaulted to ws_emu+asan in my porting endeavor, it actually helped
  - a generalized VDP API (#4035)
  - a new asynchronous iterator for delivery (#4209)
Between these major changes (and probably others I'm forgetting) and lots of small core changes accumulated over the years, pretty much every commit I cherry-picked hit a merge conflict. I snuck two original commits in at the beginning to reduce noise when conflicts happened during the port.
I caught two minor bugs in this port, and besides the workspace conflict, things went rather smoothly. It was just tedious to replay; so tedious that I frequently switched to other (important) tasks, which added delays. But I ran the test suite for each commit, stressing them with load and with the workspace emulator paired with sanitizers.
The original change went to production, facing real-world h2 traffic, which is why the patch series ends with a bunch of bug fixes. For similar workloads it puts an overall lower load on the system, and it is more reactive. With OpenSSL, reads and writes cannot be performed concurrently on the same connection, so moving all socket operations to the session thread also removes a good deal of contention for HTTPS workloads.
I'm requesting @mbgrydeland's review so he can make sure that the port is faithful to our original work. I'm also requesting @walid-git's review because he rebased his trailer implementation (Varnish Enterprise counterpart of #4125) on this architectural change (and eventually he should do the same for #4125). It was reportedly easier to implement with this new architecture.
I have one demand.
Leave structural changes aside if they can wait until after merging this. If they must really happen as part of this change, it will require new commits. I didn't apply the bug fixes in the commit introducing the bugs because keeping the same list of commits will help us track what has or has not been ported either way (among commits relevant in both projects).
I'm otherwise open to squashing whitespace improvements or general polish in the right commits, but I will be away for a couple weeks.
Hello from a bus in the middle of nowhere. It was brought to my attention that someone was "not terribly happy about" me doing a "and don't do anything else, while I'm away on vacation" salute.
Apologies for giving this impression; I'll try to clarify what I wrote late on a Friday night, since I was not sure I would get reliable-enough internet access today to submit it.
> Leave structural changes aside if they can wait until after merging this.
What I mean here is not "don't do anything else" but rather, if you can help it, "do anything else" after merging. In other words, if you register review comments that could be applied afterwards, let's apply them afterwards.
> If they must really happen as part of this change, it will require new commits.
If you must insist on making changes before merging, "do anything else" as new commits at the end of the patch series.
> I didn't apply the bug fixes in the commit introducing the bugs because keeping the same list of commits will help us track what has or has not been ported either way (among commits relevant in both projects).
I could have submitted a smaller patch series, but that would have made coordination harder for us; to facilitate this kind of contribution (already in production on our side), it was easier to retain the same list of applicable patches.
Please do "anything else" while I'm on vacation, except introducing structural changes in the middle of the patch series. :saluting_face:
I tried to apply this patch and then rebase master (trunk), and this seems to solve the issue we had with H2 where we had to set h2_window_timeout to something high in order to avoid GOAWAY. It's now been running with this patch for about a week and I cannot reproduce the issue we had.
Thank you for your feedback. In order to move this forward, I plan to revert the latest VSV patches at the beginning of the patch series, and add the patches we originally wrote for this new architecture at the end.
But I'm not starting without a sign that this will receive attention.
Not sure if it's related to this patch, but it could be: after running it for an extended period we got a crash.
Child (23116) Panic at: Sat, 13 Sep 2025 17:08:11 GMT
Assert error in h2_del_sess(), http2/cache_http2_session.c line 185:
Condition(!WS_IsReserved(req->ws)) not true.
version = varnish-trunk revision 43fbfd6051f0ea3760d04b893ebebcb5351ad263, vrt api = 21.0
ident = Linux,6.8.0-51-generic,x86_64,-jlinux,-smalloc,-sdefault,-hcritbit,epoll
now = 11612746.444027 (mono), 1757783219.043331 (real)
Backtrace:
ip=0x5931b56e2ab5 sp=0x734bb316a300 <VBT_format+0x65>
ip=0x5931b56562cd sp=0x734bb316a420 <pan_ic+0x24d>
ip=0x5931b56e1d39 sp=0x734bb316a580 <VAS_Fail+0x19>
ip=0x5931b569e19b sp=0x734bb316a590 <h2_del_sess+0x11b>
ip=0x5931b569ed37 sp=0x734bb316a5c0 <h2_new_session+0x8a7>
ip=0x5931b5683122 sp=0x734bb316a870 <WRK_Thread+0x3c2>
ip=0x5931b568370c sp=0x734bb316b420 <pool_thread+0x3c>
ip=0x734bb329caa4 sp=0x734bb316b450 <pthread_condattr_setpshared+0x684>
ip=0x734bb3329c3c sp=0x734bb316b500 <__clone+0x24c>
errno = 110 (Connection timed out)
argv = {
[0] = \"/usr/sbin/varnishd\",
[1] = \"-a\",
[2] = \":80\",
[3] = \"-a\",
[4] = \"127.0.0.1:81,PROXY\",
[5] = \"-a\",
[6] = \"/var/run/varnish.sock,PROXY,user=vcache,group=varnish,mode=666\",
[7] = \"-f\",
[8] = \"/etc/varnish/onecom.vcl\",
[9] = \"-T\",
[10] = \":6082\",
[11] = \"-t\",
[12] = \"120\",
[13] = \"-l\",
[14] = \"512M\",
[15] = \"-s\",
[16] = \"malloc,735G\",
[17] = \"-S\",
[18] = \"/etc/varnish/secret\",
[19] = \"-p\",
[20] = \"vcc_err_unref=false\",
[21] = \"-p\",
[22] = \"thread_pool_stack=192k\",
[23] = \"-p\",
[24] = \"workspace_client=128k\",
[25] = \"-p\",
[26] = \"workspace_backend=128k\",
[27] = \"-p\",
[28] = \"send_timeout=3600\",
[29] = \"-p\",
[30] = \"transit_buffer=1M\",
[31] = \"-p\",
[32] = \"h2_window_timeout=200\",
[33] = \"-p\",
[34] = \"timeout_idle=60\",
[35] = \"-p\",
[36] = \"vsl_mask=-H2TxHdr,-H2TxBody\",
[37] = \"-p\",
[38] = \"feature=+http2\",
[39] = \"-p\",
[40] = \"thread_pool_max=8000\",
[41] = \"-p\",
[42] = \"http_max_hdr=128\",
}
pthread.self = 0x734bb316c6c0
pthread.name = (cache-worker)
pthread.attr = {
guard = 4096,
stack_bottom = 0x734bb313d000,
stack_top = 0x734bb316d000,
stack_size = 196608,
}
thr.req = 0x7341998453e0 {
vxid = 1017251858, transport = HTTP/2
step = Req Step transport
req_body = NULL,
err_code = 1, err_reason = (null),
restarts = 0, esi_level = 0,
vary_b = (nil), vary_e = (nil),
d_ttl = 0.000000, d_grace = 0.000000,
storage = (nil),
sess = 0x727f88caea20 {
fd = 216, vxid = 1017251858,
t_open = 1757783218.307428,
t_idle = 1757783219.043316,
ws = 0x727f88caea60 {
id = \"ses\",
{s, f, r, e} = {0x727f88caeab8, +168, (nil), +576},
},
transport = HTTP/2 {
h2_sess = 0x734bb316a660 {
refcnt = 0, bogosity = 0, error = H2CE_RAPID_RESET
open_streams = 0, highest_stream = 113, goaway_last_stream = 0,
local_settings = {0x1000, 0x1, 0x64, 0xffff, 0x4000, 0x8000, 0x0, 0x0, 0x0},
remote_settings = {0x1000, 0x0, 0x64, 0x200000, 0x4000, 0x7fffffff, 0x0, 0x0, 0x0},
{rxf_len, rxf_type, rxf_flags, rxf_stream} = {4, 3, 0x0, 57},
}
client = 2a02:a18:9159:9c01:2906:de9e:9c8a:519d 55668 /var/run/varnish.sock,
local.endpoint = /var/run/varnish.sock,
local.socket = a2,
local.ip = 0.0.0.0:0,
remote.ip = 0.0.0.0:0,
server.ip = 2a02:2350:6::b788:3355:443,
client.ip = 2a02:a18:9159:9c01:2906:de9e:9c8a:519d:55668,
},
ws = 0x734199845538 {
id = \"req\",
{s, f, r, e} = {0x73419984b140, +0, +107128, +107128},
},
http_conn = 0x73419984b0b8 {
fd = 216 (@0x727f88caea44),
doclose = null(Not Closing)
ws = 0x734199845538 {
[Already dumped, see above]
},
{rxbuf_b, rxbuf_e} = {0x73419984b140, 0x73419984b140},
{pipeline_b, pipeline_e} = {(nil), (nil)},
content_length = 0,
body_status = NULL,
first_byte_timeout = 0.000000,
between_bytes_timeout = 0.000000,
},
http[req] = 0x7341998455e8 {
ws = NULL
hdrs {
},
},
vdc = 0x73419984b058 {
.magic = 0x00000000 EXPECTED: VDP_CTX_MAGIC=0xee501df7
}
vcl[vcl] = NULL
flags = {
},
privs = 0x7341998455d8 {
},
top = 0x73419984b120 {
req = 0x7341998453e0 {
[Already dumped, see above]
},
privs = 0x73419984b138 {
},
vcl[vcl0] = NULL
},
},
thr.busyobj = NULL
thr.worker = 0x734bb316b320 {
ws = 0x734bb316b3a0 {
id = \"wrk\",
{s, f, r, e} = {0x734bb316a880, +0, (nil), +2040},
},
VCL::method = BACKEND_RESPONSE,
VCL::methods = {},
},
vmods = {
std = {p=0x734bb2a2c500, abi=\"Varnish trunk 43fbfd6051f0ea3760d04b893ebebcb5351ad263\", vrt=0.0,
vcs=\"43fbfd6051f0ea3760d04b893ebebcb5351ad263\", version=\"Varnish trunk\"},
one = {p=0x734bb2a2c580, abi=\"Varnish trunk 43fbfd6051f0ea3760d04b893ebebcb5351ad263\", vrt=0.0,
vcs=\"5a495ac72e70c4821f77fe99804209a90ea35ae6\", version=\"libvmod-one 0.3\"},
maxminddb = {p=0x734bb2a2c680, abi=\"Varnish trunk 43fbfd6051f0ea3760d04b893ebebcb5351ad263\", vrt=0.0,
vcs=\"8d3077d81ff50a9686978d0d72e06654967fdee1\", version=\"libvmod-maxminddb 0.3\"},
directors = {p=0x734bb2a2c700, abi=\"Varnish trunk 43fbfd6051f0ea3760d04b893ebebcb5351ad263\", vrt=0.0,
vcs=\"43fbfd6051f0ea3760d04b893ebebcb5351ad263\", version=\"Varnish trunk\"},
xkey = {p=0x734bb2a2c780, abi=\"Varnish trunk 43fbfd6051f0ea3760d04b893ebebcb5351ad263\", vrt=0.0,
vcs=\"ee73bb6cb5d7419dc71daa4baaae076bf6bbfb4b\", version=\"varnish-modules 0.26.0\"},
header = {p=0x734bb2a2c800, abi=\"Varnish trunk 43fbfd6051f0ea3760d04b893ebebcb5351ad263\", vrt=21.0,
vcs=\"ee73bb6cb5d7419dc71daa4baaae076bf6bbfb4b\", version=\"varnish-modules 0.26.0\"},
cookie = {p=0x734bb2a2c900, abi=\"Varnish trunk 43fbfd6051f0ea3760d04b893ebebcb5351ad263\", vrt=0.0,
vcs=\"43fbfd6051f0ea3760d04b893ebebcb5351ad263\", version=\"Varnish trunk\"},
vsthrottle = {p=0x734bb2a2c980, abi=\"Varnish trunk 43fbfd6051f0ea3760d04b893ebebcb5351ad263\", vrt=21.0,
vcs=\"ee73bb6cb5d7419dc71daa4baaae076bf6bbfb4b\", version=\"varnish-modules 0.26.0\"},
bodyaccess = {p=0x734bb2a2ca00, abi=\"Varnish trunk 43fbfd6051f0ea3760d04b893ebebcb5351ad263\", vrt=0.0,
vcs=\"ee73bb6cb5d7419dc71daa4baaae076bf6bbfb4b\", version=\"varnish-modules 0.26.0\"},
blob = {p=0x734bb2a2ca80, abi=\"Varnish trunk 43fbfd6051f0ea3760d04b893ebebcb5351ad263\", vrt=0.0,
vcs=\"43fbfd6051f0ea3760d04b893ebebcb5351ad263\", version=\"Varnish trunk\"},
digest = {p=0x734bb2a2cb80, abi=\"Varnish trunk 43fbfd6051f0ea3760d04b893ebebcb5351ad263\", vrt=0.0,
vcs=\"bf26880ece69f2cd321cbf57dd59b687c24920e3\", version=\"libvmod-digest 1.0.3\"},
xcounter = {p=0x734bb2a2cc00, abi=\"Varnish trunk 43fbfd6051f0ea3760d04b893ebebcb5351ad263\", vrt=21.0,
vcs=\"6d9d8d3ac261a49b11453cdd37da5594619ca335\", version=\"libvmod-xcounter 0.1\"},
re = {p=0x734bb2a2cc80, abi=\"Varnish trunk 43fbfd6051f0ea3760d04b893ebebcb5351ad263\", vrt=21.0,
vcs=\"79313a2e71b6a2f478dfa57c821697a2f3da00cc\", version=\"libvmod-re 2.6.0\"},
taskvar = {p=0x734bb2a2cd80, abi=\"Varnish trunk 43fbfd6051f0ea3760d04b893ebebcb5351ad263\", vrt=21.0,
vcs=\"ce0f348f87f63aca9b2908265d4b4028830d9530\", version=\"varnish-objvar 0.1\"},
topvar = {p=0x734bb2a2ce00, abi=\"Varnish trunk 43fbfd6051f0ea3760d04b893ebebcb5351ad263\", vrt=21.0,
vcs=\"ce0f348f87f63aca9b2908265d4b4028830d9530\", version=\"varnish-objvar 0.1\"},
querystring = {p=0x734bb2a2cf00, abi=\"Varnish trunk 43fbfd6051f0ea3760d04b893ebebcb5351ad263\", vrt=21.0,
vcs=\"NOGIT\", version=\"libvmod-querystring 2.0.3\"},
},
pools = {
pool = 0x734b90400140 {
nidle = 396,
nthr = 1202,
lqueue = 0
},
pool = 0x734b90400000 {
nidle = 286,
nthr = 1130,
lqueue = 0
},
},
There is something confusing here:

> err_code = 1, err_reason = (null),

`req->err_code` is used to distinguish an opportunistic h2 upgrade from a prior-knowledge one, and the value 1 corresponds to `H2_PU_MARKER` (prior knowledge). However, `req->err_code` is cleared here, at the beginning of the h2 session.
Yet the panic indicates that the h2 session has been running for a while and has seen several streams:

> open_streams = 0, highest_stream = 113, goaway_last_stream = 0,

These two observations seem to contradict each other.
@desdic I just noticed something: it seems that the varnish instance from which the panic was reported is not running this branch:
version = varnish-trunk revision 43fbfd6051f0ea3760d04b893ebebcb5351ad263, vrt api = 21.0
Could you open a separate issue ?
> Could you open a separate issue ?
Nevermind: #4396
Bugwash: @nigoroll thinks he will be able to look at it this month. He also requested some cleanup on the PR. I have cherry-picked https://github.com/varnishcache/varnish-cache/pull/4368/commits/a9e1667b2b5e554f8d8e15b05e49d630b6040066 and https://github.com/varnishcache/varnish-cache/pull/4368/commits/6e330568662af5f7fd0dbdc669be9baefdbcaf13 to master and @dridi will do the rebase.