trf: Add client traffic management to the CLI
The interface offered for operations is a set of traffic.accept, traffic.refuse and traffic.status commands with semantics similar to start, stop and status.
With this patch series we can start refusing traffic while the cache process is running. This can be used to implement a graceful shutdown or dynamic listen endpoints, but the scope here is limited to traffic management in a broad sense: accepting or refusing new client traffic.
This has been discussed informally in the past, but for the details on what refusing traffic means, here is what the manual/help say about the new commands:
CLI_CMD(TRAFFIC_ACCEPT,
"traffic.accept",
"traffic.accept",
"Accept new client connections and requests.",
"Accepting client traffic is the normal mode of operations. Listen "
"addresses must all be available to succeed, which may not be the "
"case after a traffic.refuse command until all ongoing connections "
"are closed.",
0, 0
)
CLI_CMD(TRAFFIC_REFUSE,
"traffic.refuse",
"traffic.refuse",
"Refuse new client connections and requests.",
"When a Varnish instance is taken offline, for example to be removed "
"from a cluster, new traffic can be refused without affecting ongoing "
"transactions.\n\n"
"Listen sockets are closed and it is no longer possible to establish "
"new connections for clients. This means that traffic.accept may fail "
"to bind listen addresses again, if meanwhile they end up already in "
"use.\n\n"
"Refusing new traffic also implies refusing new requests for exsiting "
"connections, disabling HTTP/1 keep-alive. Responses initiated after "
"client traffic started being refused will have a 'Connection: close' "
"header. If a request is received on a keep-alive session while traffic "
"is being refused, it results in a minimal 503 response.\n\n"
"For h2 traffic, a GOAWAY frame is sent to clients to notify them that "
"ongoing streams can complete, but new streams will be refused.",
0, 0
)
CLI_CMD(TRAFFIC_STATUS,
"traffic.status",
"traffic.status [-j]",
"Check the status for new client connections and requests.",
"",
0, 0
)
To circumvent the jail security model, @mbgrydeland implemented a facility to pass file descriptors from mgt to cache. This is partly related to the CLI and security barriers discussion started in https://github.com/varnishcache/varnish-cache/pull/3906#issuecomment-1607571113.
It looks to me like I accidentally removed someone from the reviewer list. If I did, apologies, this was unintentional, I just wanted to add myself...
It looks to me like I accidentally removed someone from the reviewer list. If I did, apologies, this was unintentional, I just wanted to add myself...
You did not, all good.
edit: you probably saw a reviewer suggestion that went away once you added yourself.
I think this is a valuable prototype, but it does have a lot of "bolted on" aspects which I would prefer to avoid.
As mentioned during bugwash, originally the Mgt/Child CLI pipe came close to being a socketpair(2) precisely so that fd's could be passed through, but mostly as a matter of saving development time it became a pipe(2) instead.
I think the way I would like to see this progress is the following:
- Turn the Mgt/Child CLI pipe into socketpair(2)
- Move the socket open/close/listen stuff back to MGT where it originally lived and pass the sockets to the Child for accepting
This regains the advantage of sockets surviving restarts.
- Then add CLI commands to dynamically manage sockets/traffic
The issue of naming sockets in VCL is not really different from naming stevedores, but we need to decide if a socket is "abstract" or if it is its address, i.e. if you can:
socket.close prod_sock
socket.open prod_sock some_new_address
Also as mentioned, "traffic." sounds to me like the place bandwidth limiting and steering lives, and if we think we might go there, then maybe sockets should live there too. Otherwise I think socket. communicates better what goes on.
As mentioned during bugwash, originally the Mgt/Child CLI pipe came close to being a socketpair(2) precisely so that fd's could be passed through, but mostly as a matter of saving development time it became a pipe(2) instead.
We considered using the CLI to pass file descriptors and decided against it. With a separate socket pair we can cleanly stash file descriptors in the cache process, and then engage with the actual operation via the CLI. This way the cache-main thread doesn't have to juggle both the "plain" CLI stream and control messages for file descriptors.
Nothing changes in VCLI.
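For readers unfamiliar with the mechanism, here is a minimal, self-contained sketch of descriptor passing over a Unix-domain socket with SCM_RIGHTS (plain POSIX, not the code from this patch series); this is what a socketpair(2) enables and a pipe(2) does not:

#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/*
 * Sketch only: pass one open file descriptor over a Unix-domain socket,
 * e.g. one end of a socketpair(2). The receiving side gets its own copy
 * of the descriptor, referring to the same open socket.
 */
static int
send_fd(int chan, int fd)
{
	struct msghdr msg;
	struct cmsghdr *cmsg;
	struct iovec iov;
	char ctl[CMSG_SPACE(sizeof(int))];
	char byte = 0;

	memset(&msg, 0, sizeof msg);
	memset(ctl, 0, sizeof ctl);

	iov.iov_base = &byte;		/* at least one byte of payload */
	iov.iov_len = sizeof byte;
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctl;
	msg.msg_controllen = sizeof ctl;

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return (sendmsg(chan, &msg, 0) == 1 ? 0 : -1);
}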
- Move the socket open/close/listen stuff back to MGT where it originally lived and pass the sockets to the Child for accepting
Prior to this pull request, open/close happen in MGT and listen happens in cache. With this pull request that is still the case, except that:
- MGT closes its sockets once cache has them
- cache closes its sockets when instructed to
This regains the advantage of sockets surviving restarts.
I'm not sure how to interpret this one, but today, despite having a copy of the file descriptors in MGT and the ability to "survive" restarts, we reopen them regardless.
One advantage of moving the file descriptors to the cache process is that sockstat and other similar programs report that listen sockets are owned by the cache process, while the CLI sockets are owned by MGT, showing a more accurate picture.
- Then add CLI commands to dynamically manage sockets/traffic
The issue of naming sockets in VCL is not really different from naming stevedores, but we need to decide if a socket is "abstract" or if it is its address, i.e. if you can:
socket.close prod_sock
socket.open prod_sock some_new_address
While I very much agree with the idea of managing listen sockets dynamically, turning varnishd -a options into a shorthand for a CLI equivalent (like varnishd -f is a shorthand for vcl.load), I think we should refrain from moving too fast.
Why not first agree on a CLI definition for traffic.* or socket.* commands and start with this all-or-nothing subset?
Also as mentioned, "traffic." sounds to me like the place bandwidth limiting and steering lives, and if we think we might go there, then maybe sockets should live there too. Otherwise I think socket. communicates better what goes on.
No strong opinion on this one. As a reminder, this goes beyond closing sockets as we allow ongoing client traffic to complete and only bar new requests. I think we can still move from the all-or-nothing model to per-socket management without difficulty once we cross that bridge.
Still, I would rather see us move incrementally there.
bugwash:
- no consensus on fd passing outside of the CLI pipe (that would be turned into a socket pair)
- an option is to depart from VCLI between mgt and cache processes
- we'll agree on CLI commands and later decide whether to proceed incrementally or not
Based on the discussions so far, here is the list of commands we would ultimately aim for.
--->8----->8----->8----->8----->8----->8----->8----->8----->8----->8----->8---
socket.open [name [address]]
With no argument, this would be equivalent to traffic.accept from this patch series, opening all listen sockets and accepting new traffic from them.
When name is provided, only open the listen sockets for this specific listen_arg.
Opening an open socket would probably be a no-op. This is how traffic.accept behaves in this patch series.
When both name and address are provided, the listen_arg is updated to address and all its listen sockets are opened.
If name was already opened, opening it with a new address could be a failure scenario.
socket.close [name]
With no argument, this would be equivalent to traffic.refuse from this patch series, closing all listen sockets and refusing new traffic from ongoing sessions.
When name is provided, only close the listen sockets for this specific listen_arg, and refuse new traffic for ongoing sessions only for this specific listen_arg.
Closing a closed socket would probably be a no-op. This is how traffic.refuse behaves in this patch series.
socket.list
List all listen sockets, their status (open/close) and probably other things like reference counters. This would replace traffic.status from this patch series, but reporting at a much finer granularity.
socket.add address [name [args...]]
The varnishd -a option would become a shorthand of this command, similarly to how varnishd -f is a shorthand for vcl.load.
In the absence of a name, the usual a0, a1 etc naming convention would still apply.
The optional args could translate directly from the current -a "subopts":
varnishd -a local=/var/run/varnish.sock,HTTP,user=varnish,group=varnish,mode=440
socket.add /var/run/varnish.sock local HTTP user=varnish group=varnish mode=440
New sockets are added open by default.
socket.discard name
Close and remove all listen sockets for the listen_arg called name. Listen sockets end up on a cooling list until their reference counters drop to zero. Probably part of the CLI "before polling" in the cache process.
---8<-----8<-----8<-----8<-----8<-----8<-----8<-----8<-----8<-----8<-----8<---
The socket.discard command is the only one for which the socket name is not optional, to avoid accidentally discarding all sockets. For the same reason I didn't suggest querying socket names with globs like we do in other places. And globs have the advantage of being compatible with raw values, so operating on globs (where it is relevant) is something we could add on top without breaking the behaviors described above.
Once we reach consensus, I can turn this into a VIP.
LGTM.
Maybe mention the possibility to not specify any listen_arg during startup and instead use -I clifile?
-I clifile
Execute the management commands in the file given as clifile before the worker process starts, see CLI Command File.
Maybe mention the possibility to not specify any listen_arg during startup and instead use -I clifile?
That was implied.
I didn't want to focus on how we'd do this. For example for -f we have a special rule with an empty argument. How -a would fit in this model is a different question to answer. Again, going incrementally means that we can start with the current rules of requiring at least one -a and then being able to add or discard more. We may decide to never have zero listen_arg, like we can never have no VCL once the first one is loaded.
To begin my review of the proposed interface, I would like to recap the current -a argument syntax from varnishd(1):
-a <[name=][listen_address[,PROTO]]>
-a <[name=][ip_address][:port][,PROTO]>
-a <[name=][path][,PROTO][,user=name][,group=name][,mode=octal]>
Path can now also be @abstract.
I see some shortcomings with this syntax:
- there is no room for protocol options
- the UDS options belong to path, but come after PROTO, which does not seem ideal
- ',' might not be the best separator, because it is also a valid character for paths and abstract sockets.
With this in mind, I would like to discuss the proposed CLI syntax. And before anyone shouts "but socket.add", please hear me out, we'll get there.
socket.open [name [address]]
I am missing a clarification of where the existing PROTO and path (UDS) options go.
Also I would like to take the opportunity of a new CLI definition to introduce an acceptor type (implementation), analogous to the stevedore implementation:
We are not there yet and we do not have a precedent, but I am working on alternative acceptor interfaces. These would be implemented as extensions. We could add a registration interface similar to the stevedore interface, and the CLI could offer the option to select the implementation.
All in all, I would propose this syntax:
socket.open [name \
[<address|path>[:<port|options[,...]>] \
[protocol[,options[,...]] \
[acceptor[,options]]]]]
So for UDS, the user, group and mode options would come after the path, separated by : like for a port. IMHO this separator would be a sensible choice, as its use in paths is already discouraged due to network paths (host:/path) and the PATH variable separator. There are probably more cases.
The protocol would gain options. "HTTP" would remain the default.
In a first step, the only (and default) acceptor would be default, the only acceptor that exists today. In a second step, the waiter configuration could be moved to an acceptor option.
Some examples with this proposed syntax:
- listen on INADDR_ANY:http:
socket.open name 0.0.0.0
- IPv6 with port:
socket.open name [::1]:8080
- UDS / abstract:
socket.open name /tmp/sock:user=varnish,mode=0777
socket.open name @kandinsky
- Protocol with options (tell proxy protocol the next layer protocol):
socket.open name @tls PROXY,HTTP
- Acceptor/waiter selection:
socket.open name 0.0.0.0 HTTP default,kqueue
socket.open name 0.0.0.0 HTTP async,depth=64
With no argument, [...] If name was already opened, opening it with a new address could be a failure scenario.
Overall, I agree with these semantics, and worst case they could be changed more easily than the syntax. Only regarding the last point I wonder if it could not be a good idea to change parameters without ever closing a socket, so I would tend to a declarative "socket.open applies the settings also to already open sockets".
socket.close [name]
The way I understand the main intention of this PR, we still want to facilitate traffic control, despite having moved to socket operations.
So the two phases of a socket shutdown discussed here are
- close the socket
- send a Connection: close equivalent on all existing sessions
For the latter, VCL or other code would need to query the status, obviously.
But should there not be a phase even before the socket closure to allow other cluster members, load balancers or DNS controllers to prepare even more smoothly for the shutdown? I would think that, for example, starting to send negative replies to health probes could be a good idea, like with this dummy VCL:
if (req.url == "/healthy") {
    if (std.socketstatus(socket.a0) == "open") {
        return (synth(200));
    } else {
        return (synth(404));
    }
}
Thus, I propose a grace argument to socket.close
socket.close [grace [name]]
Upon reception of the command, varnish would set the socket to grace status but not close it yet, and start a timer. When that expires, it would actually close the socket(s).
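A rough sketch of what that grace state could look like (plain C with hypothetical names and types, purely to illustrate the proposed semantics, not an implementation):

#include <stdbool.h>
#include <time.h>

/* Hypothetical per-listen-socket state for socket.close with a grace. */
enum lsock_state { LSOCK_OPEN, LSOCK_GRACE, LSOCK_CLOSED };

struct lsock {
	enum lsock_state	state;
	double			close_deadline;	/* monotonic seconds */
};

static double
mono_now(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (ts.tv_sec + ts.tv_nsec * 1e-9);
}

/* socket.close -g <grace>: mark the socket, keep it open for now. */
static void
lsock_begin_grace(struct lsock *ls, double grace)
{
	ls->state = LSOCK_GRACE;
	ls->close_deadline = mono_now() + grace;
}

/* Housekeeping: the actual close happens once the grace expires. */
static bool
lsock_grace_expired(const struct lsock *ls)
{
	return (ls->state == LSOCK_GRACE &&
	    mono_now() >= ls->close_deadline);
}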
socket.add address [name [args...]]
I do not understand why we need socket.add, I would think that socket.open would suffice, in particular because you write "New sockets are added open by default". How would we then get the non-default?
I would see a point, however, in having socket.add be a socket.open without the actual open.
Thanks for the extensive review!
So the two phases of a socket shutdown discussed here are
- close the socket
- send a Connection: close equivalent on all existing sessions
For the latter, VCL or other code would need to query the status, obviously.
It's not so much a two-phase operation, but rather one active operation (closing) and then passively reacting to the closed session. When we close the listen sockets in this patch series, there are steps in the HTTP/1 and h2 state machines where we check the status to either prevent HTTP/1 keep-alive or send a GOAWAY frame.
On a per-socket basis it would be the same, but the status check would be different (for example finding the listen_sock from the task's sp).
I'm not arguing against VCL support for this, only saying that I don't expect it as a day-1 requirement and rather something we can easily add. I'm however arguing for having the core code check the status and prevent subsequent requests on sessions that have their listen sockets closed.
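Purely as an illustration of these checks (hypothetical names, not the actual Varnish state-machine code), the idea boils down to consulting a shared flag at the points where keep-alive and new requests are decided:

#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical flag flipped by traffic.accept / traffic.refuse. */
static atomic_bool refusing_traffic;

/*
 * Response path: once traffic is refused, keep-alive is no longer
 * allowed, so the response carries "Connection: close" and the session
 * ends after the ongoing transaction.
 */
static bool
keepalive_allowed(void)
{
	return (!atomic_load(&refusing_traffic));
}

/*
 * Request path: a request arriving on an already kept-alive session
 * while traffic is refused gets a minimal 503 instead of being served.
 */
static bool
must_send_503(bool later_request_on_session)
{
	return (later_request_on_session &&
	    atomic_load(&refusing_traffic));
}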
',' might not be the best separator, because it is also a valid character for paths and abstract sockets.
For what it's worth it's just a VAV, so both spaces and commas act as separators. I haven't delved into VAV code in a while but I believe we can dquote arguments. It of course comes with the inconvenience that your shell would require nested quoting, but I believe this is perfectly doable today.
With this in mind, I would like to discuss the proposed CLI syntax. And before anyone shouts "but socket.add", please hear me out, we'll get there.
socket.open [name [address]]
I am missing a clarification of where the existing PROTO and path (UDS) options go.
By address here I mean either IP/host, path or abstract socket; maybe endpoint would have been a better word for this. The reason why there were no options in the first place was precisely socket.add, and I will not quote the entire text around socket.open. I have read it though, so I'm not shouting "but socket.add", but you have convinced me that socket.open should only be concerned with opening sockets. We should not conflate the two use cases.
If we want to change something about a listening socket (and again, this is probably not the priority and something that could land later) I would rather go with a separate command:
socket.mod name endpoint [options...]
I'm thinking socket.add should have the same signature? Maybe the automatic a%d name should remain as varnishd -a default behavior? At least that would make the commands simpler to grasp:
socket.open [name]
socket.close [name]
socket.list
socket.add name endpoint [options...]
socket.mod name endpoint [options...]
socket.discard name
We can even make the protocol a special case since it doesn't depend (yet?) on the endpoint type:
socket.add name endpoint [-p proto] [options...]
socket.mod name endpoint [-p proto] [options...]
At the very least I don't like the idea of shoving options in the endpoint:
- UDS / abstract:
socket.open name /tmp/sock:user=varnish,mode=0777
socket.open name @kandinsky
Likewise, I don't like the idea of using commas as separator in the CLI, we already have blanks to delimit arguments:
- Acceptor/waiter selection:
socket.open name 0.0.0.0 HTTP default,kqueue
socket.open name 0.0.0.0 HTTP async,depth=64
(but I do like the idea of per-socket acceptor tuning, like the backlog)
I do not understand why we need socket.add, I would think that socket.open would suffice, in particular because you write "New sockets are added open by default". How would we then get the non-default?
I suppose I answered this partly already, but I would like to add that this would be a dichotomy similar to vcl.load and vcl.use.
To get the non-default, we can always add an option on the CLI side. The varnishd -a option would be able to add a closed socket, similarly to how varnishd -f does both vcl.load and vcl.use. This gives us better control over the CLI.
I would see a point, however, in having socket.add be a socket.open without the actual open.
Let's try one more time then:
socket.open [name]
socket.close [name]
socket.list
socket.add [-p proto] [-c] name endpoint [options...]
socket.mod [-p proto] name endpoint [options...]
socket.discard name
Where the -c option would stand for closed.
I'm not arguing against VCL support for this
I would think VCL should always be in control. But that's an implementation question; this conversation is about a CLI specification, and I agree it does not need to be in the day-1 implementation.
At any rate, this was only setting the stage for proposing a grace argument to socket.close, and it seems like you did not comment on that. Would you maybe have another look?
',' might not be the best separator, because it is also a valid character for paths and abstract sockets.
For what it's worth it's just a VAV
OK, good point, I did not look at the implementation.
Let's try one more time then:
socket.open [name]
socket.close [name]
socket.list
socket.add [-p proto] [-c] name endpoint [endpoint_options...]
socket.mod [-p proto] name endpoint [endpoint_options...]
socket.discard name
Where the -c option would stand for closed.
(edit: renamed options to endpoint_options for clarity)
The add/mod/open idea in combination with -c looks neat to me and also easy to understand. Maybe we can spoil mod with an ify suffix, but that's just my bikeshed.
Also I am fine with named options instead of optional positional arguments.
What is more important to me is that some points I raised still seem to not be reflected.
- you say you do not like options combined with arguments, so where are the proto_options?
- Even if we do not implement it straight away, can we please add the specs for acceptor selection and options now?
- I am missing the proposed grace option to socket.close? If you prefer, I am fine with socket.close [-g timespec] [name].
What is more important to me is that some points I raised still seem to not be reflected.
My bad, I took things out of order and forgot some.
- you say you do not like options combined with arguments, so where are the proto_options?
You are referring to this, right?
- Protocol with options (tell proxy protocol the next layer protocol):
socket.open name @tls PROXY,HTTP
This is not something possible today, right now this is one or the other. But I understand you have ambitions to use PROXY protocol for something other than HTTP. If that had to turn into a list of protocols, then why not? For example PROXY,HTTPS?
If we ever cross that bridge, I'd rather make PROXY protocol orthogonal to the application protocol. I worry that we may be looking a little too far ahead in this case.
- Even if we do not implement it straight away, can we please add the specs for acceptor selection and options now?
Either we consistently give options a name like we do for UDS, or we grow new options:
socket.open name 0.0.0.0 HTTP acceptor=default waiter=kqueue
socket.add -a default -w kqueue -p HTTP name 0.0.0.0 acceptor=default waiter=kqueue
socket.open name 0.0.0.0 HTTP acceptor=async listen_depth=64
socket.add -a async,listen_depth=64 -p HTTP name 0.0.0.0
socket.add -a async -p HTTP name 0.0.0.0 listen_depth=64
- I am missing the proposed grace option to socket.close? If you prefer, I am fine with socket.close [-g timespec] [name].
The problem is that I don't understand.
This patch series closes listen sockets upon traffic.refuse, not connection sockets. So in a sense ongoing sessions are already graced, being allowed to complete.
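To spell out the underlying socket behavior (plain POSIX, nothing Varnish-specific): closing the listening descriptor only refuses new clients, while descriptors already returned by accept(2) remain usable until they are closed on their own:

#include <unistd.h>

/*
 * Illustrative only: refusing new clients amounts to closing the
 * listening socket. Connection sockets previously returned by
 * accept(2) are independent descriptors; they keep serving their
 * ongoing sessions and are closed individually once those complete.
 */
static void
stop_accepting(int listen_fd)
{
	(void)close(listen_fd);
}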
- you say you do not like options combined with arguments, so where are the proto_options?
You are referring to this, right?
- Protocol with options (tell proxy protocol the next layer protocol):
socket.open name @tls PROXY,HTTP
This is not something possible today, right now this is one or the other. But I understand you have ambitions to use PROXY protocol for something other than HTTP. If that had to turn into a list of protocols, then why not? For example PROXY,HTTPS?
So first of all, the example I chose was probably rather bad. Yes, having a list of protocols could be an option, but what I meant was that the PROXY protocol would take HTTP as an option.
Maybe a better example could be PROXY,maxlen=1KB?
At any rate, when we design an interface, I would like to see where such options have a place.
Either we consistently give options a name like we do for UDS, or we grow new options:
socket.open name 0.0.0.0 HTTP acceptor=default waiter=kqueue
socket.add -a default -w kqueue -p HTTP name 0.0.0.0 acceptor=default waiter=kqueue
socket.open name 0.0.0.0 HTTP acceptor=async listen_depth=64
socket.add -a async,listen_depth=64 -p HTTP name 0.0.0.0
socket.add -a async -p HTTP name 0.0.0.0 listen_depth=64
Can we please design an interface where it is clear what is an endpoint option, what is a protocol option and what is an acceptor option? It's OK if you do not like my optional positional arguments with comma separated options, but the above does not look like a valid alternative to me.
- I am missing the proposed grace option to socket.close? If you prefer, I am fine with socket.close [-g timespec] [name].
The problem is that I don't understand.
This patch series closes listen sockets upon traffic.refuse, not connection sockets. So in a sense ongoing sessions are already graced, being allowed to complete.
How would that work? Would some instance outside Varnish look after that? If yes, why not integrate it?
So here's a concrete proposal:
socket.open [name]
socket.close [-g grace] [name]
socket.list
socket.add [-c] [-p proto [-o proto_option ...]] [-a acceptor [-o acceptor_option ...]] name endpoint [endpoint_options...]
socket.mod [-p proto [-o proto_option ...]] [-a acceptor [-o acceptor_option ...]] name endpoint [endpoint_options...]
socket.discard name
@bsdphk, @Dridi and I are happy with the previous note now; we clarified some final points on IRC.
We discussed this on IRC this morning and settled on a minimal set of commands, see the first revision in VIP 33.
https://github.com/varnishcache/varnish-cache/wiki/VIP33:-Socket.*-CLI-commands
I think the current revision of https://github.com/varnishcache/varnish-cache/wiki/VIP33%3A-Socket.%2A-CLI-commands could use some polish of the socket.list output. It was important to @Dridi to get away from the comma-separated options, so I think the output should look something like this:
socket.list
Output:
name state fd bound_address endpoint endpoint_options proto proto_options acceptor acceptor_options
If we want to get away from an options separator and use whitespace, we will probably need to use quotes, as in this potential example:
a0 open 8 0.0.0.0:3939 :0 - PROXY - default "epoll depth=64"
In general, I would prefer if we could also have an agreement on the future interface design - with the option to change it for good reasons.
I started an edit in https://etherpad.wikimedia.org/p/vip33 . Here's where I left off:
First revision
socket.open
Attempts to open all configured incoming sockets (= -a arguments)
Success if they all open
Failure if any fail to open, all sockets left closed.
socket.close
Closes all configured incoming sockets
Always success.
socket.list
Output:
name state fd bound_address endpoint endpoint_options proto proto_options acceptor acceptor_options
for instance when open:
a0 open 8 0.0.0.0:3939 :0 - PROXY - default "epoll depth=64"
a0 open 9 [::]:4029 :0 - PROXY - default epoll
example when closed:
a0 closed - - :0 - PROXY - default epoll
This will be the initial behavior without arguments, and commands will be refined when optional arguments are defined.
Future expansion of the interface
This interface is foreseen to evolve. The current proposal for the future direction is:
socket.open [name]
socket.close [-g grace] [name]
socket.list
socket.add [-c] [-p proto [-o proto_option ...]] [-a acceptor [-o acceptor_option ...]] name endpoint [endpoint_options...]
socket.mod [-p proto [-o proto_option ...]] [-a acceptor [-o acceptor_option ...]] name endpoint [endpoint_options...]
socket.discard name
See https://github.com/varnishcache/varnish-cache/pull/3959 for discussions.
Note from VDD: socket.add would probably take -t