varnish-cache Add req.filters and bereq.filters

Summary

These changes add support for request body filters on the client and backend side. The motivation for this feature will hopefully be obvious: To be able to transform or process "uploads" in varnish-cache without the need for a backend application.

VDPs and VFPs are used here "in reverse" to the existing use on response bodies: the client side fetches the request body, so it uses a VFPs, while the backend side delivers the request body to be backend side, so it uses VDPs.

Please read the individual commit messages. This description is intended so serve as an overview.

Preparation work

This PR is based on some minor changes which were pushed directly to trunk because they do not represent API changes, preparing VDP for use on the backend side, where we do not have a struct req:

81d406e1d8f03479408228accd5fa77748e7c1c1 renames some arguments to improve greppability for vdp_ctxes
3ca7b1d5a2c6c82cbae14c8926cc10b6941b14cd reduces use of struct req from VDPs to where absolutely necessary (client side only VDPs).
13cf51e70c00b912ce39110d7eff50ccc01b7bb9 renames (struct req).filter_list for consistency
b5d8e57138c488efcc47df6f41a150063f1cf178 & 8d0b37d65da9348cab1183e9169e8d377abc52cf make the http line VDP available to other code
083505193a5065fab0359568816b5e733b69edc3 adds an OBJ_ITER_END flag to the request body objiterate_f call

Overview of commits

The first commit generalizes the VDP API to make it suitable for use on the backend side. The main change here is to pass, for generic VDPs (those which are suitable for backend side use), the "outgoing" headers and a pointer to a length field, rather than using struct req. As is, this change serves clarity and code compaction only, all information is already present in the VRT_CTX argument, but the cl argument becomes important later on.

The next two commits add VDPs for bereq.body without introducing custom VDPs yet. That is, only the http1 VDP is pushed, keeping functionality otherwise. As a consequence of this change, fetches can now also fail for out-of-workspace for VDP.

Prepare VCL_StackVDP for backend use changes the implementation to work either on a struct req or struct busyobj, but not both.

Once this groundwork is laid, adding the actual fields and VRT interface for req.filters and bereq.filters is rather trivial. We stack the filters like elsewhere and add a test case.

To be done after merge

changelog

Jan 01 '24 12:01 nigoroll

If anyone is interested how the adjustments look like for external vmods:

Jan 02 '24 16:01 nigoroll

thanks a bunch @nigoroll, I've been wanting that for rers for a while!

Jan 02 '24 18:01 gquintard

I feel that instead of breaking so much of the API to enable it for *req.body we could instead enhance the vdp_ctx and add a bunch of fields (bo, hd and cl) and let processors check or use what they need from the context (similarly to vrt_ctx).

These questions are, in my mind, answered in the commit message of https://github.com/varnishcache/varnish-cache/pull/4035/commits/0274ebc254d8846cff4e265856cfa009d8a8bf0d:

We already have (had, from the perspective of the patch) a req member in vdp_ctx which is only implicitly scoped to VDP init time by being NULLed here.

I think the API should make it clear that VDPs should not mess with a request, a busy object, headers or the content length after init time, thus I think that we should change the init function signature and specifically not add vdp_ctx fields.

Also please note the following from the commit message regarding the motivation for this change:

Note that none of the new arguments would be necessary, they could all (including the existing oc argument) be derived from the VRT_CTX. The reasons for having these arguments are code clarity and avoidance of code duplication, e.g. for repeatedly figuring out the correct header and length pointer on the client vs. backend side.

I did actually try this for VDPs and ended up with repetitive code - you basically have to repeat the additions to VCL_StackVDP() in each VDP init function.

The changed API makes it easier (and, in my mind, clearer) how to write a VDP which works everywhere: You get the headers and the content length, make modifications to both as needed and be done. See for example the vmod_re change, which mostly consists of boilerplate changes to CHECK_OBJ_* the arguments. The meat of the change is *cl = -1, and that's it.

For VDPs which need objcore access or only work on the client side, I believe, the API change also clarifies the situation: The respective init function arguments are present only if the respective access is allowed.

For example, the ESI processor could assert(vdc->hd == req->resp) after grabbing reqand we wouldn't need to change much more.

Please look at the commit again, it also does not change much more: Besides the CHECK_OBJ_* boilerplate, the only real change is to fail if there is no req.

And that would open the pipe case, that I didn't see addressed in this patch series

I think we should work on one question at a time, and if pipe mode requires another API change, then so be it. But does it?

Jan 08 '24 14:01 nigoroll

There is a leak reported by ASAN:

***  v1    debug|Info: Child (73917) said Child dies
**** dT    10.444
***  v1    debug|Info: Child (73917) said 
***  v1    debug|Info: Child (73917) said =================================================================
***  v1    debug|Info: Child (73917) said ==73917==ERROR: LeakSanitizer: detected memory leaks
***  v1    debug|Info: Child (73917) said 
***  v1    debug|Info: Child (73917) said Direct leak of 8 byte(s) in 1 object(s) allocated from:
***  v1    debug|Info: Child (73917) said     #0 0x7fc5d007e9cf in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:69
***  v1    debug|Info: Child (73917) said     #1 0x7fc5c1c9c262 in xyzzy_vfp_rot13_init ../../../vmod/vmod_debug.c:118
***  v1    debug|Info: Child (73917) said     #2 0x557abdd183c5 in VDP_Push ../../../../bin/varnishd/cache/cache_deliver_proc.c:185
***  v1    debug|Info: Child (73917) said     #3 0x557abde4931b in VCL_StackVDP ../../../../bin/varnishd/cache/cache_vrt_filter.c:298
***  v1    debug|Info: Child (73917) said     #4 0x557abded605c in V1F_SendReq ../../../../bin/varnishd/http1/cache_http1_fetch.c:101
***  v1    debug|Info: Child (73917) said     #5 0x557abdcd2ad5 in vbe_dir_gethdrs ../../../../bin/varnishd/cache/cache_backend.c:304
***  v1    debug|Info: Child (73917) said     #6 0x557abdd1bb64 in VDI_GetHdr ../../../../bin/varnishd/cache/cache_director.c:147
***  v1    debug|Info: Child (73917) said     #7 0x557abdd5dcc1 in vbf_stp_startfetch ../../../../bin/varnishd/cache/cache_fetch.c:432
***  v1    debug|Info: Child (73917) said     #8 0x557abdd6d248 in vbf_fetch_thread ../../../../bin/varnishd/cache/cache_fetch.c:1109
***  v1    debug|Info: Child (73917) said     #9 0x557abde9c2c0 in Pool_Work_Thread ../../../../bin/varnishd/cache/cache_wrk.c:496
***  v1    debug|Info: Child (73917) said     #10 0x557abde9552c in WRK_Thread ../../../../bin/varnishd/cache/cache_wrk.c:157
***  v1    debug|Info: Child (73917) said     #11 0x557abde9cdbb in pool_thread ../../../../bin/varnishd/cache/cache_wrk.c:529
***  v1    debug|Info: Child (73917) said     #12 0x7fc5cf39d043  (/lib/x86_64-linux-gnu/libc.so.6+0x89043)
***  v1    debug|Info: Child (73917) said 
***  v1    debug|Info: Child (73917) said -----------------------------------------------------
***  v1    debug|Info: Child (73917) said Suppressions used:
***  v1    debug|Info: Child (73917) said   count      bytes template
***  v1    debug|Info: Child (73917) said       2        136 HSH_config
***  v1    debug|Info: Child (73917) said       1      16384 VSL_Setup
***  v1    debug|Info: Child (73917) said       2         64 WRK_BgThread
***  v1    debug|Info: Child (73917) said -----------------------------------------------------
***  v1    debug|Info: Child (73917) said 
***  v1    debug|Info: Child (73917) said SUMMARY: AddressSanitizer: 8 byte(s) leaked in 1 allocation(s).

Jan 15 '24 14:01 asadsa92

I have rebased, squashed the squashable commits and force-pushed .

Jan 31 '24 14:01 nigoroll

Obviously this is a good idea.

I have thought about the general shape of the filter-API, and checked some old notes, and I think I come down on the side of putting more stuff into the vdp_ctx rather than add more function arguments, the entire point about the vdp_ctx is to not have a million function arguments, and to not break the API, every time we have an idea.

But as @nigoroll rightly points out, that does carry the risk that all filter implementations will start with long list of asserts, if nothing else to clearly document the situation.

Some of that comes down to struct req having "job" stuff in it, because it does not always have a wrk, and that's fundamental.

But some of it can be solved with alias fields, for instance a struct http* which either points to resp->http or bereq->http which will allow many filters to not care about where they sit.

May 28 '24 06:05 bsdphk

Thank you for the review, @bsdphk .

As Dridi and you gave similar feedback on this aspect, I will obviously respect your verdict.

Yet one thing bothers me: If we simply added struct objcore *oc, struct req *req, struct http *hd, intmax_t *cl to struct vdp_ctx, half the members would only have significance for a VDP's init function, and, in fact, shuld not be used by the bytes function of any well behaved, generic filter (not using trailers (tbd)). So I would think that we'd set them to NULL as soon as the init function is done.

So, to improve API stability and still make it clear how a well behaved filter should work, I would suggest to introduce a struct vdp_init_ctx containing only the pointers which any well behaved VDP init should use.

Note that any filter which "knows what it is doing" could still use the VRT_CTX to get hold of whatever pointers it needs.

Side note that in terms of "api stability": I can follow the argument that changing function signatures is particularly painful downstream, and also struct members can more easily be added in a backwards compatible manner. But to achieve full binary compatibility, we would need to start versioning structures and/or add a bitfield with supported features. It was and is my understanding that, with Varnish-Cache, we do not want to take this route. So moving the problem to structs will only help us so far and comes with its own challenges.

May 28 '24 07:05 nigoroll

I deliberately said "API" and not "ABI": Recompilation is cheap, stuffing source-code with #ifdef is not.

Neither I nor Occam, share your passion for "multiplying entities" and I really do not see what possible benefit a dedicated vdp_init_ctx would bring ?

May 28 '24 08:05 bsdphk

I really do not see what possible benefit a dedicated vdp_init_ctx would bring ?

As I said: avoid code duplication.

I think all a generic vdp init should be concerned about is:

what are the headers we are going to send
do we know a content-length? am I going to change it and if yes, to a known value or not?

A less generic vdp is going to want to know

am I operating on an objcore? (e.g. can I use gzip magic)

None of these has any relevance once the init function is done.

The req can go, it can be taken from VRT_CTX

May 28 '24 08:05 nigoroll

I'll prepare a diff to look at...

May 28 '24 08:05 nigoroll

FWIW my goal was to take your change and make an attempt at implementing it with limited (or if possible no) changes to the function signatures. I haven't had a chance to spend time on this.

generic filter (not using trailers (tbd))

FYI we are making progress on a limited initial support for trailers, and we should have something ready in a matter of weeks from the look of it. As per https://github.com/varnishcache/varnish-cache/pull/3809#issuecomment-1201245852 we are moving trailers out of the body filtering code.

May 28 '24 09:05 dridi

FYI we are making progress on a limited initial support for trailers

In case you missed it, we already had that in #2477. Based on the experience with this implementation, I think we would need to first agree to two prerequisites to not create another dead end:

implement a vcl-initiated fetch (where vcl_backend_fetch continues after fetching the body) and show how the resulting issues are being solved (e.g. modifications of headers which are not announced as trailers, how is this going to be handled on the object storage level)
generalize the chunk handling as I have attempted in tickets linked from #2477

May 28 '24 09:05 nigoroll

I think we would need to first agree to two prerequisites to not create another dead end

We can have that discussion in #3809 if you want. The short answer is that we want to take this iteratively and not involve VCL support on day 1 to get something useful sooner.

May 28 '24 12:05 dridi

Based on https://github.com/varnishcache/varnish-cache/pull/4035#issuecomment-2134676121 I revisited this PR and implemented two alternative approaches on changing the VDP API to accommodate usage on the backend side. I would appreciate if reviewers read all of this comment before or while looking at the alternative suggestions, which are:

#4111
#4112

To facilitate reviews and judgement on the alternatives, I would like to reiterate over some aspects already mentioned. There will be some repetition, but I think it might be helpful to recapitulate in a holistic manner.

The essence of a filter is to modify a stream of bytes. What follows from this property alone is that the filter might have knowledge about the amount of data it will provide. For example, a filter might know that it will not add or remove any bytes, it might know that it will only return a fixed amount, or it might know that it does not know how many bytes it will return. So it makes sense to give the filter a length integer, which it can inspect and modify.

In the context of HTTP and filters, we also have headers sent *1) with the filtered bytes, so the filter should have access to headers to, for example, set the content-encoding or provide range information.

Finally, in the context of varnish-cache, a delivery filter might be operating directly on a cache object, in which case it might use metadata to optimize its function. For example, the gunzip filter can use metadata to return the decompressed length.

For now, this is what most of our filters need and use. I agree that we might find out in the future that some filters might need more, and we will cater for that.

To summarize this part, a VDP's init function should have access to (and, to be generic, limit itself to) using

length information,
the headers sent with this bytestream and
an optional objcore.

In my mind, a VDP init function should be called with these three pieces of information prepared, and it should not itself be tasked with having to find out the context in which it is called (that would be to use VRT_CTX only). The two draft PRs show two alternatives how to convey these to the vdp_init_f signature: Either in the existing struct vdp_ctx, or in astruct vdp_init_ctx. The argument for the latter is that the filter should use this information only during init, and so also for the former option, the respective struct vdp_ctx members are cleared after init is completed.

So far the question which option to prefer seems to be a matter of taste to some extent. I would argue that having a long(er) lived struct containing members which are only meant to be used briefly during init seems not to be the best option, but I can live with the other option, too.

Now let's change perspective and look at filters from the varnishd/vrt side:

VDP_Init() initializes the struct vdp_ctx
VCL_StackVDP() is the generic way to VDP_Push() filters set by VCL without any custom configuration.
VDP_Push() itself has a priv argument for custom filter configuration used, for example, by ESI.

To simplify use on the client and backend side, either VDP_Init() or VDP_StackVDP() now get a req and bo argument, which VDP_StackVDP() needs anyway to construct a VRT_CTX. For standalone usage of VDP_Push(), the caller is responsible for providing a VRT_CTX.

Again, I do not have that strong an opinion, but, from the varnishd/vrt side, the difference between the two options is to either only change VCL_StackVDP(), or VCL_StackVDP() and VDP_Init() .

What remains is that for all scenarios, we should change the interface anyway and

remove the req member from struct vdp_ctx, because VDPs should not be biased any more towards the client side and
remove the oc argument to vdp_init_f because it is just one of three things which all VDPs need (see first part of this comment)

I hope this helps to make a good decision.

*1) or in the case of a VFP, which we do not handle here, stored with the received object

May 29 '24 12:05 nigoroll