
Implementing the Cache-Control "must-revalidate" directive

Open — ThijsFeryn opened this pull request 5 years ago • 4 comments

This PR implements the must-revalidate directive per the spec, which states that when must-revalidate is part of the Cache-Control header, no stale data may be served from cache.

In terms of Varnish, this means grace is set to zero.

Example header

This is what such a header could look like:

Cache-Control: public, max-age=3600, must-revalidate
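
For illustration, the effect could be approximated by hand in VCL today (a sketch only, not the PR's actual C implementation in core):

        sub vcl_backend_response {
                # Sketch: approximate the PR's behavior in VCL.
                # When must-revalidate is present, disable grace so
                # no stale copy of this object is ever served.
                if (beresp.http.Cache-Control ~ "(?i)must-revalidate") {
                        set beresp.grace = 0s;
                }
        }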

Combining must-revalidate and stale-while-revalidate

Because this if-clause is placed after the stale-while-revalidate check, any occurrence of must-revalidate supersedes stale-while-revalidate.

If, for whatever reason, the backend were to send back both directives, grace would still be set to zero, as illustrated in the example below:

Cache-Control: public, max-age=3600, must-revalidate, stale-while-revalidate=30
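
Sketched as VCL, the check ordering that gives must-revalidate the last word might look like this (hypothetical, mirroring the order described above rather than the actual C code; the real code parses the stale-while-revalidate value instead of hardcoding it):

        sub vcl_backend_response {
                # Hypothetical mirror of the check order: the
                # stale-while-revalidate value is applied first...
                if (beresp.http.Cache-Control ~ "stale-while-revalidate=30") {
                        set beresp.grace = 30s;
                }
                # ...and must-revalidate is checked last, so it wins
                # and forces grace back to zero when both are present.
                if (beresp.http.Cache-Control ~ "(?i)must-revalidate") {
                        set beresp.grace = 0s;
                }
        }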

This is probably a breaking change

Because this PR changes the behavior of how grace is set, this could be considered a breaking change.

If you accept this patch, I'm not sure which future release of Varnish it would end up in.

ThijsFeryn avatar Jul 16 '20 08:07 ThijsFeryn

I added this pull request to #3246 (7.0 Compliance improvements)

dridi avatar Jul 16 '20 09:07 dridi

One thing that should be noted is the risk of serialization introduced by this change, negating the non-zero default_grace:

Cache-Control: public, must-revalidate

Such a change should get a fair share of red tape in the release notes and relevant documentation.

For this use case we would need a non-zero keep, and in the spirit of better compliance we could change the default timings:

  • default_ttl=0
  • default_grace=0
  • default_keep=2m

This way we could cache resources with no TTL based on the revalidation status (200 or 304). This would however be a major VCL breaking change since a zero TTL is interpreted today as uncacheable. To avoid serialization it would be necessary to wake up anyone in the waiting list regardless of the outcome (200 or 304) to be served with a fresh response.

Even a negative TTL shouldn't be interpreted as uncacheable (and more precisely unstorable) if we allow grace and keep.

That doesn't change my review, I wanted to mention the risks and where we could go if we dial this compliance knob up to eleven. I had mentioned some of that to @ThijsFeryn prior to his submission but didn't have time to elaborate here last week.

dridi avatar Jul 20 '20 10:07 dridi

the risk of serialization introduced by this change

Request serialization can only happen when you set a zero TTL. Having a zero grace value does not affect this, so it's safe.

rezan avatar Jul 20 '20 12:07 rezan

Request serialization can only happen when you set a zero TTL. Having a zero grace value does not affect this, so it's safe.

This is prevented with the current default_ttl and default_grace, but if you touch those defaults (as a Varnish user) and the initial beresp.grace is overridden (directly from the backend) via the must-revalidate directive, you introduce this risk. That's why I'm advocating for significant red tape in the docs: it can be difficult to diagnose.

This is what I meant by "risk": we are pulling the rug out from under existing VCL.

And I guess I forgot some thoughts:

It can be easy to turn a cacheable response into a hitmiss object because the TTL is negative, especially in a tiered setup where Varnish's backend might be another Varnish instance serving a graced object. Making an object storable based on ttl+grace+keep would make this a non-issue.

The Cache-Control: public, must-revalidate serialization risk also goes away if ttl+grace+keep is enough to store a response since we have a usable cache insertion regardless of the 200 vs 304 outcome, which should be enough to wake clients on the waiting list.

Still, it breaks the assumption that a zero or negative TTL can be used to tell the built-in VCL and the core code that a response is uncacheable:

varnishtest "cache negative ttl"

server s1 {
        rxreq
        txresp -hdr {ETag: "123abc"} -body helloworld

        rxreq
        expect req.http.If-None-Match == {"123abc"}
        txresp -status 304 -hdr {ETag: "123abc"}
} -start

varnish v1 -vcl+backend {
        sub vcl_backend_response {
                set beresp.ttl = -1s;
                set beresp.grace = 0s;
                set beresp.keep = 2m;
                return (deliver);
        }
} -start

client c1 -repeat 2 {
        txreq
        rxresp
        expect resp.body == helloworld
} -run

Such a scenario would pass with this change in core code:

--- i/bin/varnishd/cache/cache_hash.c
+++ w/bin/varnishd/cache/cache_hash.c
@@ -435,7 +435,7 @@ HSH_Lookup(struct req *req, struct objcore **ocp, struct objcore **bocp)
                        continue;
                }
 
-               if (oc->ttl <= 0.)
+               if (oc->ttl + oc->grace + oc->keep <= 0.)
                        continue;
 
                if (BAN_CheckObject(wrk, oc, req)) {

And we'd need something similar in the built-in VCL.
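
That built-in VCL counterpart might look roughly like this (a sketch only; the real built-in vcl_backend_response has further conditions, e.g. Set-Cookie and no-store handling, and may differ between versions):

        sub vcl_backend_response {
                # Today the built-in treats a non-positive TTL as
                # uncacheable; the compliance variant would test the
                # combined lifetime instead, matching the
                # ttl + grace + keep check in cache_hash.c.
                if (beresp.ttl + beresp.grace + beresp.keep <= 0s) {
                        set beresp.uncacheable = true;
                        return (deliver);
                }
        }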

That's it for my compliance detour, I hope I clarified my previous comment. I'm also fully aware of my lack of scrutiny, I haven't researched all the problems that this suggestion could introduce, for now I'm putting it on the proverbial table.

dridi avatar Jul 20 '20 13:07 dridi