Support all HTTP Caching mechanisms
Verify caching handles all cases defined in https://developer.mozilla.org/en-US/docs/Web/HTTP/Caching
It seems the caching implementation only handles responses in which the backend uses the headers defined by the modern specs.
Some backends may use older caching headers.
The following improvements are suggested and not yet implemented. Mandatory requirements from RFC 9111 are marked with "must"; the others are either suggested or optional. They are ranked by importance.
1 Implicit caching
The current code only caches responses with a public response directive, while a cache can also cache responses implicitly. In our case (a shared cache), a response is a cache candidate if one of the following applies:
- The response has an Expires header field;
- The response uses the public directive;
- The response uses the max-age directive;
- The response uses the s-maxage directive;
- The heuristic freshness is used (see 8).
And if any of the following applies, the response should not be in the cache:
- The response uses the no-store directive;
- The response uses the private directive;
- The request has an Authorization header field and the response does not explicitly allow shared caching (with the must-revalidate / public / s-maxage response directives).
https://www.rfc-editor.org/rfc/rfc9111.html#section-3
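The two lists above could be sketched roughly as follows. This is only a minimal illustration, not the project's code: the class and method names are hypothetical, header names are assumed to be lower-cased by the caller, heuristic freshness (item 8) is left out, and the comma split is naive (it would break on quoted directive arguments that contain commas).

```java
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class StorabilitySketch {
    /** Extracts lower-cased directive names from a Cache-Control value, ignoring arguments. */
    static Set<String> directiveNames(String cacheControl) {
        Set<String> names = new TreeSet<>();
        if (cacheControl == null) {
            return names;
        }
        for (String part : cacheControl.split(",")) {
            String name = part.split("=", 2)[0].trim().toLowerCase(Locale.ROOT);
            if (!name.isEmpty()) {
                names.add(name);
            }
        }
        return names;
    }

    /**
     * Decides whether a shared cache may store the response.
     * Assumption: responseHeaders uses lower-cased header names as keys.
     */
    static boolean isStorableBySharedCache(Map<String, String> responseHeaders,
                                           boolean requestHasAuthorization) {
        Set<String> cc = directiveNames(responseHeaders.get("cache-control"));

        // Exclusions: no-store and private always win for a shared cache.
        if (cc.contains("no-store") || cc.contains("private")) {
            return false;
        }
        // Authenticated requests need an explicit opt-in to shared caching.
        boolean allowsAuthorized = cc.contains("public")
                || cc.contains("s-maxage")
                || cc.contains("must-revalidate");
        if (requestHasAuthorization && !allowsAuthorized) {
            return false;
        }
        // Candidates: Expires, public, max-age or s-maxage.
        return cc.contains("public")
                || cc.contains("max-age")
                || cc.contains("s-maxage")
                || responseHeaders.containsKey("expires");
    }
}
```

With this shape, a response carrying only max-age=60 would be stored for an anonymous request but not for one that sent an Authorization header.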
2 Vary header
Based on the fields listed in the Vary header, the response could be different even for the same URL and HTTP method. A cache must not reuse a stored response unless the request header fields nominated by Vary match those of the request that produced the stored response; otherwise the request has to be forwarded to the origin server.
This could lead to different implementations in the code. For example, we can keep the original code and only add a condition to check if the Vary header matches; or we can put the headers related to the Vary header into the cache key; or we can give up on using LinkedHashMap as the cache data structure and use LinkedList instead, etc.
https://www.rfc-editor.org/rfc/rfc9111.html#section-4.1
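As an illustration of the first option (keeping the existing structure and adding a match check), a minimal sketch could look like this. The names are hypothetical, both maps are assumed to use lower-cased header names, and the comparison only trims whitespace, whereas a real implementation might normalize field values further.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Objects;

final class VarySketch {
    /**
     * Returns true when the stored response may be selected for the new request.
     * Assumption: both header maps use lower-cased header names as keys.
     */
    static boolean varyMatches(String varyHeader,
                               Map<String, String> storedRequestHeaders,
                               Map<String, String> newRequestHeaders) {
        if (varyHeader == null || varyHeader.trim().isEmpty()) {
            return true; // No Vary header: URL and method are enough.
        }
        for (String field : varyHeader.split(",")) {
            String name = field.trim().toLowerCase(Locale.ROOT);
            if (name.isEmpty()) {
                continue;
            }
            if (name.equals("*")) {
                return false; // "Vary: *" never matches.
            }
            String stored = normalize(storedRequestHeaders.get(name));
            String current = normalize(newRequestHeaders.get(name));
            // A field that is absent from both requests counts as a match.
            if (!Objects.equals(stored, current)) {
                return false;
            }
        }
        return true;
    }

    private static String normalize(String value) {
        return value == null ? null : value.trim();
    }
}
```

The second option (putting the nominated header values into the cache key) would reuse the same loop to build the key instead of comparing.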
3 Invalidating cache for unsafe request methods
Because unsafe request methods such as PUT, POST, or DELETE have the potential for changing state on the origin server, intervening caches are required to invalidate stored responses to keep their contents up to date. A cache must invalidate the target URI when it receives a non-error status code in response to an unsafe request method (including methods whose safety is unknown).
https://www.rfc-editor.org/rfc/rfc9111.html#section-4.4
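A rough sketch of that rule, assuming for illustration that the cache is a LinkedHashMap keyed by the target URI only (the real cache key may differ) and using hypothetical names. A non-error status is taken to mean 2xx or 3xx, and every method outside the safe set is treated as unsafe, which also covers unknown methods.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

final class InvalidationSketch {
    // Methods defined as safe by HTTP; anything else is unsafe or unknown.
    private static final Set<String> SAFE_METHODS = Set.of("GET", "HEAD", "OPTIONS", "TRACE");

    // Stand-in for the real cache; keyed by target URI for simplicity.
    final Map<String, String> store = new LinkedHashMap<>();

    static boolean shouldInvalidate(String method, int status) {
        boolean unsafeOrUnknown = !SAFE_METHODS.contains(method);
        boolean nonError = status >= 200 && status < 400;
        return unsafeOrUnknown && nonError;
    }

    /** Called once the origin server has answered a forwarded request. */
    void onResponse(String method, String targetUri, int status) {
        if (shouldInvalidate(method, status)) {
            store.remove(targetUri);
        }
    }
}
```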
4 Add validation for response
When a cached response is stale, the cache may use validation to check whether it can still be used. The current code already implements the validation mechanism, but it only validates requests with the max-age directive and does not apply it to stale cached responses.
Validation should also apply to responses with the no-cache directive.
https://www.rfc-editor.org/rfc/rfc9111.html#section-4.3
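To make the two triggers concrete (a stale response, or a no-cache response directive), here is a small sketch with hypothetical names. It assumes the age and freshness lifetime are already computed in seconds and that stored header names are lower-cased; the conditional headers are built from the stored validators.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class ValidationSketch {
    /** A stored response needs validation when it is stale or marked no-cache. */
    static boolean needsValidation(long ageSeconds,
                                   long freshnessLifetimeSeconds,
                                   boolean responseHasNoCache) {
        // Fresh means lifetime > age, so equality already counts as stale.
        boolean stale = ageSeconds >= freshnessLifetimeSeconds;
        return responseHasNoCache || stale;
    }

    /**
     * Builds the conditional request headers from the stored validators.
     * Assumption: storedResponseHeaders uses lower-cased header names as keys.
     */
    static Map<String, String> conditionalHeaders(Map<String, String> storedResponseHeaders) {
        Map<String, String> conditional = new LinkedHashMap<>();
        String etag = storedResponseHeaders.get("etag");
        String lastModified = storedResponseHeaders.get("last-modified");
        if (etag != null) {
            conditional.put("If-None-Match", etag);
        }
        if (lastModified != null) {
            conditional.put("If-Modified-Since", lastModified);
        }
        return conditional;
    }
}
```

If the origin answers 304 Not Modified, the stored response can be reused; any other response replaces it.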
5 Filtering the header
The current cache copies all the headers from the response. However, not all headers should be forwarded or cached:
- (must) the Connection header and the fields listed in the Connection header should be removed;
- (suggested) intermediaries should remove or replace fields that are known to require removal before forwarding:
- Proxy-Connection
- Keep-Alive
- TE
- Transfer-Encoding
- Upgrade
https://www.rfc-editor.org/rfc/rfc9111.html#section-3.1
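The filtering described above might be sketched like this, before the headers are copied into the cache entry. The class and method names are hypothetical, and a single-valued Map is used as a simplification (real headers can repeat).

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

final class HeaderFilterSketch {
    // Fields that are always dropped, compared in lower case.
    private static final Set<String> ALWAYS_DROPPED = Set.of(
            "connection", "proxy-connection", "keep-alive",
            "te", "transfer-encoding", "upgrade");

    static Map<String, String> filterForStorage(Map<String, String> headers) {
        Set<String> dropped = new HashSet<>(ALWAYS_DROPPED);

        // Every field named in the Connection header is hop-by-hop as well.
        for (Map.Entry<String, String> entry : headers.entrySet()) {
            if (entry.getKey().equalsIgnoreCase("Connection")) {
                for (String token : entry.getValue().split(",")) {
                    dropped.add(token.trim().toLowerCase(Locale.ROOT));
                }
            }
        }

        Map<String, String> kept = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : headers.entrySet()) {
            if (!dropped.contains(entry.getKey().toLowerCase(Locale.ROOT))) {
                kept.put(entry.getKey(), entry.getValue());
            }
        }
        return kept;
    }
}
```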
6 Unimplemented Directives
The current code already implements the following directives:
- max-age
- public
The following are not implemented yet:
Request directives:
- max-stale: the client accepts a response that has exceeded its freshness lifetime by no more than the given number of seconds;
- min-fresh: the client prefers a response that will still be fresh for at least the given number of seconds;
- no-cache: the cache must validate a stored response before using it;
- no-store: do not cache at all;
- no-transform: do not transform the content (e.g. convert between image formats);
- only-if-cached: the client only wants a stored response, not one from the origin server.
response directives (a cache must obey the Cache-Control directives defined here):
- must-revalidate: once the response is stale, the cache must validate it before reuse;
- must-understand: do not cache the response if the cache does not understand the caching requirements of its status code;
- no-cache: the cache must validate the response before using it;
- no-store: do not cache at all;
- no-transform: do not transform the content (e.g. convert between image formats);
- private: do not cache (for shared caches);
- proxy-revalidate: same as must-revalidate (for shared caches);
- s-maxage: same as max-age but takes precedence over it, and allows caching responses to requests with an Authorization header (for shared caches).
https://www.rfc-editor.org/rfc/rfc9111.html#section-5.2
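Supporting more directives probably starts with parsing Cache-Control into a structure the rest of the cache can query. Below is a minimal sketch under some assumptions: the names are hypothetical, the comma split is naive (a quoted argument containing a comma would be split incorrectly), and the first occurrence of a duplicated directive wins.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

final class CacheControlSketch {
    /** Maps each lower-cased directive name to its argument ("" when it has none). */
    static Map<String, String> parse(String cacheControl) {
        Map<String, String> directives = new LinkedHashMap<>();
        if (cacheControl == null) {
            return directives;
        }
        for (String part : cacheControl.split(",")) {
            String[] nameAndArg = part.trim().split("=", 2);
            String name = nameAndArg[0].trim().toLowerCase(Locale.ROOT);
            if (name.isEmpty()) {
                continue;
            }
            String arg = nameAndArg.length == 2 ? stripQuotes(nameAndArg[1].trim()) : "";
            directives.putIfAbsent(name, arg);
        }
        return directives;
    }

    /** For a shared cache, s-maxage takes precedence over max-age; null if neither is set. */
    static String rawSharedMaxAge(Map<String, String> directives) {
        return directives.containsKey("s-maxage")
                ? directives.get("s-maxage")
                : directives.get("max-age");
    }

    private static String stripQuotes(String value) {
        boolean quoted = value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"");
        return quoted ? value.substring(1, value.length() - 1) : value;
    }
}
```

Checks such as no-store, private or must-revalidate then become simple containsKey lookups on the parsed map.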
7 Partial content storing and combining
If the request uses Range specifiers, the cache may store the resulting incomplete response. When further parts of the same representation arrive, the cache may combine the new response with one or more stored responses, provided they share the same strong validator.
https://www.rfc-editor.org/rfc/rfc9111.html#section-3.3
https://www.rfc-editor.org/rfc/rfc9111.html#section-3.4
8 Heuristic freshness
If the response has a Last-Modified header field but no explicit expiration time, caches are encouraged to use a heuristic expiration value that is no more than some fraction of the interval since that time. A typical setting of this fraction might be 10%.
https://www.rfc-editor.org/rfc/rfc9111.html#section-4.2.2
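A sketch of the 10% heuristic, with hypothetical names. It assumes both the Date and Last-Modified headers are present and use the preferred HTTP date format, and it returns 0 when Last-Modified is not earlier than Date.

```java
import java.time.Duration;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

final class HeuristicFreshnessSketch {
    /** Heuristic freshness lifetime in seconds: 10% of (Date - Last-Modified). */
    static long heuristicLifetimeSeconds(String dateHeader, String lastModifiedHeader) {
        DateTimeFormatter httpDate = DateTimeFormatter.RFC_1123_DATE_TIME;
        ZonedDateTime date = ZonedDateTime.parse(dateHeader, httpDate);
        ZonedDateTime lastModified = ZonedDateTime.parse(lastModifiedHeader, httpDate);

        long sinceModified = Duration.between(lastModified, date).getSeconds();
        // A Last-Modified in the future gives no usable interval.
        return sinceModified <= 0 ? 0 : sinceModified / 10;
    }
}
```

For example, a resource last modified one day (86400 seconds) before the response date would be treated as fresh for 8640 seconds.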
9 Prevent overflow for delta seconds
If a cache receives a delta-seconds value greater than the greatest integer it can represent, or if any of its subsequent calculations overflows, the cache must consider the value to be 2147483648 (2^31) or the greatest positive integer it can conveniently represent.
https://www.rfc-editor.org/rfc/rfc9111.html#section-1.2.2
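One possible way to handle this is to parse into a long and clamp to 2^31, as in the sketch below. The names are hypothetical, and returning -1 for a malformed value is an assumption of this sketch; the caller has to decide how to treat invalid directives.

```java
final class DeltaSecondsSketch {
    /** Cap suggested by the RFC: 2^31 seconds. */
    static final long MAX_DELTA_SECONDS = 2147483648L;

    /** Returns the clamped value, or -1 when the input is not a plain non-negative integer. */
    static long parseDeltaSeconds(String raw) {
        if (raw == null || !raw.matches("[0-9]+")) {
            return -1;
        }
        try {
            return Math.min(Long.parseLong(raw), MAX_DELTA_SECONDS);
        } catch (NumberFormatException tooManyDigits) {
            // All digits but too large for a long: treat as the cap.
            return MAX_DELTA_SECONDS;
        }
    }

    /**
     * Adds two delta-seconds values and clamps the result.
     * Both arguments must be in [0, MAX_DELTA_SECONDS], so the long sum cannot overflow.
     */
    static long saturatingAdd(long a, long b) {
        return Math.min(a + b, MAX_DELTA_SECONDS);
    }
}
```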
Thank you @wzy1935 for this detailed report, great work!
It seems to me that the following items could be priorities for safety reasons:
- 2 Vary header
- 3 Invalidating cache for unsafe request methods
- 5 Filtering the header
- 9 Prevent overflow for delta seconds
What do you think?
8 Heuristic freshness seems like low-hanging fruit, correct?
Sure! And I can work on the ones you mentioned.