vertx-http-proxy

Support all HTTP Caching mechanisms

Open tsegismont opened this issue 1 year ago • 3 comments

Verify caching handles all cases defined in https://developer.mozilla.org/en-US/docs/Web/HTTP/Caching

It seems the caching implementation only takes resources into account when the backend responds with headers defined by the modern specs.

Some backends may use older caching headers.

tsegismont, Mar 15 '24 16:03

The following improvements are suggested and not yet implemented. Mandatory requirements from RFC 9111 are marked with "must"; the others are either suggested or optional. Items are ranked by importance.

1 Implicit caching

The current code only caches responses that carry the public response directive, but a cache can also store responses implicitly. In our case (a shared cache), a response is a cache candidate if any of the following applies:

  • The response has an Expires header field;
  • The response uses the public directive;
  • The response uses the max-age directive;
  • The response uses the s-maxage directive;
  • Heuristic freshness applies (see item 8).

If any of the following applies, the response must not be stored in the cache:

  • The response uses the no-store directive;
  • The response uses the private directive;
  • The request has an Authorization header field and the response does not explicitly allow shared caching (via the must-revalidate, public, or s-maxage response directives).

https://www.rfc-editor.org/rfc/rfc9111.html#section-3
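The rules above could be sketched as a single predicate. This is only an illustration using the JDK, not the proxy's actual code: the class name `SharedCacheStorability`, the method names, and the boolean parameters are hypothetical, and the directive parsing is deliberately naive (it splits on commas and ignores quoted arguments).

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of vertx-http-proxy.
final class SharedCacheStorability {

  /** Returns the lower-cased directive names of a Cache-Control value, ignoring arguments. */
  static Set<String> directiveNames(String cacheControl) {
    Set<String> names = new HashSet<>();
    if (cacheControl == null) {
      return names;
    }
    // Naive split: a quoted argument containing a comma would be split too.
    for (String part : cacheControl.split(",")) {
      String name = part.split("=", 2)[0].trim().toLowerCase(Locale.ROOT);
      if (!name.isEmpty()) {
        names.add(name);
      }
    }
    return names;
  }

  /**
   * Decides whether a shared cache may store the response.
   * heuristicAllowed is an assumption: the caller has already decided that
   * heuristic freshness (item 8) can be applied to this response.
   */
  static boolean isCacheCandidate(boolean requestHasAuthorization,
                                  String responseCacheControl,
                                  boolean responseHasExpires,
                                  boolean heuristicAllowed) {
    Set<String> d = directiveNames(responseCacheControl);
    // Exclusions first: no-store and private rule out a shared cache.
    if (d.contains("no-store") || d.contains("private")) {
      return false;
    }
    // Authenticated requests need an explicit opt-in from the response.
    boolean explicitlyShared = d.contains("must-revalidate")
        || d.contains("public") || d.contains("s-maxage");
    if (requestHasAuthorization && !explicitlyShared) {
      return false;
    }
    // At least one positive signal is required.
    return responseHasExpires || d.contains("public") || d.contains("max-age")
        || d.contains("s-maxage") || heuristicAllowed;
  }
}
```

Checking the exclusions before the positive signals keeps a response such as `no-store, max-age=60` out of the cache.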

2 Vary header

Depending on the fields listed in the Vary header, the response can differ even for the same URL and HTTP method. A cache must not reuse a stored response without validation unless every request header field nominated by the stored response's Vary header matches between the original request and the new one.

This could lead to different implementations in the code. For example, we can keep the original code and only add a condition that checks whether the Vary fields match; or we can put the request headers named by the Vary header into the cache key; or we can stop using LinkedHashMap as the cache data structure and use LinkedList instead, etc.

https://www.rfc-editor.org/rfc/rfc9111.html#section-4.1
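As an illustration of the first option (an extra matching condition), here is a minimal sketch. The class `VaryMatcher` and its methods are hypothetical, header maps are assumed to be keyed by lower-cased field names, and value normalization is limited to whitespace; RFC 9111 allows more elaborate normalization per field.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Objects;

// Hypothetical helper, not part of vertx-http-proxy.
final class VaryMatcher {

  /**
   * Returns true when a stored response may be selected for the new request.
   * Both maps are assumed to use lower-cased header names as keys.
   */
  static boolean matches(String vary,
                         Map<String, String> storedRequestHeaders,
                         Map<String, String> newRequestHeaders) {
    if (vary == null || vary.trim().isEmpty()) {
      return true; // no Vary header: URL and method are enough
    }
    for (String field : vary.split(",")) {
      String name = field.trim().toLowerCase(Locale.ROOT);
      if (name.isEmpty()) {
        continue;
      }
      if (name.equals("*")) {
        return false; // "Vary: *" never matches
      }
      String stored = normalize(storedRequestHeaders.get(name));
      String incoming = normalize(newRequestHeaders.get(name));
      // A field absent from both requests counts as a match.
      if (!Objects.equals(stored, incoming)) {
        return false;
      }
    }
    return true;
  }

  /** Trims the value and collapses runs of whitespace. */
  static String normalize(String value) {
    return value == null ? null : value.trim().replaceAll("\\s+", " ");
  }
}
```

When `matches` returns false, the stored response would have to be validated or bypassed rather than served directly.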

3 Invalidating cache for unsafe request methods

Because unsafe request methods such as PUT, POST, or DELETE have the potential for changing state on the origin server, intervening caches are required to invalidate stored responses to keep their contents up to date. A cache must invalidate the target URI when it receives a non-error status code in response to an unsafe request method (including methods whose safety is unknown).

https://www.rfc-editor.org/rfc/rfc9111.html#section-4.4
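A minimal sketch of this rule, assuming the cache is a plain map keyed by target URI. The class `UnsafeMethodInvalidation` and its methods are hypothetical names, and "non-error" is taken to mean a 2xx or 3xx status code, as RFC 9111 defines it.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of vertx-http-proxy.
final class UnsafeMethodInvalidation {

  // Methods defined as safe by HTTP; anything else is unsafe or of unknown safety.
  private static final Set<String> SAFE_METHODS = Set.of("GET", "HEAD", "OPTIONS", "TRACE");

  /** Returns true when the stored responses for the target URI must be invalidated. */
  static boolean shouldInvalidate(String method, int statusCode) {
    boolean unsafeOrUnknown = !SAFE_METHODS.contains(method);
    boolean nonError = statusCode >= 200 && statusCode < 400;
    return unsafeOrUnknown && nonError;
  }

  /** Drops the entry for targetUri from a map-based cache when the rule applies. */
  static void onResponse(Map<String, ?> cache, String method, String targetUri, int statusCode) {
    if (shouldInvalidate(method, statusCode)) {
      cache.remove(targetUri);
    }
  }
}
```

Method names are compared case-sensitively, so the sketch assumes they arrive in their canonical upper-case form.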

4 Add validation for response

When a cached response is stale, the cache may use validation to check whether it can still be used. The current code already implements the validation mechanism, but it only validates for requests that carry the max-age directive and does not apply it to stale cache entries.

Validation should also apply to responses with the no-cache directive.

https://www.rfc-editor.org/rfc/rfc9111.html#section-4.3
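The decision of when to validate could look roughly like the sketch below. The class `ValidationDecision`, its parameters, and the idea of passing precomputed age and freshness lifetime values are assumptions made for illustration; the conditional header names (`If-None-Match`, `If-Modified-Since`) are the standard ones.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of vertx-http-proxy.
final class ValidationDecision {

  /**
   * Returns true when the stored response must be validated before reuse:
   * it is stale (age >= freshness lifetime), or no-cache is present on the
   * stored response or on the incoming request.
   */
  static boolean mustValidate(long currentAgeSeconds,
                              long freshnessLifetimeSeconds,
                              boolean responseNoCache,
                              boolean requestNoCache) {
    boolean stale = currentAgeSeconds >= freshnessLifetimeSeconds;
    return stale || responseNoCache || requestNoCache;
  }

  /** Builds the conditional request headers from the stored validators, if any. */
  static Map<String, String> conditionalHeaders(String etag, String lastModified) {
    Map<String, String> headers = new LinkedHashMap<>();
    if (etag != null) {
      headers.put("If-None-Match", etag);
    }
    if (lastModified != null) {
      headers.put("If-Modified-Since", lastModified);
    }
    return headers;
  }
}
```

If the backend answers the conditional request with 304 (Not Modified), the stored response can be reused; any other response replaces it.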

5 Filtering the header

The current cache copies all headers from the response. However, not all headers should be forwarded or cached:

  • (must) the Connection header and the fields listed in the Connection header should be removed;
  • (suggested) intermediaries should remove or replace fields that are known to require removal before forwarding:
    • Proxy-Connection
    • Keep-Alive
    • TE
    • Transfer-Encoding
    • Upgrade

https://www.rfc-editor.org/rfc/rfc9111.html#section-3.1
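A minimal sketch of such a filter, assuming headers are held in a simple name-to-value map (the real proxy uses its own header type). The class `StoredHeaderFilter` and its method are hypothetical.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of vertx-http-proxy.
final class StoredHeaderFilter {

  // Fields that are removed regardless of the Connection header's content.
  private static final Set<String> ALWAYS_REMOVED = Set.of(
      "connection", "proxy-connection", "keep-alive", "te", "transfer-encoding", "upgrade");

  /** Returns a copy of the headers without hop-by-hop fields. */
  static Map<String, String> filter(Map<String, String> responseHeaders) {
    Set<String> drop = new HashSet<>(ALWAYS_REMOVED);
    // Collect the field names nominated by the Connection header.
    for (Map.Entry<String, String> e : responseHeaders.entrySet()) {
      if (e.getKey().equalsIgnoreCase("Connection")) {
        for (String token : e.getValue().split(",")) {
          String name = token.trim().toLowerCase(Locale.ROOT);
          if (!name.isEmpty()) {
            drop.add(name);
          }
        }
      }
    }
    // Keep everything else, preserving the original order and casing.
    Map<String, String> kept = new LinkedHashMap<>();
    for (Map.Entry<String, String> e : responseHeaders.entrySet()) {
      if (!drop.contains(e.getKey().toLowerCase(Locale.ROOT))) {
        kept.put(e.getKey(), e.getValue());
      }
    }
    return kept;
  }
}
```

For example, a response with `Connection: close, X-Hop` loses `Connection`, `X-Hop`, and any of the always-removed fields, while `Content-Type` and `ETag` are kept.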

6 Unimplemented Directives

The current code already implements the following directives:

  • max-age
  • public

The following are not implemented yet.

Request directives:

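  • max-stale: the client accepts a stale response as long as it has exceeded its freshness lifetime by no more than the given number of seconds (any stale response if no value is given);
  • min-fresh: the client prefers a response that will remain fresh for at least the given number of seconds;
  • no-cache: the cache must validate a stored response before using it;
  • no-store: do not store the request or its response at all;
  • no-transform: do not transform the content (e.g. convert between image formats);
  • only-if-cached: the client only wants a stored response, not one fetched from the origin server.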

Response directives (a cache must obey the Cache-Control directives defined here):

  • must-revalidate: once the response is stale, the cache must validate it before reuse;
  • must-understand: store the response only if the cache understands the caching requirements of its status code;
  • no-cache: the cache must validate the response before each reuse;
  • no-store: do not store the response at all;
  • no-transform: do not transform the content (e.g. convert between image formats);
  • private: do not store the response (for shared caches);
  • proxy-revalidate: same as must-revalidate, but applies to shared caches only;
  • s-maxage: same as max-age, but applies to shared caches only; it also permits reusing responses to requests that carry an Authorization header.

https://www.rfc-editor.org/rfc/rfc9111.html#section-5.2
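Supporting these directives starts with parsing the Cache-Control field value. The following is a rough sketch, not the proxy's parser: the class `CacheControlParser` is hypothetical, duplicate directives keep their first occurrence (a simplification), and backslash escapes inside quoted strings are not handled.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of vertx-http-proxy.
final class CacheControlParser {

  /**
   * Parses a Cache-Control value into a map of lower-cased directive names
   * to arguments. Directives without an argument map to the empty string.
   */
  static Map<String, String> parse(String headerValue) {
    Map<String, String> directives = new LinkedHashMap<>();
    if (headerValue == null) {
      return directives;
    }
    StringBuilder current = new StringBuilder();
    boolean inQuotes = false;
    for (int i = 0; i < headerValue.length(); i++) {
      char c = headerValue.charAt(i);
      if (c == '"') {
        inQuotes = !inQuotes;
      }
      // Only commas outside quoted strings separate directives.
      if (c == ',' && !inQuotes) {
        add(directives, current.toString());
        current.setLength(0);
      } else {
        current.append(c);
      }
    }
    add(directives, current.toString());
    return directives;
  }

  private static void add(Map<String, String> directives, String raw) {
    String[] parts = raw.split("=", 2);
    String name = parts[0].trim().toLowerCase(Locale.ROOT);
    if (name.isEmpty()) {
      return;
    }
    String value = parts.length == 2 ? parts[1].trim() : "";
    // Strip the surrounding quotes of a quoted-string argument.
    if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
      value = value.substring(1, value.length() - 1);
    }
    directives.putIfAbsent(name, value);
  }
}
```

With this shape, a check such as `parse(value).containsKey("no-store")` works for both request and response directives, and the qualified form `no-cache="set-cookie"` keeps its field list as the argument.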

7 Partial content storing and combining

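If a request uses Range specifiers (or the connection closes prematurely), the cache may receive and store an incomplete response. After several such transfers, the cache may combine a new partial response with one or more stored responses for the same representation, provided they share the same strong validator.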

https://www.rfc-editor.org/rfc/rfc9111.html#section-3.3

https://www.rfc-editor.org/rfc/rfc9111.html#section-3.4

8 Heuristic freshness

If the response has a Last-Modified header field but no explicit expiration time, caches are encouraged to use a heuristic expiration value that is no more than some fraction of the interval since that time. A typical setting of this fraction might be 10%.

https://www.rfc-editor.org/rfc/rfc9111.html#section-4.2.2
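A minimal sketch of the calculation, assuming the interval is measured from Last-Modified to the response's Date header and that both use the usual IMF-fixdate format. The class `HeuristicFreshness` and the `percent` parameter are hypothetical.

```java
import java.time.Duration;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not part of vertx-http-proxy.
final class HeuristicFreshness {

  /**
   * Returns a heuristic freshness lifetime equal to the given percentage of
   * the time elapsed between Last-Modified and Date.
   * Throws DateTimeParseException if either value is not a valid date.
   */
  static Duration heuristicLifetime(String lastModified, String date, int percent) {
    DateTimeFormatter format = DateTimeFormatter.RFC_1123_DATE_TIME;
    ZonedDateTime modifiedAt = ZonedDateTime.parse(lastModified, format);
    ZonedDateTime generatedAt = ZonedDateTime.parse(date, format);
    Duration sinceModified = Duration.between(modifiedAt, generatedAt);
    if (sinceModified.isNegative()) {
      return Duration.ZERO; // Last-Modified in the future: treat as not fresh
    }
    // Integer arithmetic avoids floating-point rounding surprises.
    return sinceModified.multipliedBy(percent).dividedBy(100);
  }
}
```

For a response last modified ten days before its Date, `percent = 10` yields a lifetime of one day.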

9 Prevent overflow for delta seconds

If a cache receives a delta-seconds value greater than the greatest integer it can represent, or if any of its subsequent calculations overflows, the cache must consider the value to be 2147483648 (2^31) or the greatest positive integer it can conveniently represent.

https://www.rfc-editor.org/rfc/rfc9111.html#section-1.2.2
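One possible way to parse delta-seconds with saturation is sketched below. The class `DeltaSeconds` and its methods are hypothetical; the sketch clamps at 2^31 while accumulating digits, so arbitrarily long digit strings never overflow a `long`.

```java
import java.util.OptionalLong;

// Hypothetical helper, not part of vertx-http-proxy.
final class DeltaSeconds {

  /** Upper bound mandated by RFC 9111 for unrepresentable values: 2^31. */
  static final long MAX = 2147483648L;

  /**
   * Parses a delta-seconds value (one or more ASCII digits).
   * Returns an empty result for invalid input and MAX for values above 2^31.
   */
  static OptionalLong parse(String value) {
    if (value == null || value.isEmpty()) {
      return OptionalLong.empty();
    }
    long result = 0;
    for (int i = 0; i < value.length(); i++) {
      char c = value.charAt(i);
      if (c < '0' || c > '9') {
        return OptionalLong.empty();
      }
      // result never exceeds MAX here, so result * 10 + 9 fits in a long.
      result = result * 10 + (c - '0');
      if (result > MAX) {
        result = MAX;
      }
    }
    return OptionalLong.of(result);
  }

  /** Adds two values, each assumed to be within [0, MAX], saturating at MAX. */
  static long saturatingAdd(long a, long b) {
    return Math.min(a + b, MAX);
  }
}
```

Returning an empty result for malformed input leaves it to the caller to decide how an invalid directive is treated.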

wzy1935, Jul 09 '24 10:07

Thank you @wzy1935 for this detailed report, great work!

It seems to me that the following items could be priorities for safety reasons:

  • 2 Vary header
  • 3 Invalidating cache for unsafe request methods
  • 5 Filtering the header
  • 9 Prevent overflow for delta seconds

What do you think?

8 Heuristic freshness seems like low-hanging fruit, correct?

tsegismont, Jul 11 '24 09:07

Sure! And I can work on the ones you mentioned.

wzy1935, Jul 11 '24 11:07