Support all HTTP Caching mechanisms #70

Open
tsegismont opened this issue Mar 15, 2024 · 3 comments
Labels
enhancement New feature or request gsoc2024

Comments

@tsegismont
Contributor

Verify caching handles all cases defined in https://developer.mozilla.org/en-US/docs/Web/HTTP/Caching

It seems the caching implementation only handles resources for which the backend responds with headers defined by the modern specs.

Some backends may use older caching headers.

@tsegismont tsegismont added enhancement New feature or request gsoc2024 labels Mar 15, 2024
@wzy1935
Contributor

wzy1935 commented Jul 9, 2024

The following improvements are suggested but not yet implemented. Mandatory requirements from RFC 9111 are marked with "must"; the others are either recommended or optional. Items are ranked by importance.

1 Implicit caching

The current code only caches responses that carry the public response directive, but a cache may also store responses implicitly. In our case (a shared cache), a response is a cache candidate if at least one of the following applies:

  • The response has an Expires header field;
  • The response uses the public directive;
  • The response uses the max-age directive;
  • The response uses the s-maxage directive;
  • The heuristic freshness is used (see 8).

And if any of the following applies, the response should not be in the cache:

  • The response uses the no-store directive;
  • The response uses the private directive;
  • The request has an Authorization header field and the response does not explicitly allow shared caching (with the must-revalidate / public / s-maxage response directives).

https://www.rfc-editor.org/rfc/rfc9111.html#section-3
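A minimal sketch of how these rules could be combined into a single check for a shared cache. The class name, method name, and parameters are hypothetical and not taken from the existing code; the directive set is assumed to hold lower-cased response directive names, and the Authorization check is applied to the request, which is where that header field appears.

```java
import java.util.Set;

// Hypothetical helper, not part of the existing proxy code.
final class SharedCacheStorability {

  /**
   * @param responseDirectives      lower-cased Cache-Control response directive names
   * @param hasExpires              true if the response has an Expires header field
   * @param requestHasAuthorization true if the request had an Authorization header field
   * @param heuristicallyCacheable  true if heuristic freshness applies (item 8)
   */
  static boolean isStorable(Set<String> responseDirectives,
                            boolean hasExpires,
                            boolean requestHasAuthorization,
                            boolean heuristicallyCacheable) {
    // Exclusions always win over the candidate conditions.
    if (responseDirectives.contains("no-store") || responseDirectives.contains("private")) {
      return false;
    }
    boolean explicitlyShared = responseDirectives.contains("public")
      || responseDirectives.contains("s-maxage")
      || responseDirectives.contains("must-revalidate");
    if (requestHasAuthorization && !explicitlyShared) {
      return false;
    }
    // At least one candidate condition must hold.
    return hasExpires
      || responseDirectives.contains("public")
      || responseDirectives.contains("max-age")
      || responseDirectives.contains("s-maxage")
      || heuristicallyCacheable;
  }
}
```

For example, `isStorable(Set.of("max-age"), false, true, false)` is false because the request is authenticated and nothing explicitly allows shared caching.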

2 Vary header

Depending on the fields listed in the Vary header, responses can differ even for the same URL and HTTP method. A cache must not reuse a stored response without validation unless the request header fields nominated by Vary match those of the original request.

This could be implemented in several ways. For example, we can keep the original code and only add a condition that checks whether the Vary fields match; or we can put the headers named by Vary into the cache key; or we can give up on LinkedHashMap as the cache data structure and use LinkedList instead, etc.

https://www.rfc-editor.org/rfc/rfc9111.html#section-4.1
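A rough sketch of the first option (an extra matching condition on lookup). `VaryMatcher` and its parameters are hypothetical; both maps are assumed to use lower-cased header names, and values are compared as exact strings, which is stricter than the normalization the RFC permits.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Objects;

// Hypothetical helper, not part of the existing proxy code.
final class VaryMatcher {

  /**
   * @param varyHeader           value of the stored response's Vary header, or null
   * @param storedRequestHeaders headers of the request that produced the stored response
   * @param newRequestHeaders    headers of the incoming request
   */
  static boolean matches(String varyHeader,
                         Map<String, String> storedRequestHeaders,
                         Map<String, String> newRequestHeaders) {
    if (varyHeader == null || varyHeader.isBlank()) {
      return true; // nothing to compare
    }
    for (String field : varyHeader.split(",")) {
      String name = field.trim().toLowerCase(Locale.ROOT);
      if (name.isEmpty()) {
        continue;
      }
      if (name.equals("*")) {
        return false; // "Vary: *" never matches
      }
      // A field absent from both requests counts as a match.
      if (!Objects.equals(storedRequestHeaders.get(name), newRequestHeaders.get(name))) {
        return false;
      }
    }
    return true;
  }
}
```

When `matches` returns false, the stored response would have to be validated or a fresh one fetched from the origin.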

3 Invalidating cache for unsafe request methods

Because unsafe request methods such as PUT, POST, or DELETE have the potential for changing state on the origin server, intervening caches are required to invalidate stored responses to keep their contents up to date. A cache must invalidate the target URI when it receives a non-error status code in response to an unsafe request method (including methods whose safety is unknown).

https://www.rfc-editor.org/rfc/rfc9111.html#section-4.4
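The decision itself is small. A sketch, assuming a hypothetical `UnsafeMethodInvalidation` helper that the proxy would call once the origin response arrives; "non-error" is taken to mean a 2xx or 3xx status code, and any method outside the safe set (including unknown ones) is treated as unsafe.

```java
import java.util.Set;

// Hypothetical helper, not part of the existing proxy code.
final class UnsafeMethodInvalidation {

  // Methods defined as safe by HTTP semantics; method names are case-sensitive.
  private static final Set<String> SAFE_METHODS = Set.of("GET", "HEAD", "OPTIONS", "TRACE");

  /** Returns true if the cache entry for the target URI should be invalidated. */
  static boolean shouldInvalidate(String method, int statusCode) {
    boolean safe = SAFE_METHODS.contains(method);
    boolean nonError = statusCode >= 200 && statusCode < 400;
    return !safe && nonError;
  }
}
```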

4 Add validation for response

When a cached response is stale, the cache may validate it with the origin server to check whether it can still be used. The current code already implements the validation mechanism, but it only validates for requests with the max-age directive and does not apply validation to stale cache entries.

Validation should also apply to responses with the no-cache directive.

https://www.rfc-editor.org/rfc/rfc9111.html#section-4.3

5 Filtering the header

The current cache copies all headers from the response. However, not every header should be forwarded or cached:

  • (must) remove the Connection header and any fields listed in the Connection header;
  • (suggested) intermediaries should remove or replace fields that are known to require removal before forwarding:
    • Proxy-Connection
    • Keep-Alive
    • TE
    • Transfer-Encoding
    • Upgrade

https://www.rfc-editor.org/rfc/rfc9111.html#section-3.1
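A sketch of such a filter over a plain header map. `HopByHopFilter` is a hypothetical name, and a single-valued `Map<String, String>` stands in for whatever multi-valued header type the real code uses.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of the existing proxy code.
final class HopByHopFilter {

  // Fields removed regardless of what the Connection header lists.
  private static final Set<String> ALWAYS_REMOVED = Set.of(
    "connection", "proxy-connection", "keep-alive", "te", "transfer-encoding", "upgrade");

  static Map<String, String> filter(Map<String, String> headers) {
    Set<String> toRemove = new HashSet<>(ALWAYS_REMOVED);
    // First pass: collect the field names nominated by the Connection header.
    for (Map.Entry<String, String> entry : headers.entrySet()) {
      if (entry.getKey().equalsIgnoreCase("Connection")) {
        for (String token : entry.getValue().split(",")) {
          String name = token.trim().toLowerCase(Locale.ROOT);
          if (!name.isEmpty()) {
            toRemove.add(name);
          }
        }
      }
    }
    // Second pass: copy everything that is not marked for removal.
    Map<String, String> result = new LinkedHashMap<>();
    for (Map.Entry<String, String> entry : headers.entrySet()) {
      if (!toRemove.contains(entry.getKey().toLowerCase(Locale.ROOT))) {
        result.put(entry.getKey(), entry.getValue());
      }
    }
    return result;
  }
}
```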

6 Unimplemented Directives

The current code already implements the following directives:

  • max-age
  • public

The following are not implemented:

request directives:

  • max-stale: the client accepts a stale response as long as it has been stale for no longer than the given number of seconds;
  • min-fresh: the client wants a response that will remain fresh for at least the given number of seconds;
  • no-cache: the cache must validate a stored response before using it;
  • no-store: do not cache at all;
  • no-transform: do not transform the content (e.g. convert between image formats);
  • only-if-cached: the client only wants a stored response, not one from the origin server.

response directives (a cache must obey the Cache-Control directives defined here):

  • must-revalidate: the cache must validate the response once it becomes stale;
  • must-understand: do not cache unless the cache understands the caching requirements of the status code;
  • no-cache: the cache must validate the response before using it;
  • no-store: do not cache at all;
  • no-transform: do not transform the content (e.g. convert between image formats);
  • private: do not cache (for shared caches);
  • proxy-revalidate: same as must-revalidate (for shared caches);
  • s-maxage: same as max-age, and also allows reusing responses to requests with an Authorization header (for shared caches).

https://www.rfc-editor.org/rfc/rfc9111.html#section-5.2
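Supporting more directives probably starts with a general parser. A simplified sketch, assuming a hypothetical `CacheControlParser`; it lower-cases directive names, keeps the first occurrence of a duplicated directive, and does not handle quoted-string arguments that contain commas.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of the existing proxy code.
final class CacheControlParser {

  /** Maps each directive name to its argument, or to "" when it has none. */
  static Map<String, String> parse(String headerValue) {
    Map<String, String> directives = new LinkedHashMap<>();
    if (headerValue == null) {
      return directives;
    }
    for (String part : headerValue.split(",")) {
      String token = part.trim();
      if (token.isEmpty()) {
        continue;
      }
      int eq = token.indexOf('=');
      String name = (eq < 0 ? token : token.substring(0, eq)).trim().toLowerCase(Locale.ROOT);
      String value = eq < 0 ? "" : token.substring(eq + 1).trim();
      directives.putIfAbsent(name, value);
    }
    return directives;
  }
}
```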

7 Partial content storing and combining

If the request uses Range specifiers, the cache may store the resulting incomplete (partial) responses. When further ranges of the same representation arrive, the cache may combine the new response with one or more stored responses.

https://www.rfc-editor.org/rfc/rfc9111.html#section-3.3

https://www.rfc-editor.org/rfc/rfc9111.html#section-3.4

8 Heuristic freshness

If the response has a Last-Modified header field but no explicit expiration time, caches are encouraged to use a heuristic expiration value that is no more than some fraction of the interval since that time. A typical setting of this fraction might be 10%.

https://www.rfc-editor.org/rfc/rfc9111.html#section-4.2.2
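With the 10% fraction, the computation could look like the following sketch. `HeuristicFreshness` is a hypothetical name; the two instants are assumed to be the already-parsed Date and Last-Modified header values.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper, not part of the existing proxy code.
final class HeuristicFreshness {

  /** Returns 10% of the interval between Last-Modified and Date, never negative. */
  static Duration lifetime(Instant date, Instant lastModified) {
    Duration sinceModified = Duration.between(lastModified, date);
    if (sinceModified.isNegative()) {
      return Duration.ZERO; // Last-Modified in the future: treat as not fresh
    }
    return sinceModified.dividedBy(10);
  }
}
```

A response last modified ten days before its Date would be considered fresh for one day.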

9 Prevent overflow for delta seconds

If a cache receives a delta-seconds value greater than the greatest integer it can represent, or if any of its subsequent calculations overflows, the cache must consider the value to be 2147483648 (2^31) or the greatest positive integer it can conveniently represent.

https://www.rfc-editor.org/rfc/rfc9111.html#section-1.2.2
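One way to get this behavior is to clamp while parsing. A sketch, assuming a hypothetical `DeltaSeconds` helper that stores values in a `long`; rejecting non-digit input with an exception is a choice made here, and the caller would decide how to treat such a directive.

```java
// Hypothetical helper, not part of the existing proxy code.
final class DeltaSeconds {

  static final long MAX = 2147483648L; // 2^31

  /** Parses a delta-seconds value, clamping anything above 2^31 to 2^31. */
  static long parse(String value) {
    String digits = value.trim();
    if (digits.isEmpty() || !digits.chars().allMatch(c -> c >= '0' && c <= '9')) {
      throw new IllegalArgumentException("Not a delta-seconds value: " + value);
    }
    String significant = digits.replaceFirst("^0+", "");
    if (significant.isEmpty()) {
      return 0L; // the value consisted only of zeros
    }
    // More than 10 significant digits always exceeds 2^31 and could overflow a long.
    if (significant.length() > 10) {
      return MAX;
    }
    return Math.min(Long.parseLong(significant), MAX);
  }
}
```

Later arithmetic on the result (for example adding an age) would still need its own overflow guard, such as `Math.addExact` with a fallback to `MAX`.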

@tsegismont
Contributor Author

Thank you @wzy1935 for this detailed report, great work!

It seems to me that the following items could be priorities for safety reasons:

  • 2 Vary header
  • 3 Invalidating cache for unsafe request methods
  • 5 Filtering the header
  • 9 Prevent overflow for delta seconds

What do you think?

8 Heuristic freshness seems like low-hanging fruit, correct?

@wzy1935
Contributor

wzy1935 commented Jul 11, 2024

Sure! I can work on the items you mentioned.
