HAProxy Caching web content

Caching web content with HAProxy is pretty difficult

I essentially need HAProxy to cache small web responses for a period of time. I am thinking NGINX is the issue here, but I’m not sure.

I am serving static files from NGINX with HAProxy sitting in front. I don’t want to do the caching in NGINX. This is more of a general exercise in learning how to cache effectively with HAProxy, and I want to be able to apply it anywhere. I would use Nuster, but it’s too old and I need to use the newer HAProxy versions.

Expected behavior:
User requests example.com/file.html

file.html cached in HAProxy memory for x seconds if not already cached

If the file was already cached in HAProxy memory, it should send status 200 OK with the item if it was not in the user’s browser cache, or status 304 if the browser did not request the entire item but only asked to validate its copy

HAProxy should serve that file only from its memory cache until it expires, and no request header should be able to bypass the HAProxy memory cache

The issue:
I cannot find a way for HAProxy to do this entirely. The web browsers send:
pragma, cache-control, if-modified-since, if-none-match etc. headers

When I remove all of those headers with http-request, everything caches properly in memory and HAProxy even serves the in-memory copy like it’s supposed to. However, it does not send 304 to the browser on a simple refresh; it sends the entire object. I think I’m a little confused about what the cache headers handled by http-request vs http-response mean to the backend, to the user and to HAProxy, and about how HAProxy interprets these headers when deciding when and how to cache.

Do I need to delete a header, or take some action like sending 304 if the request has Cache-Control: max-age=0?

Should I, or is it even possible to, use a stick table to store an item’s cache status and just send 304 if the browser hasn’t requested the full item? I’m going for the most minimal way to do this. I just need very lightweight software like HAProxy that proxies and caches a couple of things from nginx or another web server, and I am not open to using Varnish, nginx or something else to cache content further back.

I’m just a little bit lost, so I would appreciate any guidance whatsoever.

I am using HAProxy 2.8-dev4

Quick disclaimer: I’m a little rusty on the details of caching, but this is my understanding. Take it with a grain of salt.

When HAProxy receives a GET request for something that’s in cache, it responds with that object without contacting the backend, provided that none of the cache’s documented limitations have been hit.
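
For reference, this is roughly what a minimal cache setup looks like. It’s only a sketch: the names static_cache, fe_web and be_nginx, the sizes, and the NGINX address are all made up for illustration, so adjust them to your setup. As far as I understand it, max-age in the cache section is an upper bound on how long an object stays cached, not a forced lifetime, so the backend response still needs to be cacheable in the first place.

```haproxy
# Sketch only: names, sizes and addresses are assumptions, not from the original post.
cache static_cache
    total-max-size 16        # total cache size, in megabytes
    max-object-size 100000   # largest cacheable object, in bytes
    max-age 60               # upper bound on an object's lifetime, in seconds

frontend fe_web
    mode http
    bind :80
    default_backend be_nginx

backend be_nginx
    mode http
    # Try to answer from the cache first...
    http-request cache-use static_cache
    # ...and store cacheable responses coming back from NGINX.
    http-response cache-store static_cache
    server nginx1 127.0.0.1:8080   # hypothetical NGINX address
```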

One of the headers you’re removing is If-Modified-Since, which tells HAProxy the modification date of the copy the browser already holds (If-None-Match does the same job with an ETag). Without this header, HAProxy can only assume the browser has no usable copy, and it returns 200 OK plus the item from its own cache. Try leaving that header in place and see if the browser requests start returning 304 instead. You’ve not included any of your config, so I’m assuming your cache is otherwise configured correctly.

Since HAProxy uses headers to decide when to respond from cache, the only way you can control this is to set them in the frontend before the request reaches the cache, and that may not work depending on the order in which HAProxy evaluates its rules. 90% of the time, it’s top-to-bottom in the config file, but there are enough exceptions that I cannot say with any certainty that this is even possible. If I were trying to make such a thing work, it would be with a lot of trial and error. That said, the browser should be able to tell HAProxy when it needs fresh data, and HAProxy should respect that. I cannot think of a scenario where a browser would ask for fresh data and HAProxy should deny that request and send what is cached instead. Forcing that could have unintended consequences, such as having to reload HAProxy any time something is updated behind it.
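
If you do want to try it anyway, here is the kind of thing I would start experimenting with. Treat it as an untested sketch: fe_web, be_nginx, static_cache and the server address are hypothetical names. The idea is to drop only the request headers that ask for a bypass (Cache-Control, Pragma) in the frontend, and to leave the validators (If-None-Match, If-Modified-Since) alone so a 304 remains possible. Whether the cache itself then answers with a 304 is something to verify against your HAProxy version.

```haproxy
# Untested sketch: all names and addresses here are assumptions.
cache static_cache
    total-max-size 16   # megabytes
    max-age 60          # seconds

frontend fe_web
    mode http
    bind :80
    # Remove the request headers a browser uses to ask for fresh data.
    # Frontend http-request rules run before the backend's cache-use rule.
    http-request del-header Cache-Control
    http-request del-header Pragma
    # If-None-Match and If-Modified-Since are intentionally NOT removed,
    # so a conditional request can still be recognised as one.
    default_backend be_nginx

backend be_nginx
    mode http
    http-request cache-use static_cache
    http-response cache-store static_cache
    server nginx1 127.0.0.1:8080   # hypothetical NGINX address
```

Keep in mind the downside mentioned above: with the bypass headers removed, a forced reload in the browser will keep getting the cached copy until it expires.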