HAProxy bufferbloat / not buffering when used in front of any caching proxy?

I am trying to use HAProxy for TLS termination and some extra HTTP rules in front of Varnish Cache. For some reason, whenever I put HAProxy in front of any caching server, it does not seem to buffer anything (and I believe that is the issue here, because nginx works just fine in the same spot, but I cannot use it), and I cannot find any official resource that states whether HAProxy can actually buffer response bodies in the first place.

Both servers use the BBR congestion control algorithm and have high ulimit settings. I have also tested initial TCP congestion windows from a high of 100 segments all the way down to the default of 10, and of course I have played around with the wmem/rmem buffers on both servers. None of these settings changed anything.
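Roughly, that tuning amounts to something like the following; the exact values, gateway, and interface names here are placeholders rather than what is actually on my servers:

sysctl -w net.ipv4.tcp_congestion_control=bbr
sysctl -w net.core.rmem_max=16777216
sysctl -w net.core.wmem_max=16777216
sysctl -w net.ipv4.tcp_rmem="4096 131072 16777216"
sysctl -w net.ipv4.tcp_wmem="4096 131072 16777216"
# the initial congestion window is a route attribute, not a sysctl
ip route change default via <gateway> dev <iface> initcwnd 100 initrwnd 100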

The HAProxy and Varnish servers are both running Fedora 39 and talk to each other over a local interface.

haproxy -vv
HAProxy version 2.9-dev11-2fb1776 2023/11/24 - https://haproxy.org/
Status: development branch - not safe for use in production.
Known bugs: https://github.com/haproxy/haproxy/issues?q=is:issue+is:open
Running on: Linux 6.5.12-300.fc39.x86_64 #1 SMP PREEMPT_DYNAMIC Mon Nov 20 22:44:24 UTC 2023 x86_64
Build options :
  TARGET  = generic
  CPU     = generic
  CC      = cc
  CFLAGS  = -O2 -g -Wall -Wextra -Wundef -Wdeclaration-after-statement -Wfatal-errors -Wtype-limits -Wshift-negative-value -Wshift-overflow=2 -Wduplicated-cond -Wnull-dereference -fwrapv -Wno-address-of-packed-member -Wno-unused-label -Wno-sign-compare -Wno-unused-parameter -Wno-clobbered -Wno-missing-field-initializers -Wno-cast-function-type -Wno-string-plus-int -Wno-atomic-alignment
  OPTIONS = USE_THREAD=1 USE_OPENSSL_WOLFSSL=1 USE_QUIC=1
  DEBUG   = -DDEBUG_STRICT -DDEBUG_MEMORY_POOLS

Feature list : -51DEGREES -ACCEPT4 -BACKTRACE -CLOSEFROM -CPU_AFFINITY -CRYPT_H -DEVICEATLAS -DL -ENGINE -EPOLL -EVPORTS -GETADDRINFO -KQUEUE -LIBATOMIC -LIBCRYPT -LINUX_CAP -LINUX_SPLICE -LINUX_TPROXY -LUA -MATH -MEMORY_PROFILING -NETFILTER -NS -OBSOLETE_LINKER +OPENSSL -OPENSSL_AWSLC +OPENSSL_WOLFSSL -OT -PCRE -PCRE2 -PCRE2_JIT -PCRE_JIT +POLL -PRCTL -PROCCTL -PROMEX -PTHREAD_EMULATION +QUIC -QUIC_OPENSSL_COMPAT -RT -SHM_OPEN +SLZ +SSL -STATIC_PCRE -STATIC_PCRE2 -SYSTEMD -TFO +THREAD -THREAD_DUMP +TPROXY -WURFL -ZLIB

Default settings :
  bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Built with multi-threading support (MAX_TGROUPS=16, MAX_THREADS=256, default=1).
Built with OpenSSL version : wolfSSL 5.6.4
Running on OpenSSL version : wolfSSL 5.6.4
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports : SSLv3 TLSv1.0 TLSv1.1 TLSv1.2 TLSv1.3
Built with libslz for stateless compression.
Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND
Built without PCRE or PCRE2 support (using libc's regex instead)
Encrypted password support via crypt(3): no
Built with gcc compiler version 13.2.1 20231011 (Red Hat 13.2.1-4)

Available polling systems :
       poll : pref=200,  test result OK
     select : pref=150,  test result OK
Total: 2 (2 usable), will use poll.

Available multiplexer protocols :
(protocols marked as <default> cannot be specified using 'proto' keyword)
       quic : mode=HTTP  side=FE     mux=QUIC  flags=HTX|NO_UPG|FRAMED
         h2 : mode=HTTP  side=FE|BE  mux=H2    flags=HTX|HOL_RISK|NO_UPG
       fcgi : mode=HTTP  side=BE     mux=FCGI  flags=HTX|HOL_RISK|NO_UPG
  <default> : mode=HTTP  side=FE|BE  mux=H1    flags=HTX
         h1 : mode=HTTP  side=FE|BE  mux=H1    flags=HTX|NO_UPG
  <default> : mode=TCP   side=FE|BE  mux=PASS  flags=
       none : mode=TCP   side=FE|BE  mux=PASS  flags=NO_UPG

Available services : none

Available filters :
        [BWLIM] bwlim-in
        [BWLIM] bwlim-out
        [CACHE] cache
        [COMP] compression
        [FCGI] fcgi-app
        [SPOE] spoe
        [TRACE] trace

I have tested this on multiple operating systems and different HAProxy versions with the same results, so I do not think it is tied to any specific HAProxy version. I have tried messing with nodelay, request buffering, and changing maxrewrite and the other buffer-related tune values, but I was still unable to fix the issue. The only way I got it to behave normally was to set maxconn 1 on the backend server, which obviously would never work in production.
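For what it is worth, those experiments looked roughly like this; the values are illustrative, not exactly what I ran:

global
    tune.bufsize 65536        # default is 16384; tried several sizes in both directions
    tune.maxrewrite 1024

# and the only change that made it behave, applied on the backend server line
# (unusable in production):
    server s1 ip.ip.ip.ip:80 maxconn 1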

Here is my problem. Note: all of these objects are warm in the Varnish cache.

Ideally, it would stream every object at once like this:

Or buffer everything like this:


(In all of the above screenshots, I am loading a page with 250 images of 50 KB each, over a 100 Mbps connection with 12 ms of latency.)

Here is my basic config. Of course, I have also messed around with option http-no-delay, option http-buffer-request, option http-keep-alive, and http-reuse always/safe (a sketch of those variants follows the config below).

global
    daemon
    maxconn 12000
defaults
    timeout server 30s
    timeout client 30s
    timeout connect 10s
    mode http
listen http
    mode http
    bind :443 ssl crt /root/mycert.pem alpn h2,http/1.1
    server s1 ip.ip.ip.ip:80
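
The variants with the options mentioned above looked roughly like this; again, a sketch, and I tried these in different combinations rather than all at once:

listen http
    mode http
    bind :443 ssl crt /root/mycert.pem alpn h2,http/1.1
    option http-no-delay
    option http-buffer-request
    option http-keep-alive
    http-reuse always          # also tried "safe"
    server s1 ip.ip.ip.ip:80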

Can anyone guide me on how to get HAProxy to send objects more consistently, or tell me whether this is even possible to fix?

Thanks in advance!