HAProxy slow upload speed

So I ran into a really strange situation.

HAProxy upload speeds are insanely slow.

POSTing a 400MB video file to a PHP script takes around 1.3 minutes.
This virtual machine is not receiving any production traffic; it's an isolated test setup, so CPU and RAM are fine.

To make sure HAProxy was the issue and not the network or the backend server, I installed nginx and configured it as a reverse proxy (with proxy_request_buffering off, of course). Uploading the same 400MB video file then takes around 8 seconds… The network, internally and externally, is a 2 Gbit interface.
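
For context, the nginx comparison would have looked roughly like this (a minimal sketch: only the backend address is taken from the HAProxy config further down; the body-size limit is an assumption, and whether the nginx test terminated TLS/H2 is not stated, which is exactly what gets asked later in the thread):

    server {
        # whether this test listener used TLS/H2 is not stated in the thread
        listen 80;
        # assumption: raised so the 400MB upload isn't rejected with 413
        client_max_body_size 1g;

        location / {
            # stream the request body straight to the backend instead of buffering it
            proxy_request_buffering off;
            proxy_pass http://10.0.0.233:8080;
        }
    }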

I tested HAProxy 1.8, 2.2 and 2.5 and I see the same slow speeds everywhere.

Throughput through HAProxy:

  /         iface                   Rx                   Tx                Total
  ==============================================================================
             eth0:        4482.29 KB/s           22.86 KB/s         4505.15 KB/s
               lo:           1.50 KB/s            1.50 KB/s            3.00 KB/s
  ------------------------------------------------------------------------------
            total:        4483.79 KB/s           24.37 KB/s         4508.15 KB/s

and through nginx:

  |         iface                   Rx                   Tx                Total
  ==============================================================================
             eth0:       57731.82 KB/s          232.15 KB/s        57963.97 KB/s
               lo:           1.30 KB/s            1.30 KB/s            2.61 KB/s
  ------------------------------------------------------------------------------
            total:       57733.13 KB/s          233.45 KB/s        57966.58 KB/s

Here is my config:

global
	log /dev/log	local0
	log /dev/log	local1 notice
	chroot /var/lib/haproxy
	stats socket /run/haproxy/admin.sock mode 660 level admin expose-fd listeners
	stats timeout 30s
	user haproxy
	group haproxy
	daemon

	# Default SSL material locations
	ca-base /etc/ssl/certs
	crt-base /etc/ssl/private

	# Default ciphers to use on SSL-enabled listening sockets.
	# For more information, see ciphers(1SSL). This list is from:
	#  https://hynek.me/articles/hardening-your-web-servers-ssl-ciphers/
	# An alternative list with additional directives can be obtained from
	#  https://mozilla.github.io/server-side-tls/ssl-config-generator/?server=haproxy
	ssl-default-bind-ciphers ECDH+AESGCM:DH+AESGCM:ECDH+AES256:DH+AES256:ECDH+AES128:DH+AES:RSA+AESGCM:RSA+AES:!aNULL:!MD5:!DSS
	ssl-default-bind-options no-sslv3

	#CORS Module https://github.com/haproxytech/haproxy-lua-cors
	lua-load /etc/haproxy/cors.lua

defaults
	log	global
	mode	http
	option	dontlognull
        timeout connect 5000
        timeout client  50000
        timeout server  50000
	errorfile 400 /etc/haproxy/errors/400.http
	errorfile 403 /etc/haproxy/errors/403.http
	errorfile 408 /etc/haproxy/errors/408.http
	errorfile 500 /etc/haproxy/errors/500.http
	errorfile 502 /etc/haproxy/errors/502.http
	errorfile 503 /etc/haproxy/errors/503.http
	errorfile 504 /etc/haproxy/errors/504.http

frontend http
    bind *:80
    mode http
    option  httplog
    default_backend servers-staging
    timeout client          1m
    redirect scheme https code 301 if !{ ssl_fc }

frontend https
    bind *:443 ssl crt /etc/haproxy/SSL/Bundles/ no-sslv3 alpn h2,http/1.1
    tcp-request content accept if { req_ssl_hello_type 1 }
    mode http
    option  httplog
    default_backend servers-staging
    timeout client          1m

    option forwardfor if-none

backend servers-staging
        option forwardfor if-none
        mode http
        timeout server 300s
        balance roundrobin
        option forwardfor if-none
        server localhost 10.0.0.233:8080

Please advise.

Nothing comes to mind.

Did you run the nginx test with HTTPS (and the same SSL ciphers) and in H2 mode? What does the CPU usage look like? Does HAProxy behave any differently when using the plaintext HTTP frontend (without the redirect, of course)?

@lukastribus well, that's interesting.
An HAProxy upload over HTTP is done in 7 seconds.
An HAProxy upload over HTTPS takes 46 seconds…

Did you test nginx with HTTPS?

Try HTTPS without H2 (remove h2 from the alpn configuration). Also, you will have to find out exactly which SSL ciphers are being used for this benchmark.

I just tested SSL with h2 disabled; that takes 7 seconds.
With h2 enabled it takes 1.3 minutes, so it seems the issue is indeed related to HTTP/2.

@lukastribus I tested with and without h2; in both cases I see TLS 1.3, X25519 and AES_256_GCM as the cipher.

OK, can you provide the full configuration, the output of haproxy -vv, and the exact repro steps (are you uploading from a browser or from a benchmark tool, etc.)?

@lukastribus it can be a bit cumbersome to set up a repro. If you want, I can just give you access to the virtual machine, since it's an isolated test setup anyway; that will save you some time. I have sent you a message in the HAProxy Slack to work out the details, if you are up for that.

Full config:

global
	log /dev/log	local0
	log /dev/log	local1 notice
	chroot /var/lib/haproxy
	stats socket /run/haproxy/admin.sock mode 660 level admin expose-fd listeners
	stats timeout 30s
	user haproxy
	group haproxy
	daemon

	# Default SSL material locations
	ca-base /etc/ssl/certs
	crt-base /etc/ssl/private

	# Default ciphers to use on SSL-enabled listening sockets.
	# For more information, see ciphers(1SSL). This list is from:
	#  https://hynek.me/articles/hardening-your-web-servers-ssl-ciphers/
	# An alternative list with additional directives can be obtained from
	#  https://mozilla.github.io/server-side-tls/ssl-config-generator/?server=haproxy
	ssl-default-bind-ciphers ECDH+AESGCM:DH+AESGCM:ECDH+AES256:DH+AES256:ECDH+AES128:DH+AES:RSA+AESGCM:RSA+AES:!aNULL:!MD5:!DSS
	ssl-default-bind-options no-sslv3

	#CORS Module https://github.com/haproxytech/haproxy-lua-cors
	lua-load /etc/haproxy/cors.lua

#tune.h2.initial-window-size 10000000000000
# tune.h2.initial-window-size 1048576

defaults
	log	global
	mode	http
	option	dontlognull
        timeout connect 5000
        timeout client  50000
        timeout server  50000
	errorfile 400 /etc/haproxy/errors/400.http
	errorfile 403 /etc/haproxy/errors/403.http
	errorfile 408 /etc/haproxy/errors/408.http
	errorfile 500 /etc/haproxy/errors/500.http
	errorfile 502 /etc/haproxy/errors/502.http
	errorfile 503 /etc/haproxy/errors/503.http
	errorfile 504 /etc/haproxy/errors/504.http

frontend http
    bind *:80
    mode http
    option  httplog
    default_backend servers-staging
    timeout client          1m
#    redirect scheme https code 301 if !{ ssl_fc }
    option forwardfor if-none

frontend https
    bind *:443 ssl crt /etc/haproxy/SSL/Bundles/ no-sslv3 alpn h2,http/1.1
#    bind *:443 ssl crt /etc/haproxy/SSL/Bundles/ no-sslv3 alpn http/1.1
    tcp-request content accept if { req_ssl_hello_type 1 }
    mode http
    option  httplog
    default_backend servers-staging
    timeout client          1m

    option forwardfor if-none

backend servers-staging
        option forwardfor if-none
        mode http
        timeout server 300s
        balance roundrobin
        option forwardfor if-none
        server localhost 10.0.0.233:8080

haproxy -vv:

HAProxy version 2.5.1-1~bpo11+1 2022/01/11 - https://haproxy.org/
Status: stable branch - will stop receiving fixes around Q1 2023.
Known bugs: http://www.haproxy.org/bugs/bugs-2.5.1.html
Running on: Linux 5.10.0-10-amd64 #1 SMP Debian 5.10.84-1 (2021-12-08) x86_64
Build options :
  TARGET  = linux-glibc
  CPU     = generic
  CC      = cc
  CFLAGS  = -O2 -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -Wall -Wextra -Wundef -Wdeclaration-after-statement -fwrapv -Wno-address-of-packed-member -Wno-unused-label -Wno-sign-compare -Wno-unused-parameter -Wno-clobbered -Wno-missing-field-initializers -Wno-cast-function-type -Wtype-limits -Wshift-negative-value -Wshift-overflow=2 -Wduplicated-cond -Wnull-dereference
  OPTIONS = USE_PCRE2=1 USE_PCRE2_JIT=1 USE_OPENSSL=1 USE_LUA=1 USE_SLZ=1 USE_SYSTEMD=1 USE_PROMEX=1
  DEBUG   =

Feature list : +EPOLL -KQUEUE +NETFILTER -PCRE -PCRE_JIT +PCRE2 +PCRE2_JIT +POLL +THREAD +BACKTRACE -STATIC_PCRE -STATIC_PCRE2 +TPROXY +LINUX_TPROXY +LINUX_SPLICE +LIBCRYPT +CRYPT_H +GETADDRINFO +OPENSSL +LUA +ACCEPT4 -CLOSEFROM -ZLIB +SLZ +CPU_AFFINITY +TFO +NS +DL +RT -DEVICEATLAS -51DEGREES -WURFL +SYSTEMD -OBSOLETE_LINKER +PRCTL -PROCCTL +THREAD_DUMP -EVPORTS -OT -QUIC +PROMEX -MEMORY_PROFILING

Default settings :
  bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Built with multi-threading support (MAX_THREADS=64, default=5).
Built with OpenSSL version : OpenSSL 1.1.1k  25 Mar 2021
Running on OpenSSL version : OpenSSL 1.1.1k  25 Mar 2021
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports : TLSv1.0 TLSv1.1 TLSv1.2 TLSv1.3
Built with Lua version : Lua 5.3.3
Built with the Prometheus exporter as a service
Built with network namespace support.
Built with libslz for stateless compression.
Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Support for malloc_trim() is enabled.
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND
Built with PCRE2 version : 10.36 2020-12-04
PCRE2 library supports JIT : yes
Encrypted password support via crypt(3): yes
Built with gcc compiler version 10.2.1 20210110

Available polling systems :
      epoll : pref=300,  test result OK
       poll : pref=200,  test result OK
     select : pref=150,  test result OK
Total: 3 (3 usable), will use epoll.

Available multiplexer protocols :
(protocols marked as <default> cannot be specified using 'proto' keyword)
              h2 : mode=HTTP       side=FE|BE     mux=H2       flags=HTX|CLEAN_ABRT|HOL_RISK|NO_UPG
            fcgi : mode=HTTP       side=BE        mux=FCGI     flags=HTX|HOL_RISK|NO_UPG
       <default> : mode=HTTP       side=FE|BE     mux=H1       flags=HTX
              h1 : mode=HTTP       side=FE|BE     mux=H1       flags=HTX|NO_UPG
       <default> : mode=TCP        side=FE|BE     mux=PASS     flags=
            none : mode=TCP        side=FE|BE     mux=PASS     flags=NO_UPG

Available services : prometheus-exporter
Available filters :
	[SPOE] spoe
	[CACHE] cache
	[FCGI] fcgi-app
	[COMP] compression
	[TRACE] trace

The backend is simply a virtual machine running nginx + php-fpm with a simple PHP upload script.

I used this test video file for the upload:
https://www.quintic.com/software/sample_videos/Cricket%20Bowling%20150fps%201200.avi

I tested uploading from Chrome, Firefox and curl, and I saw the same issue in all three.
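
For reference, the curl test was along these lines (a hedged sketch: the hostname, the upload.php path and the "file" form-field name are placeholders, not taken from the thread; the local filename is just the linked sample, renamed):

    # time a multipart upload through the proxy; curl negotiates HTTP/2 over
    # HTTPS by default when built with nghttp2, add --http1.1 to compare
    curl -k -o /dev/null \
         -w 'total: %{time_total}s  upload speed: %{speed_upload} bytes/s\n' \
         -F 'file=@cricket-bowling-150fps-1200.avi' \
         https://staging.example.com/upload.php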

Try raising:

tune.h2.initial-window-size
tune.h2.max-concurrent-streams
tune.h2.max-frame-size

@lukastribus pretty sure I found the issue:
https://haproxy.formilux.narkive.com/7PIDiRm6/http-2-initial-connection-window-is-too-small
So I added tune.h2.initial-window-size 2147483647 to my config, and the upload time went from 1.3 minutes to 11 seconds.
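
For anyone landing here, a minimal sketch of where that tuning goes (the value is the one from this post; 2147483647 is the HTTP/2 protocol's maximum window size, 2^31-1, and as Willy notes below, a BDP-sized value is usually the safer choice):

    global
        # ... existing global settings ...
        # advertise a much larger per-stream window to H2 clients
        tune.h2.initial-window-size 2147483647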

FYI @willy did respond to that thread 3 years ago:

https://www.mail-archive.com/haproxy@formilux.org/msg32040.html

@willy do you recall if this was ever improved? The default initial window size is definitely still the same.

Hi Lukas,

The issue back then was that the connection's window size was not enlarged, so even increasing tune.h2.initial-window-size didn't improve things. It was indeed addressed with these commits:
dc572364c (“BUG/MINOR: mux-h2: advertise a larger connection window size”)
97aaa6765 (“MINOR: mux-h2: only increase the connection window with the first update”)

However, the per-stream window size remains moderate at 65535 bytes, as using too large a value can definitely cause some clients or servers to completely stall if they always send the same stream first. Sadly, this issue is inherent to the protocol and is well known; it's usually called the “window in window” or “TCP over TCP” problem: since streams are multiplexed over an in-order connection, you cannot increase the windows too much by default, yet making them too small will limit the bandwidth. This is why some examples of bandwidth calculations are provided in the doc :-/

@yctn please also retry with smaller values (e.g. 1 MB) to limit the inter-stream abuse. There's one size that corresponds to your BDP (bandwidth-delay product = bandwidth multiplied by ping time) which is sufficient to fill the wire. In case some of your clients upload several files at once, or are proxies that aggregate several clients over one connection, this will limit the risk that one stream stalls because of the others uploading.
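
For example, a rough calculation using the 2 Gbit/s link mentioned at the top of this thread (the ~5 ms RTT is an assumption, not a measured value; use your own ping time):

    BDP = bandwidth x RTT
        = 2 Gbit/s x 0.005 s
        = 10 Mbit
        ≈ 1.25 MB

So a per-stream window somewhere in the 1-2 MB range (e.g. tune.h2.initial-window-size 1310720) should already keep this particular link full, while limiting how badly one stream can starve the others on the same connection.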

Ideally browsers ought to be improved to open a new HTTP/1 connection for large uploads instead of using HTTP/2 for these…


Updating this entry to mention that since HAProxy 3.1 there is now dynamic window sizing that automatically adapts the number of buffers per stream to the number of active streams, and significantly speeds up transfers. Without changing any setting, the window increases from 64kB to 1.4MB for a single stream, multiplying the speed by 23. It's possible to speed this up even more by setting tune.h2.fe.rxbuf in the global section.
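
A minimal sketch of that knob, assuming it takes a size in bytes like the other tune.h2.* settings (the value below is purely illustrative, not a recommendation):

    global
        # ... existing global settings ...
        # enlarge the per-connection H2 receive buffering beyond the 3.1 default
        tune.h2.fe.rxbuf 2097152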