Posting a 400 MB video file to a PHP script takes around 1.3 minutes.
This virtual machine is not receiving production traffic; it is an isolated test setup, so CPU and RAM are fine.
To make sure HAProxy was the issue and not the network or the backend server, I installed nginx and configured it as a reverse proxy (with proxy_request_buffering off, of course). Uploading the same 400 MB video file then takes around 8 seconds. The network, both internal and external, is on a 2 Gbit interface.
I tested HAProxy 1.8, 2.2 and 2.5 and I get the same slow speeds everywhere.
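For scale, the two timings above can be turned into average throughput. This is only a rough sketch: it assumes 1 MB = 10^6 bytes and that the whole duration was spent transferring data, and the function name is made up for illustration.

```python
def throughput_mbit_per_s(size_mb: float, seconds: float) -> float:
    """Average throughput in Mbit/s for size_mb megabytes (10^6 bytes) sent in `seconds`."""
    return size_mb * 8 / seconds

# Timings reported above: ~1.3 minutes through HAProxy, ~8 seconds through nginx.
haproxy_rate = throughput_mbit_per_s(400, 1.3 * 60)
nginx_rate = throughput_mbit_per_s(400, 8)

print(f"haproxy: {haproxy_rate:.0f} Mbit/s, nginx: {nginx_rate:.0f} Mbit/s")
```

Under those assumptions the HAProxy path averages roughly a tenth of the nginx path, and both are well below the 2 Gbit link speed.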
Did you run the nginx test with HTTPS (and the same SSL ciphers) and in H2 mode? What does the CPU usage look like? Does haproxy behave any differently when using the plaintext HTTP frontend (without the redirect, of course)?
Try HTTPS without H2 (remove h2 from the alpn configuration). Also, you will have to find out exactly which SSL ciphers you are using for this benchmark.
Ok, can you provide the full configuration, the output of haproxy -vv, as well as the exact repro steps (are you uploading from a browser or from a benchmark tool, etc.)?
@lukastribus it can be a bit cumbersome to set up a repro. If you want, I can just give you access to the virtual machine, since it's an isolated test setup anyway; that will save you some time. I have sent you a message in the HAProxy Slack to work out the details, if you would be up for that.
full config:
global
    log /dev/log local0
    log /dev/log local1 notice
    chroot /var/lib/haproxy
    stats socket /run/haproxy/admin.sock mode 660 level admin expose-fd listeners
    stats timeout 30s
    user haproxy
    group haproxy
    daemon

    # Default SSL material locations
    ca-base /etc/ssl/certs
    crt-base /etc/ssl/private

    # Default ciphers to use on SSL-enabled listening sockets.
    # For more information, see ciphers(1SSL). This list is from:
    # https://hynek.me/articles/hardening-your-web-servers-ssl-ciphers/
    # An alternative list with additional directives can be obtained from
    # https://mozilla.github.io/server-side-tls/ssl-config-generator/?server=haproxy
    ssl-default-bind-ciphers ECDH+AESGCM:DH+AESGCM:ECDH+AES256:DH+AES256:ECDH+AES128:DH+AES:RSA+AESGCM:RSA+AES:!aNULL:!MD5:!DSS
    ssl-default-bind-options no-sslv3

    #CORS Module https://github.com/haproxytech/haproxy-lua-cors
    lua-load /etc/haproxy/cors.lua
    #tune.h2.initial-window-size 10000000000000
    # tune.h2.initial-window-size 1048576

defaults
    log global
    mode http
    option dontlognull
    timeout connect 5000
    timeout client 50000
    timeout server 50000
    errorfile 400 /etc/haproxy/errors/400.http
    errorfile 403 /etc/haproxy/errors/403.http
    errorfile 408 /etc/haproxy/errors/408.http
    errorfile 500 /etc/haproxy/errors/500.http
    errorfile 502 /etc/haproxy/errors/502.http
    errorfile 503 /etc/haproxy/errors/503.http
    errorfile 504 /etc/haproxy/errors/504.http

frontend http
    bind *:80
    mode http
    option httplog
    default_backend servers-staging
    timeout client 1m
    # redirect scheme https code 301 if !{ ssl_fc }
    option forwardfor if-none

frontend https
    bind *:443 ssl crt /etc/haproxy/SSL/Bundles/ no-sslv3 alpn h2,http/1.1
    # bind *:443 ssl crt /etc/haproxy/SSL/Bundles/ no-sslv3 alpn http/1.1
    tcp-request content accept if { req_ssl_hello_type 1 }
    mode http
    option httplog
    default_backend servers-staging
    timeout client 1m
    option forwardfor if-none

backend servers-staging
    option forwardfor if-none
    mode http
    timeout server 300s
    balance roundrobin
    option forwardfor if-none
    server localhost 10.0.0.233:8080
haproxy -vv:
HAProxy version 2.5.1-1~bpo11+1 2022/01/11 - https://haproxy.org/
Status: stable branch - will stop receiving fixes around Q1 2023.
Known bugs: http://www.haproxy.org/bugs/bugs-2.5.1.html
Running on: Linux 5.10.0-10-amd64 #1 SMP Debian 5.10.84-1 (2021-12-08) x86_64
Build options :
TARGET = linux-glibc
CPU = generic
CC = cc
CFLAGS = -O2 -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -Wall -Wextra -Wundef -Wdeclaration-after-statement -fwrapv -Wno-address-of-packed-member -Wno-unused-label -Wno-sign-compare -Wno-unused-parameter -Wno-clobbered -Wno-missing-field-initializers -Wno-cast-function-type -Wtype-limits -Wshift-negative-value -Wshift-overflow=2 -Wduplicated-cond -Wnull-dereference
OPTIONS = USE_PCRE2=1 USE_PCRE2_JIT=1 USE_OPENSSL=1 USE_LUA=1 USE_SLZ=1 USE_SYSTEMD=1 USE_PROMEX=1
DEBUG =
Feature list : +EPOLL -KQUEUE +NETFILTER -PCRE -PCRE_JIT +PCRE2 +PCRE2_JIT +POLL +THREAD +BACKTRACE -STATIC_PCRE -STATIC_PCRE2 +TPROXY +LINUX_TPROXY +LINUX_SPLICE +LIBCRYPT +CRYPT_H +GETADDRINFO +OPENSSL +LUA +ACCEPT4 -CLOSEFROM -ZLIB +SLZ +CPU_AFFINITY +TFO +NS +DL +RT -DEVICEATLAS -51DEGREES -WURFL +SYSTEMD -OBSOLETE_LINKER +PRCTL -PROCCTL +THREAD_DUMP -EVPORTS -OT -QUIC +PROMEX -MEMORY_PROFILING
Default settings :
bufsize = 16384, maxrewrite = 1024, maxpollevents = 200
Built with multi-threading support (MAX_THREADS=64, default=5).
Built with OpenSSL version : OpenSSL 1.1.1k 25 Mar 2021
Running on OpenSSL version : OpenSSL 1.1.1k 25 Mar 2021
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports : TLSv1.0 TLSv1.1 TLSv1.2 TLSv1.3
Built with Lua version : Lua 5.3.3
Built with the Prometheus exporter as a service
Built with network namespace support.
Built with libslz for stateless compression.
Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Support for malloc_trim() is enabled.
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND
Built with PCRE2 version : 10.36 2020-12-04
PCRE2 library supports JIT : yes
Encrypted password support via crypt(3): yes
Built with gcc compiler version 10.2.1 20210110
Available polling systems :
epoll : pref=300, test result OK
poll : pref=200, test result OK
select : pref=150, test result OK
Total: 3 (3 usable), will use epoll.
Available multiplexer protocols :
(protocols marked as <default> cannot be specified using 'proto' keyword)
h2 : mode=HTTP side=FE|BE mux=H2 flags=HTX|CLEAN_ABRT|HOL_RISK|NO_UPG
fcgi : mode=HTTP side=BE mux=FCGI flags=HTX|HOL_RISK|NO_UPG
<default> : mode=HTTP side=FE|BE mux=H1 flags=HTX
h1 : mode=HTTP side=FE|BE mux=H1 flags=HTX|NO_UPG
<default> : mode=TCP side=FE|BE mux=PASS flags=
none : mode=TCP side=FE|BE mux=PASS flags=NO_UPG
Available services : prometheus-exporter
Available filters :
[SPOE] spoe
[CACHE] cache
[FCGI] fcgi-app
[COMP] compression
[TRACE] trace
The backend is simply a virtual machine running nginx + php-fpm with a simple PHP upload script.
The issue back then was that the connection’s window size was not enlarged, so even increasing tune.h2.initial-window-size didn’t improve things. It was indeed addressed with these commits:
dc572364c (“BUG/MINOR: mux-h2: advertise a larger connection window size”)
97aaa6765 (“MINOR: mux-h2: only increase the connection window with the first update”)
However, the per-stream window size remains moderate at 65535, as using too large a value can definitely cause some clients or servers to completely stall if they always send the same stream first. Sadly, this issue is inherent to the protocol and is well known; it’s usually called the “window in window” or “TCP over TCP” problem: as streams are multiplexed over an in-order connection, you cannot increase the windows too much by default, while making them too small limits the bandwidth. This is why some examples of bandwidth calculations are provided in the doc :-/
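To make the bandwidth limit concrete: a sender can have at most one window of unacknowledged data in flight per round trip, so a single stream cannot go faster than window / RTT. Below is a minimal sketch of that bound. The RTT values are hypothetical (the thread does not state the actual latency), and the function name is made up for illustration.

```python
def max_stream_throughput_mbit(window_bytes: int, rtt_ms: float) -> float:
    """Upper bound in Mbit/s for one stream: one window of data per round trip."""
    bits_per_ms = window_bytes * 8 / rtt_ms
    return bits_per_ms / 1000  # bits per millisecond -> Mbit/s

# 65535 is the per-stream window mentioned above; the RTTs are assumed examples.
for rtt_ms in (1, 10, 50):
    ceiling = max_stream_throughput_mbit(65535, rtt_ms)
    print(f"RTT {rtt_ms:>2} ms -> at most {ceiling:.1f} Mbit/s")
```

With a 65535-byte window, the ceiling drops quickly as latency grows, regardless of how fast the link itself is.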
@yctn please also retry with smaller values (e.g. 1 MB) to limit the inter-stream abuses. There’s one size that corresponds to your BDP (bandwidth-delay product = bandwidth multiplied by ping time) which will be sufficient to fill the wire. In case some of your clients upload several files at once or are proxies that aggregate several clients over one connection, this will limit the risk that one stalls due to others uploading.
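As a rough illustration of the BDP calculation, here is a small sketch. The 2 Gbit figure comes from the report above; the ping times are assumptions chosen only as examples, and the function name is hypothetical.

```python
def bdp_bytes(bandwidth_bit_per_s: float, rtt_ms: float) -> int:
    """Bandwidth-delay product in bytes: bandwidth multiplied by ping time."""
    return round(bandwidth_bit_per_s * rtt_ms / 1000 / 8)

link = 2e9  # 2 Gbit/s interface, as reported in the first post

# Assumed ping times; measure the real one between client and HAProxy.
for rtt_ms in (0.5, 5):
    print(f"ping {rtt_ms} ms -> BDP {bdp_bytes(link, rtt_ms)} bytes")
```

A window around the computed BDP should be enough to fill the wire for one stream; going far beyond it mostly increases the risk of one upload starving the others on the same connection.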
Ideally browsers ought to be improved to open a new HTTP/1 connection for large uploads instead of using HTTP/2 for these…
Updating this entry to mention that since 3.1 there is now dynamic window sizing that automatically adapts the number of buffers per stream to the number of active streams, and significantly speeds up transfers. Without changing any setting, the window increases from 64 kB to 1.4 MB for a single stream, multiplying the speed by 23. It’s possible to speed this up even more by setting tune.h2.fe.rxbuf in the global section.