Theoretical limits for a HAProxy instance

Hello,

We are running a lot of load tests and have hit what we believe is an artificial limit of some sort, or a parameter we are not taking into account (an HAProxy config setting, a kernel parameter…). Is there a known limit on what a single HAProxy instance can process, or has anyone experienced something similar? We are considering moving to bigger servers, but we don’t know whether we would see a significant difference.

Custom kernel parameters
net.ipv4.ip_local_port_range = "12768    60999"
net.nf_conntrack_max = 5000000
fs.nr_open = 5000000
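As a sanity check, the values above can be confirmed on the running box by reading /proc directly (equivalent to `sysctl <key>`). The RLIMIT_NOFILE check below is our own addition: with `maxconn 2000000`, `fs.nr_open` only raises the per-process ceiling, and the haproxy process still needs a matching open-files limit.

```shell
# Verify the custom sysctls are actually in effect
cat /proc/sys/net/ipv4/ip_local_port_range
cat /proc/sys/fs/nr_open
# conntrack entries only exist once the nf_conntrack module is loaded
[ -r /proc/sys/net/netfilter/nf_conntrack_max ] && cat /proc/sys/net/netfilter/nf_conntrack_max

# fs.nr_open is only a ceiling; check the RLIMIT_NOFILE the running
# haproxy process actually received
pid=$(pidof haproxy 2>/dev/null | awk '{print $1}')
[ -n "$pid" ] && grep 'Max open files' "/proc/$pid/limits" || echo "haproxy not running"
```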
Output from `haproxy -vv`
HAProxy version 2.6.6-274d1a4 2022/09/22 - https://haproxy.org/
Status: long-term supported branch - will stop receiving fixes around Q2 2027.
Known bugs: http://www.haproxy.org/bugs/bugs-2.6.6.html
Running on: Linux 5.15.0-53-generic #59-Ubuntu SMP Mon Oct 17 18:53:30 UTC 2022 x86_64
Build options :
  TARGET  = linux-glibc
  CPU     = generic
  CC      = cc
  CFLAGS  = -O2 -g -Wall -Wextra -Wundef -Wdeclaration-after-statement -Wfatal-errors -Wtype-limits -Wshift-negative-value -Wshift-overflow=2 -Wduplicated-cond -Wnull-dereference -fwrapv -Wno-address-of-packed-member -Wno-unused-label -Wno-sign-compare -Wno-unused-parameter -Wno-clobbered -Wno-missing-field-initializers -Wno-cast-function-type -Wno-string-plus-int -Wno-atomic-alignment
  OPTIONS = USE_PCRE=1 USE_OPENSSL=1 USE_ZLIB=1 USE_SYSTEMD=1 USE_PROMEX=1
  DEBUG   = -DDEBUG_STRICT -DDEBUG_MEMORY_POOLS

Feature list : +EPOLL -KQUEUE +NETFILTER +PCRE -PCRE_JIT -PCRE2 -PCRE2_JIT +POLL +THREAD +BACKTRACE -STATIC_PCRE -STATIC_PCRE2 +TPROXY +LINUX_TPROXY +LINUX_SPLICE +LIBCRYPT +CRYPT_H -ENGINE +GETADDRINFO +OPENSSL -LUA +ACCEPT4 -CLOSEFROM +ZLIB -SLZ +CPU_AFFINITY +TFO +NS +DL +RT -DEVICEATLAS -51DEGREES -WURFL +SYSTEMD -OBSOLETE_LINKER +PRCTL -PROCCTL +THREAD_DUMP -EVPORTS -OT -QUIC +PROMEX -MEMORY_PROFILING

Default settings :
  bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Built with multi-threading support (MAX_THREADS=64, default=32).
Built with OpenSSL version : OpenSSL 3.0.7 1 Nov 2022
Running on OpenSSL version : OpenSSL 3.0.7 1 Nov 2022
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports : TLSv1.0 TLSv1.1 TLSv1.2 TLSv1.3
OpenSSL providers loaded : default
Built with the Prometheus exporter as a service
Built with network namespace support.
Support for malloc_trim() is enabled.
Built with zlib version : 1.2.11
Running on zlib version : 1.2.11
Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND
Built with PCRE version : 8.39 2016-06-14
Running on PCRE version : 8.39 2016-06-14
PCRE library supports JIT : no (USE_PCRE_JIT not set)
Encrypted password support via crypt(3): yes
Built with gcc compiler version 11.3.0

Available polling systems :
      epoll : pref=300,  test result OK
       poll : pref=200,  test result OK
     select : pref=150,  test result OK
Total: 3 (3 usable), will use epoll.

Available multiplexer protocols :
(protocols marked as <default> cannot be specified using 'proto' keyword)
         h2 : mode=HTTP  side=FE|BE  mux=H2    flags=HTX|HOL_RISK|NO_UPG
       fcgi : mode=HTTP  side=BE     mux=FCGI  flags=HTX|HOL_RISK|NO_UPG
  <default> : mode=HTTP  side=FE|BE  mux=H1    flags=HTX
         h1 : mode=HTTP  side=FE|BE  mux=H1    flags=HTX|NO_UPG
  <default> : mode=TCP   side=FE|BE  mux=PASS  flags=
       none : mode=TCP   side=FE|BE  mux=PASS  flags=NO_UPG

Available services : prometheus-exporter
Available filters :
	[CACHE] cache
	[COMP] compression
	[FCGI] fcgi-app
	[SPOE] spoe
	[TRACE] trace
HAProxy config
global
    log /dev/log len 65535 local0 warning
    chroot /var/lib/haproxy
    stats socket /run/haproxy-admin.sock mode 660 level admin
    user haproxy
    group haproxy
    daemon
    maxconn 2000000
    maxconnrate 2500
    maxsslrate 2500

defaults
    log     global
    option  dontlognull
    timeout connect 10s
    timeout client  120s
    timeout server  120s

frontend stats
    mode http
    bind *:8404
    http-request use-service prometheus-exporter if { path /metrics }
    stats enable
    stats uri /stats
    stats refresh 10s

frontend k8s-api
    bind *:6443
    mode tcp
    option tcplog
    timeout client 300s
    default_backend k8s-api

backend k8s-api
    mode tcp
    option tcp-check
    timeout server 300s
    balance leastconn
    default-server inter 10s downinter 5s rise 2 fall 2 slowstart 60s maxconn 500 maxqueue 256 weight 100
    server master01 x.x.x.x:6443 check
    server master02 x.x.x.x:6443 check
    server master03 x.x.x.x:6443 check
    retries 0

frontend k8s-server
    bind *:80
    mode http
    http-request add-header X-Forwarded-Proto http
    http-request add-header X-Forwarded-Port 80
    default_backend k8s-server

backend k8s-server
    mode http
    balance leastconn
    option forwardfor
    default-server inter 10s downinter 5s rise 2 fall 2 check
    server worker01a x.x.x.x:31551 maxconn 200000
    server worker02a x.x.x.x:31551 maxconn 200000
    server worker03a x.x.x.x:31551 maxconn 200000
    server worker04a x.x.x.x:31551 maxconn 200000
    server worker05a x.x.x.x:31551 maxconn 200000
    server worker06a x.x.x.x:31551 maxconn 200000
    server worker07a x.x.x.x:31551 maxconn 200000
    server worker08a x.x.x.x:31551 maxconn 200000
    server worker09a x.x.x.x:31551 maxconn 200000
    server worker10a x.x.x.x:31551 maxconn 200000
    server worker11a x.x.x.x:31551 maxconn 200000
    server worker12a x.x.x.x:31551 maxconn 200000
    server worker13a x.x.x.x:31551 maxconn 200000
    server worker14a x.x.x.x:31551 maxconn 200000
    server worker15a x.x.x.x:31551 maxconn 200000
    server worker16a x.x.x.x:31551 maxconn 200000
    server worker17a x.x.x.x:31551 maxconn 200000
    server worker18a x.x.x.x:31551 maxconn 200000
    server worker19a x.x.x.x:31551 maxconn 200000
    server worker20a x.x.x.x:31551 maxconn 200000
    server worker01an x.x.x.x:31551 maxconn 200000
    server worker02an x.x.x.x:31551 maxconn 200000
    server worker03an x.x.x.x:31551 maxconn 200000
    retries 0

frontend k8s-server-https
    bind *:443 ssl crt /etc/haproxy/certs/
    mode http
    http-request add-header X-Forwarded-Proto https
    http-request add-header X-Forwarded-Port 443
    http-request del-header X-SERVER-SNI
    http-request set-header X-SERVER-SNI %[ssl_fc_sni] if { ssl_fc_sni -m found }
    http-request set-var(txn.fc_sni) hdr(X-SERVER-SNI) if { hdr(X-SERVER-SNI) -m found }
    http-request del-header X-SERVER-SNI
    default_backend k8s-server-https

backend k8s-server-https
    mode http
    balance leastconn
    option forwardfor
    default-server inter 10s downinter 5s rise 2 fall 2  check no-check-ssl
    server worker01a x.x.x.x:31445 ssl ca-file /etc/haproxy/ca/ca.crt sni var(txn.fc_sni) maxconn 200000
    server worker02a x.x.x.x:31445 ssl ca-file /etc/haproxy/ca/ca.crt sni var(txn.fc_sni) maxconn 200000
    server worker03a x.x.x.x:31445 ssl ca-file /etc/haproxy/ca/ca.crt sni var(txn.fc_sni) maxconn 200000
    server worker04a x.x.x.x:31445 ssl ca-file /etc/haproxy/ca/ca.crt sni var(txn.fc_sni) maxconn 200000
    server worker05a x.x.x.x:31445 ssl ca-file /etc/haproxy/ca/ca.crt sni var(txn.fc_sni) maxconn 200000
    server worker06a x.x.x.x:31445 ssl ca-file /etc/haproxy/ca/ca.crt sni var(txn.fc_sni) maxconn 200000
    server worker07a x.x.x.x:31445 ssl ca-file /etc/haproxy/ca/ca.crt sni var(txn.fc_sni) maxconn 200000
    server worker08a x.x.x.x:31445 ssl ca-file /etc/haproxy/ca/ca.crt sni var(txn.fc_sni) maxconn 200000
    server worker09a x.x.x.x:31445 ssl ca-file /etc/haproxy/ca/ca.crt sni var(txn.fc_sni) maxconn 200000
    server worker10a x.x.x.x:31445 ssl ca-file /etc/haproxy/ca/ca.crt sni var(txn.fc_sni) maxconn 200000
    server worker11a x.x.x.x:31445 ssl ca-file /etc/haproxy/ca/ca.crt sni var(txn.fc_sni) maxconn 200000
    server worker12a x.x.x.x:31445 ssl ca-file /etc/haproxy/ca/ca.crt sni var(txn.fc_sni) maxconn 200000
    server worker13a x.x.x.x:31445 ssl ca-file /etc/haproxy/ca/ca.crt sni var(txn.fc_sni) maxconn 200000
    server worker14a x.x.x.x:31445 ssl ca-file /etc/haproxy/ca/ca.crt sni var(txn.fc_sni) maxconn 200000
    server worker15a x.x.x.x:31445 ssl ca-file /etc/haproxy/ca/ca.crt sni var(txn.fc_sni) maxconn 200000
    server worker16a x.x.x.x:31445 ssl ca-file /etc/haproxy/ca/ca.crt sni var(txn.fc_sni) maxconn 200000
    server worker17a x.x.x.x:31445 ssl ca-file /etc/haproxy/ca/ca.crt sni var(txn.fc_sni) maxconn 200000
    server worker18a x.x.x.x:31445 ssl ca-file /etc/haproxy/ca/ca.crt sni var(txn.fc_sni) maxconn 200000
    server worker19a x.x.x.x:31445 ssl ca-file /etc/haproxy/ca/ca.crt sni var(txn.fc_sni) maxconn 200000
    server worker20a x.x.x.x:31445 ssl ca-file /etc/haproxy/ca/ca.crt sni var(txn.fc_sni) maxconn 200000
    server worker01an x.x.x.x:31445 ssl ca-file /etc/haproxy/ca/ca.crt sni var(txn.fc_sni) maxconn 200000
    server worker02an x.x.x.x:31445 ssl ca-file /etc/haproxy/ca/ca.crt sni var(txn.fc_sni) maxconn 200000
    server worker03an x.x.x.x:31445 ssl ca-file /etc/haproxy/ca/ca.crt sni var(txn.fc_sni) maxconn 200000
    retries 0

frontend k8s-nfs-monitor
    bind *:8080
    mode http
    monitor-uri /health_nfs_cluster
    acl k8s_server_down nbsrv(k8s-server) le 2
    acl nfs_down nbsrv(nfs) lt 1
    monitor fail if nfs_down || k8s_server_down

backend nfs
    mode tcp
    default-server inter 5s downinter 2s rise 1 fall 2
        server nfs01 x.x.x.x:2049 check

frontend k8s-cluster-monitor
    bind *:8081
    mode http
    monitor-uri /health_cluster
    acl k8s_server_down nbsrv(k8s-server) le 2
    monitor fail if k8s_server_down
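The two monitor frontends can be exercised from outside: `monitor-uri` answers 200 while the conditions hold and 503 once a `monitor fail` ACL matches. A quick check (the hostname is a placeholder for the actual HAProxy address):

```shell
# 200 OK while nbsrv(k8s-server) > 2, 503 Service Unavailable otherwise
curl -si http://haproxy.example.internal:8081/health_cluster | head -1

# Same idea for the NFS-aware endpoint on :8080
curl -si http://haproxy.example.internal:8080/health_nfs_cluster | head -1
```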

When load testing against production, we can sustain 200k connections and 10k rps with a load1 of about 10. The configured maxsslrate and maxconnrate are maxed out, but we handle the requests fine and return no 5xx. As soon as we push slightly harder, to about 11k rps and 205k connections, we start returning 5xx and quickly back off, since these tests run against production.
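Since the global section already exposes the admin socket at /run/haproxy-admin.sock, the runtime counters can be snapshotted directly while the test runs. A sketch, assuming socat is installed:

```shell
# Snapshot process-level counters during the load test.
# Maxconn and the *RateLimit fields show the configured ceilings;
# CurrConns, ConnRate and SslRate show where we actually are.
echo "show info" \
  | socat stdio /run/haproxy-admin.sock \
  | grep -E '^(Maxconn|CurrConns|ConnRate|ConnRateLimit|SslRate|SslRateLimit):'
```

If ConnRate or SslRate sits pinned at its limit while 5xx responses appear, the rate limiters rather than the machine may be the first bottleneck hit.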

Production server specs

Hosted on bare metal

CPU: AMD Ryzen 7 3700X 8-Core Processor (16 threads)
RAM: DDR4 64GB (2666 MT/s)
Production Prometheus metrics (dashboard graphs not reproduced here; these are the queries shown):

haproxy_process_current_connections
rate(haproxy_process_requests_total[2m])
haproxy_process_idle_time_percent
haproxy_process_current_ssl_rate
haproxy_process_current_connection_rate
node_load1
node_nf_conntrack_entries

When running a synthetic load test against staging, using k6 as our load generator, we can sustain 750k connections at 20k rps. The generator ramps up over 120s to reach the 750k connections, as that is what we are trying to benchmark.
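For reference, the staging scenario above roughly corresponds to a k6 invocation like the following. The stage durations and targets are illustrative, and our actual test script is not shown here:

```shell
# Ramp to 750k VUs over 120s, hold, then ramp down.
# --stage DURATION:TARGET is k6's CLI shorthand for scenario stages.
k6 run \
  --stage 120s:750000 \
  --stage 600s:750000 \
  --stage 30s:0 \
  script.js
```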

Staging server specs

Hosted on bare metal

CPU: AMD Ryzen 5 3600 6-Core Processor (12 threads)
RAM: DDR4 64GB (3200 MT/s)
Staging Prometheus metrics (dashboard graphs not reproduced here; same queries as for production):

haproxy_process_current_connections
rate(haproxy_process_requests_total[2m])
haproxy_process_idle_time_percent
haproxy_process_current_ssl_rate
haproxy_process_current_connection_rate
node_load1
node_nf_conntrack_entries


About three years ago we ran into some hard limits on connections coming into a frontend with HAProxy 1.8. We worked around it by defining multiple identical frontends in the config on different ports, and then put a load balancer in front of HAProxy to spread the traffic across the ports.

I don’t recall the exact technical limitation at the root of this, and we no longer do it with our current config, so this may or may not be useful information :slight_smile:

OpenSSL turned out to be the culprit here. We downgraded from 3.0.7 to 1.1.1s, and we now see much better numbers: HAProxy scales linearly with the available resources.
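For anyone wanting to reproduce this comparison outside HAProxy, the raw asymmetric-crypto throughput of each library can be approximated with `openssl speed`. Run it once per installed version (the alternate binary path below is illustrative):

```shell
# RSA-2048 sign/verify per second is a rough proxy for TLS handshake
# capacity; compare the 3.0.x and 1.1.1 builds side by side.
openssl version
openssl speed -seconds 3 rsa2048

# If both versions are installed, invoke the other binary explicitly,
# e.g. /opt/openssl-1.1.1s/bin/openssl speed -seconds 3 rsa2048
```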