I need some advice regarding multithreading configuration.
We are using haproxy 2.0.12 on a CentOS 8 virtual machine (VMware) with 16 GB of RAM, 4 vCPUs (1 core each) and a 1 Gbps NIC. This setup has only one frontend (HTTP mode, SSL only) and two backends (HTTP). Currently the traffic is quite small: we have seen at most 400 concurrent connections, a peak rate of 42 connections/sec and a peak network bandwidth of 10 Mbit/s. But in the future we will need to handle up to ~5000 concurrent connections, maybe 10000.
And here the problem begins: with the current setup, haproxy consumes up to 35% of the CPU when nbproc 1 and nbthread 4 are set. As soon as I comment out the nbthread line and switch to nbproc 4, the CPU load practically disappears: haproxy uses at most 2% of all 4 CPUs.
I would leave it “as is” with nbproc, but that brings its own problems: “independent” stick tables, a dedicated stats page for each process, etc. So I definitely need to use multithreading.
Here is our config:
global
maxconn 10000
stats socket /var/run/haproxy.stat mode 600 level admin
log 127.0.0.1:514 local2
chroot /var/empty
pidfile /var/run/haproxy.pid
user haproxy
group haproxy
ssl-default-bind-options no-tlsv13
ssl-default-bind-ciphers 'HIGH:!aNULL:!MD5'
tune.ssl.default-dh-param 4096
tune.ssl.cachesize 1000000
tune.ssl.lifetime 600
tune.ssl.maxrecord 1460
nbproc 1
nbthread 4
daemon
defaults
option contstats
retries 3
frontend WEB
bind 192.168.0.25:80
bind 192.168.0.25:443 ssl crt /Certs/domain1.pem crt /Certs/domain2.pem
mode http
timeout http-request 5s
timeout client 30s
log global
option httplog
option dontlognull
option forwardfor
monitor-uri /healthcheck
maxconn 8000
http-request capture req.hdr(Host) len 20
%%%Some ACLs are defined here%%%
http-response set-header Strict-Transport-Security "max-age=63072000; includeSubdomains; preload"
http-response set-header X-Frame-Options "SAMEORIGIN"
http-response set-header X-XSS-Protection "1; mode=block"
http-response set-header X-Content-Type-Options "nosniff"
http-response set-header X-Permitted-Cross-Domain-Policies "none"
http-response set-header X-Robots-Tag "all"
http-response set-header X-Download-Options "noopen"
# Do not allow more than 10 concurrent tcp connections per IP, or 15 connections in 3 seconds
tcp-request content reject if { src_conn_rate(Abuse) ge 15 }
tcp-request content reject if { src_conn_cur(Abuse) ge 10 }
tcp-request connection track-sc1 src table Abuse
# Redirect HTTP to HTTPS
redirect scheme https code 301 if !{ ssl_fc }
default_backend Web-Pool
backend Web-Pool
mode http
balance roundrobin
retries 2
option redispatch
timeout connect 5s
timeout server 30s
timeout queue 30s
option forwardfor
option httpchk HEAD /
http-check expect status 200
cookie DYNSRV insert indirect nocache
fullconn 4000
http-request set-header X-Client-IP %[src]
server httpd01 192.168.0.30:80 check weight 1 inter 2000 rise 2 fall 2 minconn 0 maxconn 0 on-marked-down shutdown-sessions
server httpd02 192.168.0.31:80 check weight 2 inter 2000 rise 2 fall 2 minconn 0 maxconn 0 on-marked-down shutdown-sessions
backend Abuse
stick-table type ip size 1m expire 30m store conn_rate(3s),conn_cur,gpc0,http_req_rate(10s),http_err_rate(20s)
With the multi-process config, I use the following settings:
nbproc 4
cpu-map 1 0
cpu-map 2 1
cpu-map 3 2
cpu-map 4 3
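(To illustrate why I want to avoid nbproc: each process would also need its own stats socket so that every process can be monitored. Roughly like this; the socket paths below are only examples, not what I actually run.)
global
nbproc 4
stats socket /var/run/haproxy-1.sock process 1 mode 600 level admin
stats socket /var/run/haproxy-2.sock process 2 mode 600 level admin
stats socket /var/run/haproxy-3.sock process 3 mode 600 level admin
stats socket /var/run/haproxy-4.sock process 4 mode 600 level admin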
I believe something is just wrong in my configuration… Could anybody help me find the cause of this problem?
Make sure those 4 vCPUs are cores dedicated to this VM (and not preempted for other things), and that they are on the same NUMA node.
Problems that arise from preempting will definitely be worse with multithreading, when compared to multiple processes.
Try binding the threads to the respective cores in nbthread mode: cpu-map auto:1/1-4 0-3
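That is, something like this in the global section (assuming your 4 vCPUs are numbered 0-3):
nbproc 1
nbthread 4
cpu-map auto:1/1-4 0-3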
Also provide the output of haproxy -vv please.
I’m not sure you actually need parallel processing - unless you get DDoSed with an SSL handshake attack. Consider just using nbproc 1, nbthread 1 as a workaround. Of course this does not scale.
Basically I do. We have some Apache servers that often suffer from different kinds of DDoS attacks, and the idea is to put everything behind haproxy.
Here is the output of haproxy -vv:
HA-Proxy version 2.0.12 2019/12/21 - https://haproxy.org/
Build options :
TARGET = linux-glibc
CPU = generic
CC = gcc
CFLAGS = -O2 -g -fno-strict-aliasing -Wdeclaration-after-statement -fwrapv -Wno-unused-label -Wno-sign-compare -Wno-unused-parameter -Wno-old-style-declaration -Wno-ignored-qualifiers -Wno-clobbered -Wno-missing-field-initializers -Wno-implicit-fallthrough -Wno-stringop-overflow -Wno-cast-function-type -Wtype-limits -Wshift-negative-value -Wshift-overflow=2 -Wduplicated-cond -Wnull-dereference
OPTIONS = USE_PCRE=1 USE_PCRE_JIT=1 USE_THREAD=1 USE_REGPARM=1 USE_LINUX_TPROXY=1 USE_OPENSSL=1 USE_ZLIB=1 USE_TFO=1 USE_NS=1 USE_SYSTEMD=1
Feature list : +EPOLL -KQUEUE -MY_EPOLL -MY_SPLICE +NETFILTER +PCRE +PCRE_JIT -PCRE2 -PCRE2_JIT +POLL -PRIVATE_CACHE +THREAD -PTHREAD_PSHARED +REGPARM -STATIC_PCRE -STATIC_PCRE2 +TPROXY +LINUX_TPROXY +LINUX_SPLICE +LIBCRYPT +CRYPT_H -VSYSCALL +GETADDRINFO +OPENSSL -LUA +FUTEX +ACCEPT4 -MY_ACCEPT4 +ZLIB -SLZ +CPU_AFFINITY +TFO +NS +DL +RT -DEVICEATLAS -51DEGREES -WURFL +SYSTEMD -OBSOLETE_LINKER +PRCTL +THREAD_DUMP -EVPORTS
Default settings :
bufsize = 16384, maxrewrite = 1024, maxpollevents = 200
Built with multi-threading support (MAX_THREADS=64, default=4).
Built with OpenSSL version : OpenSSL 1.1.1 FIPS 11 Sep 2018
Running on OpenSSL version : OpenSSL 1.1.1 FIPS 11 Sep 2018
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports : TLSv1.0 TLSv1.1 TLSv1.2 TLSv1.3
Built with network namespace support.
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND
Built with zlib version : 1.2.11
Running on zlib version : 1.2.11
Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with PCRE version : 8.42 2018-03-20
Running on PCRE version : 8.42 2018-03-20
PCRE library supports JIT : yes
Encrypted password support via crypt(3): yes
Available polling systems :
epoll : pref=300, test result OK
poll : pref=200, test result OK
select : pref=150, test result OK
Total: 3 (3 usable), will use epoll.
Available multiplexer protocols :
(protocols marked as <default> cannot be specified using 'proto' keyword)
h2 : mode=HTX side=FE|BE mux=H2
h2 : mode=HTTP side=FE mux=H2
<default> : mode=HTX side=FE|BE mux=H1
<default> : mode=TCP|HTTP side=FE|BE mux=PASS
Available services : none
Available filters :
[SPOE] spoe
[COMP] compression
[CACHE] cache
[TRACE] trace
Unfortunately the cpu-map setting did not help. CPU usage is still very high with quite low traffic (max connection rate: 640/s, max session rate: 520/s, max request rate: 330/s, peak bandwidth: 15 Mbit/s).
And the problem still disappears immediately when I switch to multi-process mode instead of multi-threading.
Sorry, but I do not understand.
First of all, haproxy shows pretty good performance in multi-process mode; only multithreading mode causes performance problems.
Second, what is the difference between haproxy and other reverse proxies that work on VMware without dedicated CPUs?
I still have the impression that something is wrong with my configuration…
Is this still an “issue” in newer versions of HAProxy?
Is it possible to have a “classic” monitoring setup with the Prometheus node exporter and scrape all statistics at once?
We are using nbthread, but for some use cases using multiple CPU cores would greatly improve performance.
Multi-process mode will always require separate stats sockets, etc. That is not going away; in fact, it is the main pain point that multi-threading addresses.
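For Prometheus, with nbthread a single scrape endpoint covering all threads is possible, provided the binary is built with the Prometheus exporter service (your haproxy -vv shows “Available services : none”, so this particular build would need to be rebuilt with the exporter or replaced by a newer package). A minimal sketch, with the port chosen arbitrarily:
frontend prometheus
bind :8405
mode http
http-request use-service prometheus-exporter if { path /metrics }
no log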