HAProxy suddenly stops forwarding

All of a sudden, haproxy stops forwarding to the backend.
After a

systemctl restart haproxy.service

it works again as it should. I cannot tell how often this happens, but this was the second time within two weeks!

  • What would you check on the system when this happens next time?
  • Any guesses what the cause could be?
  • Any tips on how to work around this?

Thanks for some hints.

Below is the relevant config snippet (it forwards to a PostgreSQL database; alongside this rule there are others listening on port 443 with SSL termination):

global
log /dev/log local0
chroot /var/lib/haproxy
pidfile /var/run/haproxy.pid
stats socket /run/haproxy/admin.sock mode 660 level admin
stats timeout 30s
user haproxy
group haproxy
daemon

defaults
mode http
log global
option httpslog
timeout connect 240s
timeout client 1200s
timeout server 1200s

frontend my-fe
bind *:5432
mode tcp
option tcplog
use_backend my-backend

backend my-backend
mode tcp
server my-servername myip:myport

Check CPU load, specifically whether haproxy consumes large amounts of it (or at least one full CPU core). This would indicate a spinning process (which is a bug in haproxy).

CPU usage in the hung condition is likely either very low or very high.

Also check the state of the sockets. Does the socket count keep increasing, with lots of old orphaned sockets still active?
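One way to watch the socket states is to count them per state from the output of ss. This is only a rough sketch: the helper name count_socket_states is made up for illustration, and port 5432 is taken from the frontend in the posted config.

```shell
# Summarize TCP socket states from `ss -tan` output read on stdin.
# The first column of `ss -tan` is the state; the header line starts with "State".
count_socket_states() {
    awk '$1 != "State" && NF > 0 { count[$1]++ }
         END { for (s in count) print count[s], s }' | sort -rn
}

# Example usage on the affected host (not executed here):
#   ss -tan '( sport = :5432 )' | count_socket_states
```

Running this every few minutes and comparing the numbers should show whether ESTAB or CLOSE-WAIT entries keep growing until forwarding stops.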

Run “show info” and “show fd” on the admin socket. The latter produces a lot of output.
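A sketch of how these commands could be sent, assuming socat is installed and using the socket path from the posted config (/run/haproxy/admin.sock). The helper name conn_summary is hypothetical; it only picks the Maxconn and CurrConns lines out of the “show info” output so they are easy to compare.

```shell
# Extract the connection limit and the current connection count
# from `show info` output read on stdin.
conn_summary() {
    awk -F': ' '$1 == "Maxconn" || $1 == "CurrConns" { print $1 "=" $2 }'
}

# Example usage on the affected host (not executed here; needs root or
# membership in the haproxy group because the socket is mode 660):
#   echo "show info" | socat stdio /run/haproxy/admin.sock | conn_summary
#   echo "show fd"   | socat stdio /run/haproxy/admin.sock > /tmp/haproxy-show-fd.txt
```

If CurrConns sits at or near Maxconn while forwarding is stuck, that points to exhausted connection slots rather than a spinning process.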

Absolutely provide the full configuration, not just what you think is relevant. Replace confidential config statements with placeholders, but don’t remove config statements.

And most importantly provide the output of haproxy -vv.

Below you will find the output of haproxy -vv and also the full config.
Remark: There is no traffic from real users because I’m building a new system, only from bots and hackers.
Thank you very much.

~# haproxy -vv
HAProxy version 2.6.12-1ppa1~jammy 2023/04/01 - https://haproxy.org/
Status: long-term supported branch - will stop receiving fixes around Q2 2027.
Known bugs: http://www.haproxy.org/bugs/bugs-2.6.12.html
Running on: Linux 5.15.0-72-generic #79-Ubuntu SMP Wed Apr 19 08:22:18 UTC 2023 x86_64
Build options :
TARGET = linux-glibc
CPU = generic
CC = cc
CFLAGS = -O2 -g -O2 -flto=auto -ffat-lto-objects -flto=auto -ffat-lto-objects -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -Wall -Wextra -Wundef -Wdeclaration-after-statement -Wfatal-errors -Wtype-limits -Wshift-negative-value -Wshift-overflow=2 -Wduplicated-cond -Wnull-dereference -fwrapv -Wno-address-of-packed-member -Wno-unused-label -Wno-sign-compare -Wno-unused-parameter -Wno-clobbered -Wno-missing-field-initializers -Wno-cast-function-type -Wno-string-plus-int -Wno-atomic-alignment
OPTIONS = USE_PCRE2=1 USE_PCRE2_JIT=1 USE_OPENSSL=1 USE_LUA=1 USE_SLZ=1 USE_SYSTEMD=1 USE_OT=1 USE_PROMEX=1
DEBUG = -DDEBUG_STRICT -DDEBUG_MEMORY_POOLS

Feature list : -51DEGREES +ACCEPT4 +BACKTRACE -CLOSEFROM +CPU_AFFINITY +CRYPT_H -DEVICEATLAS +DL -ENGINE +EPOLL -EVPORTS +GETADDRINFO -KQUEUE +LIBCRYPT +LINUX_SPLICE +LINUX_TPROXY +LUA -MEMORY_PROFILING +NETFILTER +NS -OBSOLETE_LINKER +OPENSSL +OT -PCRE +PCRE2 +PCRE2_JIT -PCRE_JIT +POLL +PRCTL -PROCCTL +PROMEX -QUIC +RT +SLZ -STATIC_PCRE -STATIC_PCRE2 +SYSTEMD +TFO +THREAD +THREAD_DUMP +TPROXY -WURFL -ZLIB

Default settings :
bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Built with multi-threading support (MAX_THREADS=64, default=1).
Built with OpenSSL version : OpenSSL 3.0.2 15 Mar 2022
Running on OpenSSL version : OpenSSL 3.0.2 15 Mar 2022
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports : TLSv1.0 TLSv1.1 TLSv1.2 TLSv1.3
OpenSSL providers loaded : default
Built with Lua version : Lua 5.3.6
Built with the Prometheus exporter as a service
Built with network namespace support.
Built with OpenTracing support.
Support for malloc_trim() is enabled.
Built with libslz for stateless compression.
Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND
Built with PCRE2 version : 10.39 2021-10-29
PCRE2 library supports JIT : yes
Encrypted password support via crypt(3): yes
Built with gcc compiler version 11.3.0

Available polling systems :
epoll : pref=300, test result OK
poll : pref=200, test result OK
select : pref=150, test result OK
Total: 3 (3 usable), will use epoll.

Available multiplexer protocols :
(protocols marked as <default> cannot be specified using 'proto' keyword)
h2 : mode=HTTP side=FE|BE mux=H2 flags=HTX|HOL_RISK|NO_UPG
fcgi : mode=HTTP side=BE mux=FCGI flags=HTX|HOL_RISK|NO_UPG
h1 : mode=HTTP side=FE|BE mux=H1 flags=HTX|NO_UPG
<default> : mode=HTTP side=FE|BE mux=H1 flags=HTX
none : mode=TCP side=FE|BE mux=PASS flags=NO_UPG
<default> : mode=TCP side=FE|BE mux=PASS flags=

Available services : prometheus-exporter
Available filters :
[CACHE] cache
[COMP] compression
[FCGI] fcgi-app
[ OT] opentracing
[SPOE] spoe
[TRACE] trace

~# cat /etc/haproxy/haproxy.cfg
global
log /dev/log local0
chroot /var/lib/haproxy
pidfile /var/run/haproxy.pid
stats socket /run/haproxy/admin.sock mode 660 level admin
stats timeout 30s
user haproxy
group haproxy
daemon

defaults
mode http
log global
option httpslog
timeout connect 240s
timeout client 1200s
timeout server 1200s

frontend fe-443
bind *:443 ssl crt /etc/haproxy/certs/
acl my-acl hdr(host) -i -m beg example.org
use_backend my-backend-1 if my-acl
capture request header Host len 50
capture request header user-agent len 150

frontend fe-5432
bind *:5432
mode tcp
option tcplog
use_backend my-backend-2

backend my-backend-1
mode http
server my-server-1 any-ip:any-port

backend my-backend-2
mode tcp
server my-server-2 any-ip:any-port

You're not specifying maxconn, so haproxy picks its own value.

You also have huge timeouts (20 minutes), so it could be that connections pile up until all maxconn slots are filled.

Thanks. For now I have set the global maxconn to 500 and the per-server maxconn to 20. I’m curious to see how that will work out.
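For anyone following along, the change described above would presumably look roughly like this in haproxy.cfg. It is a sketch only: it reuses the placeholder names from the posted config, shows only the affected lines, and assumes the per-server limit was set on the PostgreSQL backend (the post does not say which server got it).

```
global
    # upper limit on concurrent connections for the whole process
    maxconn 500

backend my-backend-2
    mode tcp
    # at most 20 concurrent connections are passed to this server;
    # further connections wait in haproxy's queue
    server my-server-2 any-ip:any-port maxconn 20
```

As far as I know, when no “timeout queue” is set, queued connections wait for as long as “timeout connect”, which is 240s in this config, so it may be worth setting a shorter “timeout queue” explicitly.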