HAProxy sends random TCP RSTs out of the frontend after hardware/HAProxy upgrade

We recently replaced an old haproxy system running CentOS 7 and haproxy 1.8.26 (I know… ancient) with new hardware, RHEL 9 and haproxy 2.4.22-f8e3218, which is the version the repo provides.

Since the upgrade, the errors on some VIPs have increased dramatically. The errors come from a TCP RST that the haproxy system sends to the client, breaking the connection. There is no evidence or log entry showing that the backend had any issue during this time, and a tcpdump confirms that the backend server does not send a TCP RST.
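For anyone wanting to repeat the "which side sent the RST" check, here is a minimal sketch that counts RST packets per source address from `tcpdump -n` text output. The sample lines are made up for illustration (10.3.5.10 is a hypothetical client address, not one from our network), and the regex assumes plain IPv4 TCP lines in tcpdump's usual text layout.

```python
import re
from collections import Counter

# Matches the address/flags part of a `tcpdump -n` IPv4 TCP line, e.g.
# "IP 10.3.5.192.3306 > 10.3.5.10.51234: Flags [R.], ..."
LINE_RE = re.compile(r"IP (\S+) > (\S+): Flags \[([^\]]*)\]")


def rst_senders(lines):
    """Count packets carrying the RST flag, keyed by source IP."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match and "R" in match.group(3):
            # tcpdump prints "ip.port"; drop the trailing port.
            src_ip = match.group(1).rsplit(".", 1)[0]
            counts[src_ip] += 1
    return counts


# Hypothetical capture lines, not taken from the real pcap.
sample = [
    "12:00:01.000000 IP 10.3.5.10.51234 > 10.3.5.192.3306: Flags [P.], seq 1:50, ack 1, win 502, length 49",
    "12:00:01.000100 IP 10.3.5.192.3306 > 10.3.5.10.51234: Flags [R.], seq 1, ack 50, win 0, length 0",
    "12:00:01.000200 IP 10.3.5.196.40000 > 10.3.69.247.3306: Flags [F.], seq 1, ack 1, win 502, length 0",
]

print(dict(rst_senders(sample)))
```

If the frontend bind address shows up as a sender and the backend servers never do, that matches what we see in our capture.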

We have pored over sysctl settings and haproxy settings trying to find what is causing this, but have yet to discover the root cause.

A reload of haproxy makes the errors decrease for a while, but they become more frequent again as time goes on. The issue is prevalent on mysql VIPs as well as HTTP VIPs. I turned the logging up to info and still didn’t find any logs that tell me anything useful.

If anyone has thoughts on what it could be, I will go and investigate whatever it is. At this point, I’m lost as to how to determine the underlying cause.

haproxy -vv
HAProxy version 2.4.22-f8e3218 2023/02/14 - https://haproxy.org/
Status: long-term supported branch - will stop receiving fixes around Q2 2026.
Known bugs: http://www.haproxy.org/bugs/bugs-2.4.22.html
Running on: Linux 5.14.0-570.17.1.el9_6.x86_64 #1 SMP PREEMPT_DYNAMIC Fri May 23 22:47:01 UTC 2025 x86_64
Build options :
TARGET = linux-glibc
CPU = generic
CC = cc
CFLAGS = -O2 -g -Wall -Wextra -Wdeclaration-after-statement -fwrapv -Wno-address-of-packed-member -Wno-unused-label -Wno-sign-compare -Wno-unused-parameter -Wno-clobbered -Wno-missing-field-initializers -Wno-cast-function-type -Wtype-limits -Wshift-negative-value -Wshift-overflow=2 -Wduplicated-cond -Wnull-dereference
OPTIONS = USE_PCRE2=1 USE_LINUX_TPROXY=1 USE_CRYPT_H=1 USE_GETADDRINFO=1 USE_OPENSSL=1 USE_LUA=1 USE_SLZ=1 USE_SYSTEMD=1 USE_PROMEX=1
DEBUG =

Feature list : -51DEGREES +ACCEPT4 +BACKTRACE -CLOSEFROM +CPU_AFFINITY +CRYPT_H -DEVICEATLAS +DL +EPOLL -EVPORTS +FUTEX +GETADDRINFO -KQUEUE +LIBCRYPT +LINUX_SPLICE +LINUX_TPROXY +LUA -MEMORY_PROFILING +NETFILTER +NS -OBSOLETE_LINKER +OPENSSL -OT -PCRE +PCRE2 -PCRE2_JIT -PCRE_JIT +POLL +PRCTL -PRIVATE_CACHE -PROCCTL +PROMEX -PTHREAD_PSHARED -QUIC +RT +SLZ -STATIC_PCRE -STATIC_PCRE2 +SYSTEMD +TFO +THREAD +THREAD_DUMP +TPROXY -WURFL -ZLIB

Default settings :
bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Built with multi-threading support (MAX_THREADS=64, default=20).
Built with OpenSSL version : OpenSSL 3.2.2 4 Jun 2024
Running on OpenSSL version : OpenSSL 3.2.2 4 Jun 2024
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports : TLSv1.0 TLSv1.1 TLSv1.2 TLSv1.3
Built with Lua version : Lua 5.4.4
Built with the Prometheus exporter as a service
Built with network namespace support.
Built with libslz for stateless compression.
Compression algorithms supported : identity(“identity”), deflate(“deflate”), raw-deflate(“deflate”), gzip(“gzip”)
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND
Built with PCRE2 version : 10.40 2022-04-14
PCRE2 library supports JIT : no (USE_PCRE2_JIT not set)
Encrypted password support via crypt(3): yes
Built with gcc compiler version 11.5.0 20240719 (Red Hat 11.5.0-5)

Available polling systems :
epoll : pref=300, test result OK
poll : pref=200, test result OK
select : pref=150, test result OK
Total: 3 (3 usable), will use epoll.

Available multiplexer protocols :
(protocols marked as <default> cannot be specified using 'proto' keyword)
h2 : mode=HTTP side=FE|BE mux=H2 flags=HTX|CLEAN_ABRT|HOL_RISK|NO_UPG
fcgi : mode=HTTP side=BE mux=FCGI flags=HTX|HOL_RISK|NO_UPG
h1 : mode=HTTP side=FE|BE mux=H1 flags=HTX|NO_UPG
<default> : mode=HTTP side=FE|BE mux=H1 flags=HTX
none : mode=TCP side=FE|BE mux=PASS flags=NO_UPG
<default> : mode=TCP side=FE|BE mux=PASS flags=

Available services : prometheus-exporter
Available filters :
[SPOE] spoe
[CACHE] cache
[FCGI] fcgi-app
[COMP] compression
[TRACE] trace

haproxy config snippets:

Ansible managed: This file is being managed via Ansible and should not be modified directly.

global
    daemon
    maxconn 2000000
    user haproxy
    group haproxy

    log 127.0.0.1 local0 info

    external-check
    insecure-fork-wanted

    nbproc 1
    nbthread 16
    cpu-map auto:1/1-16 4-19
    tune.ssl.default-dh-param 2048
    ssl-default-server-ciphers ECDH+AESGCM:DH+AESGCM:ECDH+AES256:DH+AES256:ECDH+AES128:DH+AES:ECDH+3DES:DH+3DES:RSA+AESGCM:RSA+AES:RSA+3DES:!aNULL:!MD5:!DSS:!LOW:!MEDIUM:!EXP:!DES:!3DES

    stats socket /var/run/haproxy.sock mode 4775 group haproxy level admin


defaults

    log global

    timeout connect 3000ms
    timeout client 50000ms
    timeout http-request 5000ms
    timeout server 9000ms

    errorfile 408 /dev/null

    ### Select default load balancing algorithm
    balance leastconn

    option redispatch
    option tcpka

    fullconn 60000
    maxconn 2000000

    retries 1

frontend f_avance_other_secondary
    mode tcp
    option tcplog
    timeout client 180000ms
    log 127.0.0.1 local2 info
    bind 10.3.5.192:3306
    default_backend b_avance_other_secondary

backend b_avance_other_ssecondary
    mode tcp
    option mysql-check
    timeout server 180000ms
    source 10.3.5.196
    server pushdb01 10.3.69.247:3306 check
    server pushdb02 10.3.6.227:3306 check
    server pushdb03 10.3.6.228:3306 check

Merged pcap, frontend/backend, for a mysql test where we can recreate the issue in varying lengths of time.

By the way, this is just a typo I made while obfuscating the actual names:

backend b_avance_other_ssecondary


A TCP RST in itself is not indicative of a problem or an error.

Often you can see a TCP RST because that is the most efficient way to close a socket, without accumulating useless state (like sockets in time wait status).
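To illustrate the general mechanism (this is the generic socket behaviour, and is not a claim about exactly when HAProxy chooses to do it), here is a minimal Python sketch. It assumes a Linux-style `struct linger` of two ints and a working loopback interface. Closing a socket with `SO_LINGER` enabled and a zero timeout makes the kernel send an RST instead of the normal FIN exchange, so the closing side keeps no TIME_WAIT state and the peer sees a connection reset.

```python
import socket
import struct


def close_with_rst_and_observe() -> bool:
    """Return True if the peer sees a reset after an SO_LINGER(0) close."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
    listener.listen(1)

    client = socket.create_connection(listener.getsockname())
    peer, _ = listener.accept()
    try:
        # l_onoff=1, l_linger=0: close() aborts the connection with an RST.
        client.setsockopt(
            socket.SOL_SOCKET, socket.SO_LINGER, struct.pack("ii", 1, 0)
        )
        client.close()

        peer.settimeout(2.0)
        try:
            peer.recv(1)
        except ConnectionResetError:
            return True  # the peer observed the RST
        return False  # an orderly close would return b"" instead
    finally:
        peer.close()
        listener.close()


if __name__ == "__main__":
    print("peer saw reset:", close_with_rst_and_observe())
```

From the peer's point of view this looks identical to an "error" RST on the wire, which is why the packet alone does not tell you whether something went wrong.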

If it is connected to an actual issue, it is more likely a symptom than the root cause.

When you are talking about errors, do you mean an error counter on the haproxy stats interface, or what are you specifically referring to?

Are you experiencing an actual problem with the traffic, and if so, what exactly is that problem? A log line in httplog or tcplog format for an affected connection will be necessary.
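The most useful part of such a log line is usually the termination state. As a rough sketch, assuming the default `option tcplog` layout, the following Python pulls the fields out of a line and explains the two-character state. The sample line is invented for illustration (the client address, PID, timers and the `sD` state are hypothetical, not taken from the affected system), and the descriptions are paraphrased from the HAProxy logging documentation and cover only the most common codes.

```python
import re

# Default tcplog layout after the syslog prefix:
# client:port [accept_date] frontend backend/server Tw/Tc/Tt bytes state ...
TCPLOG_RE = re.compile(
    r"(?P<client>\S+):(?P<client_port>\d+) "
    r"\[(?P<accept_date>[^\]]+)\] "
    r"(?P<frontend>\S+) (?P<backend>[^/\s]+)/(?P<server>\S+) "
    r"(?P<tw>-?\d+)/(?P<tc>-?\d+)/(?P<tt>\+?-?\d+) "
    r"(?P<bytes_read>\+?\d+) (?P<state>\S{2}) "
)

# First character: what ended the session (subset of the documented codes).
FIRST = {
    "-": "normal completion",
    "C": "session aborted by the client",
    "S": "session aborted or refused by the server",
    "P": "session aborted by the proxy",
    "c": "client-side timeout expired",
    "s": "server-side timeout expired",
}

# Second character: what the session was doing when it ended (subset).
SECOND = {
    "-": "after the end of the data transfer",
    "Q": "session was waiting in the queue",
    "C": "proxy was waiting for the server connection to establish",
    "D": "session was in the DATA phase",
    "L": "proxy was still sending the last data to the client",
}


def parse_tcplog(line):
    """Return the tcplog fields as a dict, or None if the line does not match."""
    match = TCPLOG_RE.search(line)
    return match.groupdict() if match else None


def describe_state(state):
    """Turn a two-character termination state into a short explanation."""
    first = FIRST.get(state[0], "other cause (see the HAProxy docs)")
    second = SECOND.get(state[1], "other phase (see the HAProxy docs)")
    return f"{first}; {second}"


# Hypothetical log line, for illustration only.
sample_line = (
    "Jul  1 12:03:01 lb haproxy[1234]: 10.3.5.10:51234 "
    "[01/Jul/2025:12:00:01.000] f_avance_other_secondary "
    "b_avance_other_secondary/pushdb01 1/0/180003 512 sD 12/3/2/1/0 0/0"
)

fields = parse_tcplog(sample_line)
print(fields["server"], fields["state"], "->", describe_state(fields["state"]))
```

A state such as `--` on the affected connections would point away from HAProxy deciding to abort them, while codes starting with `c`, `s` or `P` would say which timeout or proxy decision closed the session.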

As of today, 293 bugs affecting 2.4.22 have been fixed in the two and a half years since it was released, so it is quite difficult to support in general.

If you can, I would suggest switching your repo to the Zenetys build; currently the latest 2.8 or 3.0 releases would be the best (conservative) choice.

Thank you for the response. I will concentrate here on one specific VIP that has errors/issues.

It has the same params as the mysql VIP snippet I posted above, just with different IPs. I created that VIP for testing, so that I can isolate tcpdumps to those IPs and remove noise.

Clients of the VIP get “Lost connection to MySQL server during query” errors, but there are no log messages indicating a backend server failure or anything similar.

I will work on gathering the tcplog that you suggested and we are discussing your upgrade suggestion.


We are currently upgrading to 3.0 and I will report back once that has completed.


We upgraded to

root@ihaproxy05 1> haproxy -vv
HAProxy version 3.0.11-9e587df 2025/06/02 - https://haproxy.org/
Status: long-term supported branch - will stop receiving fixes around Q2 2029.

and all of the application issues we were experiencing are no longer occurring. Thank you for the recommendation and the repo.
