Eventual lockup after reload-socket

We have some Python websockets, apache2 hosts, and Percona XtraDB Cluster (mysql) hosts behind our proxy. ( HA-Proxy version 2.0.20-1~bpo10+1 2021/01/12 )

The websockets require client certificate authentication. The certificate authority cert and revocation list are kept on the proxy.

Those 2 files are refreshed periodically, and then we run /etc/init.d/haproxy reload-socket. This allows us to load a new configuration without doing a full stop/start and disconnecting users.

However, after a number of these config reloads, the proxy stops passing traffic until it is rebooted. The freeze recurs in less than 48 hours. The mysql hosts also have to be rebooted before they can serve requests again. It is all pretty bizarre.

Without using reload-socket, things are quite stable for many days. I am wondering if anyone has experienced anything like this, or has any suggestion.

Here is my config:

global
   maxconn 50000
   user haproxy
   group haproxy
   daemon
   ssl-server-verify none
   tune.ssl.default-dh-param 2048
   stats socket /run/haproxy/admin.sock mode 777 level admin expose-fd listeners


defaults
   log   global
   mode   http
   option http-server-close   
   option tcplog
   option redispatch
   retries   3
   option redispatch
   maxconn 9999
   log        127.0.0.1       local0
   log        127.0.0.1       local7 debug
   option httpchk
   timeout connect 5s
   timeout queue 5s
   timeout client 36000s
   timeout server 36000s
   timeout tunnel 1h



listen stats
    bind :1936 ssl crt /etc/haproxy/certs.d/
    mode http
    option httplog
    stats enable
    stats realm Haproxy\ Statistics
    stats uri /
    option forwardfor



frontend HTTPS-FRONTEND 
    bind *:80
    bind *:443 ssl crt /etc/haproxy/certs.d/
    mode http
    maxconn 9999
    option httplog
    option forwardfor
    tcp-request inspect-delay 5s
    tcp-request content accept if { req_ssl_hello_type 1 }
    redirect scheme https code 301 if !{ ssl_fc }
    use_backend HTTPS-BACKEND
 
backend HTTPS-BACKEND
   balance source
   mode http
   cookie serverid insert
   option ssl-hello-chk
   default-server check maxconn 1000
   option httpchk HEAD / HTTP/1.1\r\nHost:localhost
   http-request set-header X-Forwarded-Port %[dst_port]
   http-request add-header X-Forwarded-Proto https if { ssl_fc }
   server web001 web001.dfs.c3.domain.com:443 ssl sni ssl_fc_sni check fall 3 rise 2
   server web002 web002.dfs.c3.domain.com:443 ssl sni ssl_fc_sni check fall 3 rise 2
   server web003 web003.dfs.c3.domain.com:443 ssl sni ssl_fc_sni check fall 3 rise 2




frontend WSS-9876-FRONTEND
    bind *:9876 ssl crt /etc/haproxy/certs.d/ ca-file /etc/haproxy/ca.pem verify required crl-file /etc/haproxy/crl.pem
    mode http
    maxconn 9999
    option httplog
    option forwardfor
    tcp-request content accept if { req_ssl_hello_type 1 }
    http-request set-header X-Forwarded-Proto https if { ssl_fc }
    http-request set-header X-Forwarded-Ssl on if { ssl_fc }
    use_backend WSS-9876-BACKEND

backend WSS-9876-BACKEND
   balance source
   mode http
   option tcp-check
   cookie serverid insert
   http-request set-header X-Forwarded-Port %[dst_port]
   http-request add-header X-Forwarded-Proto https if { ssl_fc }
   default-server check maxconn 1000

   server comm001 comm001.dfs.c3.domain.com:9876 ssl check fall 3 rise 2
   server comm002 comm002.dfs.c3.domain.com:9876 ssl check fall 3 rise 2
   server comm003 comm003.dfs.c3.domain.com:9876 ssl check fall 3 rise 2




frontend WSS-9877-FRONTEND
    bind *:9877 ssl crt /etc/haproxy/certs.d/
    mode http
    maxconn 9999
    option httplog
    option forwardfor
    tcp-request content accept if { req_ssl_hello_type 1 }
    http-request set-header X-Forwarded-Proto https if { ssl_fc }
    http-request set-header X-Forwarded-Ssl on if { ssl_fc }
    use_backend WSS-9877-BACKEND

backend WSS-9877-BACKEND
   balance source
   mode http
   option tcp-check
   cookie serverid insert
   http-request set-header X-Forwarded-Port %[dst_port]
   http-request add-header X-Forwarded-Proto https if { ssl_fc }
   default-server check maxconn 1000

   server comm001 comm001.dfs.c3.domain.com:9877 ssl check fall 3 rise 2
   server comm002 comm002.dfs.c3.domain.com:9877 ssl check fall 3 rise 2
   server comm003 comm003.dfs.c3.domain.com:9877 ssl check fall 3 rise 2




frontend MYSQL-FRONTEND
    bind *:3306
    mode tcp
    maxconn 9999
    option tcplog
    tcp-request content accept if { req_ssl_hello_type 1 }
    use_backend MYSQL-BACKEND



backend MYSQL-BACKEND
   balance leastconn
   mode tcp
   option tcp-check
   default-server check maxconn 1000
   server db001 db001.dfs.c3.domain.com:3306 check fall 3 rise 2
   server db002 db002.dfs.c3.domain.com:3306 check fall 3 rise 2
   server db003 db003.dfs.c3.domain.com:3306 check fall 3 rise 2

As an update to this, I am noticing that after reload-socket, numerous haproxy instances pile up and they are not terminated. In fact, even after stopping the service, these extra instances are not killed. I have to run pkill -9 haproxy.

You have huge timeouts: 10 hours for the client and server timeouts, and 1 hour for the tunnel timeout. You also say you are using websockets, and I can see mysql in there, so those sessions are probably long-running.

Reloading haproxy does not kill active sessions on the old process. Instead, the old process waits for each session to time out or for the connection to close.

But if the timeouts are huge and/or the connections never idle for longer than the timeout, those sessions will never be closed, and old haproxy processes will keep piling up as you reload.

How often are you reloading?

I suggest you bring down the timeouts and configure hard-stop-after (will kill sessions after X amount of time, permitting the old haproxy process to exit even when the session is still busy):

https://cbonte.github.io/haproxy-dconv/2.0/configuration.html#3.1-hard-stop-after
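
As a rough sketch of what that could look like in your global and defaults sections (the 30m and the shorter timeout values below are only illustrative assumptions, not tuned recommendations; pick values that fit how often you reload and how long your websocket and mysql sessions may legitimately sit idle):

```haproxy
global
   # Assumed value: an old process is forced to exit at most 30 minutes
   # after a reload, even if it still has active sessions.
   hard-stop-after 30m

defaults
   # Assumed values, much shorter than the current 36000s.
   timeout client 5m
   timeout server 5m
   # Applies once a connection has become a tunnel (e.g. an upgraded websocket).
   timeout tunnel 1h
```

Keep in mind that any session still open when hard-stop-after expires gets cut, so your websocket and mysql clients need to be able to reconnect.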

Thank you, this is helpful.