Help! Backend sessions piling up at certain intervals

Backend sessions keep increasing and are bringing the load balancer down. Any help with figuring out which backend sessions are causing HAProxy to go down?

Because of the backend sessions, the daemon also goes down; eventually I had to reboot the LB server to bring it back up.

Jul 16 21:27:44 systemd[1]: Started HAProxy Load Balancer.

Jul 16 21:27:44 haproxy-systemd-wrapper[7922]: haproxy-systemd-wrapper: executing /usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -Ds

Jul 16 21:27:44 haproxy-systemd-wrapper[7922]: [ALERT] 197/212744 (7923) : [/usr/sbin/haproxy.main()] Cannot create pidfile /run/haproxy.pid

Jul 16 21:27:44 haproxy-systemd-wrapper[7922]: haproxy-systemd-wrapper: exit, haproxy RC=1

Jul 16 21:27:44 systemd[1]: haproxy.service: main process exited, code=exited, status=1/FAILURE

Jul 16 21:27:44 systemd[1]: Unit haproxy.service entered failed state.

Jul 16 21:27:44 systemd[1]: haproxy.service failed.

Share the output of haproxy -vv and the configuration. Are you saying connections are not properly closed? Do they pile up in the netstat output as well? If yes, which state are they in in the netstat output, and are they backend or frontend connections?

HA-Proxy version 1.5.18 2016/05/10
Copyright 2000-2016 Willy Tarreau willy@haproxy.org

Build options :
TARGET = linux2628
CPU = generic
CC = gcc
CFLAGS = -O2 -g -fno-strict-aliasing -DTCP_USER_TIMEOUT=18
OPTIONS = USE_LINUX_TPROXY=1 USE_GETADDRINFO=1 USE_ZLIB=1 USE_REGPARM=1 USE_OPENSSL=1 USE_PCRE=1

Default settings :
maxconn = 2000, bufsize = 16384, maxrewrite = 8192, maxpollevents = 200

Encrypted password support via crypt(3): yes
Built with zlib version : 1.2.7
Compression algorithms supported : identity, deflate, gzip
Built with OpenSSL version : OpenSSL 1.0.2k-fips 26 Jan 2017
Running on OpenSSL version : OpenSSL 1.0.2k-fips 26 Jan 2017
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports prefer-server-ciphers : yes
Built with PCRE version : 8.32 2012-11-30
PCRE library supports JIT : no (USE_PCRE_JIT not set)
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND

Available polling systems :
epoll : pref=300, test result OK
poll : pref=200, test result OK
select : pref=150, test result OK
Total: 3 (3 usable), will use epoll.

How do I check whether connections are closing properly? Also, what is the command for the netstat output?

When I try to reach my endpoint on the k8s cluster it brings down the load balancer. It looks like some user is making concurrent connections; how do I find which user it is? Any ideas on configuring the log to see who is hitting the load balancer?

ab -t 60 -k -c 2000 -n 20000 https://xxxx/xxx/xxx/

Share the configuration.

The netstat command is netstat -nt.
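
To get a quick overview of the states, something like this works (a rough sketch; it assumes IPv4 addresses and the usual netstat -nt column layout, so adjust the fields if yours differs):

# count connections per TCP state (the state is the last column of netstat -nt)
netstat -nt | awk 'NR>2 {print $NF}' | sort | uniq -c | sort -rn

# count connections per remote peer for one state, e.g. FIN_WAIT2
netstat -nt | awk '$NF=="FIN_WAIT2" {split($5,a,":"); print a[1]}' | sort | uniq -c | sort -rn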

Netstat output is still missing.

That said, you configured both the client and server timeout at 120 minutes. That is asking for trouble and will certainly make you extremely vulnerable to state exhaustion. Use something between 60 seconds and 5 minutes, not 2 hours.

Those are only about 220 TCP sessions, and there is not really any useful information to be drawn from that.

Other than the way too large timeouts, I can also see that maxconn is misconfigured. Global maxconn is only 5000, while you have maxconn 6000 in the defaults section. Global maxconn needs to be larger than the sum of all frontend maxconn values. You have at least 8 frontends, so if you need maxconn 6000 per frontend, global maxconn should be at least 50000; otherwise, when you reach maxconn on just one single frontend, nothing else will work, including the stats socket.

So to summarize, my suggestions are:

  • reduce timeouts to minutes instead of hours
  • increase global maxconn or decrease the default maxconn (a config sketch follows this list)
  • after that, if the issue still happens, check and post the output of the stats page
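
As a rough sketch of the first two points (the directive names are standard, but the actual numbers are only examples that you need to adapt to your traffic):

global
 maxconn 50000       # must exceed the sum of all frontend maxconn values

defaults
 maxconn 6000        # inherited by every frontend that has no own maxconn
 timeout client 5m   # instead of 120m
 timeout server 5m   # instead of 120m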

Great help so far! Actually I have 10000+ TCP sessions in the output, but this website won't let me paste more than 32000 words, so I have trimmed the output.

Could you please answer my queries below?

  1. Can you please explain a little more about the 8 frontends concept? I understand that global maxconn has to be larger than the total of the frontend maxconn values, but how do I know how many frontends I have? You said 8 … is that just an example, or how do I check? Also, we have only one load balancer server.

  2. I want to configure the logs properly. My basic requirement is to see who is hitting the load balancer (haproxy) and at what time, and who is causing the backend sessions to increase. Any suggestions?

Today the LB crashed again, with this many connections: 4646. I am pasting the output in this link as I am unable to paste the entire output here.

netstat output

Well, it looks like connections from “worker2” are piling up in FIN_WAIT2 state.

Do you have another load balancer in front of it, so that all requests come from the same IP? worker2 doesn't seem like a backend server, because it has high and random source ports connecting to your port 443, as far as I can tell from the netstat output. Can you clarify which IP worker2 is?

Sure. I also forgot one important point: maxconn needs to be double that number, because the connections on the backend also have to be considered.

So, global maxconn is the maximum number of connections that one haproxy process handles. It should never be reached, because that means the entire process no longer handles any requests at all, not even, for example, the stats page.

To avoid this, maxconn is also configured per frontend, and it also has an impact on the backend (because 1 frontend connection usually means 1 backend connection, we have to double the number I talked about earlier).

Now, if we put a maxconn configuration into the defaults section like in your configuration, every frontend will inherit the value from the defaults section.

So here is an example:

global
 maxconn 100

defaults
 maxconn 10

frontend a
frontend b
frontend c
frontend d
frontend e

Which is short for:

global
 maxconn 100

frontend a
 maxconn 10
frontend b
 maxconn 10
frontend c
 maxconn 10
frontend d
 maxconn 10
frontend e
 maxconn 10

In this case we have a global (process) value of 100, and each of the five frontends has maxconn 10.

So we have 5 x 10 = 50 total for all the frontends, plus we need to double this value to account for the connections on the backend, so we are at 100, which matches the global configuration exactly. This would probably work, because it's unlikely that all frontends are at 100% of their maxconn at the same time, but we should give global maxconn some room. So in this case, for example, we would maybe bump global maxconn to 110 or so.

Yes, you can syslog every request if you want, see more about logging configuration here; however, I believe that in this case you probably won't see anything particularly special in the haproxy logs, because I think we are not properly timing out our TCP sessions, and that is what ultimately causes those issues.
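
If you do want to log every request, a minimal sketch would look like this (the local0 facility and the rsyslog target 127.0.0.1:514 are assumptions, adjust them to your syslog setup; use option tcplog instead of option httplog for TCP frontends):

global
 log 127.0.0.1:514 local0

defaults
 log global
 option httplog      # logs client IP, timestamp, method, URL, status and timers

Your local syslog daemon also has to accept UDP input on that address for the lines to actually end up in a file.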

I see 2 problems here:

  • connections from worker2 are piling up in FIN_WAIT2 state, probably in the proxy-https frontend
  • global maxconn is reached before frontend maxconn is reached, which causes haproxy to stop accepting new connections entirely. If frontend maxconn were reached but global maxconn were not, you would only see that specific frontend no longer accepting new connections, which means other frontends would still work, as would the stats interface of haproxy, where you could extract useful information about the number of active sessions in that specific frontend (see the sketch right after this list). When global maxconn is reached, though, not even the stats interface works.
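
Regarding the sketch mentioned in the second point: assuming you have a stats socket configured (for example stats socket /var/run/haproxy.sock in the global section; the path is only an example), the session counters per frontend and backend can be pulled like this:

# columns: proxy name, service name, scur (current sessions), smax (max seen), slim (configured limit)
echo "show stat" | socat stdio /var/run/haproxy.sock | cut -d, -f1,2,5-7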

So I’m suggesting multiple things:

  • make sure global maxconn is configured considering the points explained above
  • you can keep the client/server timeouts high, but I strongly suggest you specifically configure the client-fin and server-fin timeouts to low values such as 60 seconds (put timeout client-fin 60s and timeout server-fin 60s into the defaults section; a sketch follows this list). What this means is that connections should not sit in FIN_WAIT states for hours, but only for a few minutes, and I believe this is the root cause of your issue.
  • also double-check that the sysctl net.ipv4.tcp_fin_timeout is at its default of 60 seconds (not some very large value)
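
A sketch of the last two points, with the values suggested above:

defaults
 timeout client-fin 60s   # inactivity timeout for half-closed client connections
 timeout server-fin 60s   # same for the server/backend side

# on the OS side, check the FIN_WAIT2 timer (the default is 60):
sysctl net.ipv4.tcp_fin_timeout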

I could not ask for a better explanation than this; even my company's support team has not provided this much info. Wow, amazing. My respects to you!!!

Well, I found that the users' APIs on the clusters are connecting through POST requests.
For example: we have two networks, a “public network” and a “k8s cluster network”, where worker2 is in the k8s cluster network and all requests have to pass through a single load balancer (HAProxy) … I see in the monitoring tool that the https-proxy backend is filling up with current sessions, and https-proxy is bringing down the entire cluster. As you rightly pointed out, the timeouts are high (120m) and the user code must be causing FIN_WAIT2 to pile up.

One last set of questions:

  1. When a pod (or user workload) is running on worker-2, all the FIN_WAIT2s should happen on worker-2, right? So why is it happening on the load balancer? I am missing some logic here.

  2. Are there any limits.conf updates needed for ulimits? (If I set my global maxconn to 10000 and the default maxconn to 4000.)

It depends on who closes first. If, for example, the client in the public network closes the frontend connection, haproxy will try to close the corresponding connection on the backend. However, if the backend server never actively closes its side of the connection, it will remain in FIN_WAIT2 for some time. Reducing that time is key here.

And I don't really understand the netstat output you sent, though. If worker2 is a remote backend server, it should show the same port numbers (443). However, in the PDF you sent, the worker2 ports are high/random and the local port is probably https (it's a little unclear in the PDF), meaning worker2 would be a client in this case (frontend), not a backend server.
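
One way to tell the two sides apart in the raw output (a sketch that assumes the frontend listens on 443 and the usual netstat -nt column layout; adjust the port and fields as needed):

# frontend side: local address ends in :443, remote ports are high/random
netstat -nt | awk '$4 ~ /:443$/ {print $6}' | sort | uniq -c

# backend side: remote address ends in the port your backend servers listen on (443 used as an example)
netstat -nt | awk '$5 ~ /:443$/ {print $6}' | sort | uniq -c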

No, because haproxy is started as root and will adjust ulimits on its own (based on maxconn). If it can't for some reason, you will see an error message. So you generally do not have to worry about ulimits.
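
If you want to double-check what the running process actually got, something along these lines works (a sketch; the stats-socket variant assumes you have one configured at /var/run/haproxy.sock):

# effective file-descriptor limit of the running haproxy process
grep 'open files' /proc/$(pidof -s haproxy)/limits

# or via the stats socket: Ulimit-n and Maxsock are derived from maxconn
echo "show info" | socat stdio /var/run/haproxy.sock | grep -E 'Maxconn|Maxsock|Ulimit-n'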

I have updated that netstat file netstat_updated

OK, so here we actually see connections to the backend pile up in CLOSE_WAIT state (and about the same number of connections stay in FIN_WAIT2 on the frontend), which is actually worse.

Do you see haproxy consuming an abnormal amount of CPU during those issues? Is this still happening with all the suggestions regarding timeouts and maxconn from above applied?

1.5.18 is the release that CentOS 7 ships with, and it's very old. CLOSE_WAIT sockets indicate a serious problem within haproxy, as it is not closing sockets that it is supposed to close. This may be an indicator of a bug; however, CentOS 7 / haproxy 1.5.18 is used in many setups, so it's strange that we are the ones facing this.

CPU consumption is not high during the periods of high backend sessions.

I found that one of the cluster users is running a REST API to process 1.5 billion records of logging data. Due to the 120m proxy timeouts and the lack of FIN_WAIT2-related timeout entries in haproxy.cfg, that is probably what was causing the crashes. I applied the changes to haproxy; the user has not run that application over the weekend and it has not crashed so far.

Is there any way to control the throughput of the load balancer? For example, allow only a certain number of backend sessions to pass through. I understand that the application code is not well designed, but I don't want to ask that user to fix the app; I'm wondering if we can control this on our side?

Thank you a ton!!