Haproxy 1.8 stuck - 80% dropped connections

happy · December 26, 2018, 5:28am

After some days or weeks normal working Haproxy starts to ignore 30-80% new connections

No kernel-level errors (even local connections affected)
No problems with any counters (“clear counters” cli-command do nothing) or fd, sockets and other limits
Nothing interesting in haproxy stats
It is rare
It happens randomly, but have obvious connection to significant load and low (but not overloaded) CPU.
Appears only on multi-process configs (with extremely often reloads and huge configs)
Tested up to 1.8.15 (Debian 8, kernel 4.17+)
Fixed easily by restart

TCP dump for normal request:

19:26:30.457523 IP localhost.59792 > localhost.https: Flags [S], seq 765380153, win 43690, options [mss 65495,sackOK,TS val 448339641 ecr 0,nop,wscale 10], length 0
19:26:30.457549 IP localhost.https > localhost.59792: Flags [S.], seq 2803598966, ack 765380154, win 43690, options [mss 65495,sackOK,TS val 448339641 ecr 448339641,nop,wscale 10], length 0
19:26:30.457596 IP localhost.59792 > localhost.https: Flags [.], ack 1, win 43, options [nop,nop,TS val 448339641 ecr 448339641], length 0
19:26:30.457827 IP localhost.59792 > localhost.https: Flags [P.], seq 1:189, ack 1, win 43, options [nop,nop,TS val 448339641 ecr 448339641], length 188
19:26:30.458425 IP localhost.https > localhost.59792: Flags [P.], seq 1:2749, ack 189, win 44, options [nop,nop,TS val 448339642 ecr 448339641], length 2748
19:26:30.458457 IP localhost.59792 > localhost.https: Flags [.], ack 2749, win 171, options [nop,nop,TS val 448339642 ecr 448339642], length 0
19:26:30.460073 IP localhost.59792 > localhost.https: Flags [P.], seq 189:282, ack 2749, win 171, options [nop,nop,TS val 448339643 ecr 448339642], length 93
19:26:30.460602 IP localhost.https > localhost.59792: Flags [P.], seq 2749:2800, ack 282, win 44, options [nop,nop,TS val 448339644 ecr 448339643], length 51
19:26:30.460720 IP localhost.59792 > localhost.https: Flags [P.], seq 282:388, ack 2800, win 171, options [nop,nop,TS val 448339644 ecr 448339644], length 106
19:26:30.460958 IP localhost.https > localhost.59792: Flags [P.], seq 2800:3017, ack 388, win 44, options [nop,nop,TS val 448339644 ecr 448339644], length 217
19:26:30.460994 IP localhost.https > localhost.59792: Flags [P.], seq 3017:3048, ack 388, win 44, options [nop,nop,TS val 448339644 ecr 448339644], length 31
19:26:30.461016 IP localhost.https > localhost.59792: Flags [F.], seq 3048, ack 388, win 44, options [nop,nop,TS val 448339644 ecr 448339644], length 0
19:26:30.461022 IP localhost.59792 > localhost.https: Flags [.], ack 3048, win 176, options [nop,nop,TS val 448339644 ecr 448339644], length 0
19:26:30.461065 IP localhost.59792 > localhost.https: Flags [P.], seq 388:419, ack 3049, win 176, options [nop,nop,TS val 448339644 ecr 448339644], length 31
19:26:30.461137 IP localhost.59792 > localhost.https: Flags [F.], seq 419, ack 3049, win 176, options [nop,nop,TS val 448339644 ecr 448339644], length 0
19:26:30.461142 IP localhost.https > localhost.59792: Flags [R.], seq 3049, ack 419, win 44, options [nop,nop,TS val 448339644 ecr 448339644], length 0
19:26:30.461161 IP localhost.https > localhost.59792: Flags [R], seq 2803602015, win 0, length 0

TCP dump for failed request:

19:27:27.576816 IP localhost.65360 > localhost.https: Flags [S], seq 578543461, win 43690, options [mss 65495,sackOK,TS val 448396759 ecr 0,nop,wscale 10], length 0
19:27:27.576842 IP localhost.https > localhost.65360: Flags [S.], seq 3967261380, ack 578543462, win 43690, options [mss 65495,sackOK,TS val 448396759 ecr 448396759,nop,wscale 10], length 0
19:27:27.576867 IP localhost.65360 > localhost.https: Flags [.], ack 1, win 43, options [nop,nop,TS val 448396759 ecr 448396759], length 0
19:27:27.577082 IP localhost.65360 > localhost.https: Flags [P.], seq 1:189, ack 1, win 43, options [nop,nop,TS val 448396759 ecr 448396759], length 188
19:27:27.617581 IP localhost.https > localhost.65360: Flags [.], ack 189, win 44, options [nop,nop,TS val 448396799 ecr 448396759], length 0
…client-side timeout - 30 seconds…
19:27:57.606808 IP localhost.65360 > localhost.https: Flags [F.], seq 189, ack 1, win 43, options [nop,nop,TS val 448426787 ecr 448396799], length 0
19:27:57.649591 IP localhost.https > localhost.65360: Flags [.], ack 190, win 44, options [nop,nop,TS val 448426831 ecr 448426787], length 0

lukastribus · December 26, 2018, 9:41am

Can you share the output of haproxy -vv, parts of your configuration, and what’s the most important information of it all: enable logging to a syslog server and share the logs when this happens.

happy · December 29, 2018, 1:52am

HA-Proxy version 1.8.15 2018/12/13
Copyright 2000-2018 Willy Tarreau willy@haproxy.org

Build options :
TARGET = linux2628
CPU = native
CC = gcc
CFLAGS = -O2 -fPIE -fstack-protector-strong -Wformat -Werror=format-security
OPTIONS = USE_GETADDRINFO=1 USE_ZLIB=1 USE_REGPARM=1 USE_THREAD=1 USE_OPENSSL=1 USE_LUA=1 USE_SYSTEMD=1 USE_PCRE=1 USE_PCRE_JIT=1 USE_TFO=1 USE_NS=1

Default settings :
maxconn = 2000, bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Built with OpenSSL version : OpenSSL 1.1.1a 20 Nov 2018
Running on OpenSSL version : OpenSSL 1.1.1a 20 Nov 2018
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports : TLSv1.0 TLSv1.1 TLSv1.2 TLSv1.3
Built with Lua version : Lua 5.3.3
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND
Encrypted password support via crypt(3): yes
Built with multi-threading support.
Built with PCRE version : 8.39 2016-06-14
Running on PCRE version : 8.39 2016-06-14
PCRE library supports JIT : yes
Built with zlib version : 1.2.11
Running on zlib version : 1.2.11
Compression algorithms supported : identity(“identity”), deflate(“deflate”), raw-deflate(“deflate”), gzip(“gzip”)
Built with network namespace support.

Available polling systems :
epoll : pref=300, test result OK
poll : pref=200, test result OK
select : pref=150, test result OK
Total: 3 (3 usable), will use epoll.

Available filters :
[SPOE] spoe
[COMP] compression
[TRACE] trace

happy · December 29, 2018, 3:20am

Looks like this is the reason (check only on info-level for now):

02:04:14 haproxy [WARNING] 362/020413 (12093) : Failed to get the number of sockets to be transferred !
02:04:14 haproxy [ALERT] 362/020413 (12093) : Failed to get the sockets from the old process!

Related cfg:

stats socket /run/haproxy.sock level admin expose-fd listeners process 1

Reloads with systemd, almost default service file:

EnvironmentFile=-/etc/default/haproxy
Environment=“CONFIG=/etc/haproxy/haproxy.cfg” “PIDFILE=/run/haproxy.pid”
ExecStartPre=/usr/sbin/haproxy -f $CONFIG -c -q $EXTRAOPTS
ExecStart=/usr/sbin/haproxy -Ws -f $CONFIG -p $PIDFILE $EXTRAOPTS
ExecReload=/usr/sbin/haproxy -f $CONFIG -c -q $EXTRAOPTS
ExecReload=/bin/kill -USR2 $MAINPID
KillMode=mixed
Restart=always
Type=notify
Nice=-19
LimitNOFILE=524288

lukastribus · December 29, 2018, 12:16pm

Share the entire configuration please.
What does the LimitNOFILE= directive do in there? Please remove it.
How often are you reloading haproxy exactly and how do you know that you are now reloading haproxy while the previous reload is still ongoing?

happy · December 30, 2018, 12:16am

NOFILE does nothing anyway. It was a test before I started using “ulimit-n 10000000”, because haproxy has problems with constantly growing fd-amount due to descriptors that I create in Lua (they growing each reload)
The entire config is 500KB cfg + 10-15MB lua. I’ll prepare parts if it will be needed.
Reloads - every few minutes
If you understand reload as “start new processes, init, take old data, start listening”. It takes a small time and never intersects with another reloads.
However, each reload could leave some amount of old processes that could stay many hours (because of long connections) and even days (because of bugs, probably).
Since it related to conditions “low CPU + many processes + many reloads + big configs + noticeable load + my Lua-FD fun”, I think the reason is constant limit that unreachable in more normal environment.

github.com

haproxy/haproxy/blob/v1.8.0/src/haproxy.c#L977




static int get_old_sockets(const char *unixsocket)
{
	char *cmsgbuf = NULL, *tmpbuf = NULL;
	int *tmpfd = NULL;
	struct sockaddr_un addr;
	struct cmsghdr *cmsg;
	struct msghdr msghdr;
	struct iovec iov;
	struct xfer_sock_list *xfer_sock = NULL;
	struct timeval tv = { .tv_sec = 1, .tv_usec = 0 };
	int sock = -1;
	int ret = -1;
	int ret2 = -1;
	int fd_nb;
	int got_fd = 0;
	int i = 0;
	size_t maxoff = 0, curoff = 0;


	memset(&msghdr, 0, sizeof(msghdr));
	cmsgbuf = malloc(CMSG_SPACE(sizeof(int)) * MAX_SEND_FD);

According to get_old_sockets() the error happens after successful connection to old process.
And this timeout “tv” (hard limited to 1 second) is used for receiving fd-list from old process by “recvmsg(sock, &msghdr, MSG_WAITALL)”.
There is no exception processing around and if this function reaches 1sec-timeout it triggers exactly that error message

lukastribus · December 30, 2018, 11:25am

I’m not a LUA or LUA in haproxy expert, but it is my understanding that accessing the filesystem via LUA must not be done at all at runtime, as it will block the event-loop and cause this exact issues.

You can find more information’s about this in Haproxy LUA Architecture: Non blocking design.

If LUA doesn’t yield because it is doing a blocking systemcall (like accessing the filesystem), I don’t see how haproxy can run reliably in production. It will work with a bunch of connections in lab tests, sure, but production performance and reliability will be bad.

You can try to bump the socket timeout, but this means that LUA blocked the entire haproxy process for at least 1 second, meaning all the connections for that process stalled for that entire time.

Whether we hit a buggy code path when this timeout occurs that ultimately leads to dropped incoming connections is something we have to clarify and possibly fix, but you shouldn’t be hitting this timeout in the first place.

Regarding the former: could you explain the situation on the mailing list, CC’ing the LUA and master-worker maintainers that you can find in the MAINTAINERS file. The developers don’t read this discourse forum here.

Thanks

happy · December 30, 2018, 12:41pm

LUA doesn’t use file system during runtime. Memory-mapped fd-s are also fd-s )
It’s not LUA who holds 1sec. Just summary “low resources + a lot of fd-s + some load around” = socket transfer operation might exceed 1sec in some rare cases.
I’ve made few threads in mailing list, included question about such fd-s activity in LUA, but looks like it were too stupid questions for developers - all left ignored.
So, I’ll just try to recompile haproxy with increased timeout.

Thank you

lukastribus · December 30, 2018, 10:15pm

Can you elaborate which threads and when you posted?

happy · December 31, 2018, 3:57pm

Not too long testing for now, but it looks like extended timeout for receiving old sockets really solves problem and makes no harm
Thread about Lua FD. Wrong place?)

lukastribus · January 2, 2019, 1:57pm

It’s the right place, but not every developer always reads every thread, so it happens that some of those fall through the cracks.

I’ve CC’ed the LUA maintainer on this thread now.

Topic		Replies	Views
100+ haproxy processes Help!	15	1214	April 3, 2019
HAProxy stops serving frontend requests while not closing backend connections at 100% cpu utilisation Help!	10	7076	August 7, 2019
Large Conntrack/Active connections count. FD Limit Reached Help!	7	2085	March 11, 2021
HAProxy 1.5.18 Centos 6 CPU Issue Help!	39	2472	January 23, 2018
Understanding maxconn and maxonnrate and delays Help!	12	7262	February 4, 2019

Haproxy 1.8 stuck - 80% dropped connections

Related topics