We have a load balancer with the following configuration:
frontend lb
bind x.x.x.x:80 mss 1440 alpn h2,http/1.1
maxconn 100
mode http
option httplog
use_backend busy_backend if { fe_conn ge 100 } || { be_conn_free(actual_backend) le 0 } actual_backend
backend actual_backend
mode http
option httpchk
http-check send meth GET uri /some/path ver HTTP/1.1 hdr Host some.header.com
http-check expect status 200
server my-server01 x.x.x.x:80 check port 80 maxconn 10000 enabled maxqueue 1
server my-server02 x.x.x.x:80 check port 80 maxconn 10000 enabled maxqueue 1
errorfile 503 /etc/path/busy/busy.http
backend busy_backend
mode http
option httpchk
http-check send meth GET uri /path ver HTTP/1.1
http-check expect status 200
server busy-server unix@socket.sock enabled backup
errorfile 503 /etc/path/busy/busy.http
We have configured the lb to accept at most 100 concurrent connections by setting maxconn to 100.
I then hit the lb with 120 concurrent connections. I expect haproxy to accept 100 connections and answer the remaining connections with a 503.
root@load_test_server:~# ./hey_linux_amd64 -n 1000000 -c 120 -q 30 http://lb-ip/path
The connection count for this listener reaches 100 in haproxy.
Once it does, all we see are 503s.
We suspect that haproxy is not closing the connections and keeps them in CLOSE-WAIT state for a very long time.
Number of CLOSE-WAIT connections on haproxy:
root@haproxy:~# date; ss -antp | grep listener-ip | grep -i close | wc -l
Fri Feb 5 15:47:46 UTC 2021
60
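For anyone reproducing this, the count above can be broken down per TCP state instead of grepping for "close". This is only a minimal sketch: the function name count_states is made up for illustration, and it assumes the usual `ss -ant` column layout (state first, local address:port fourth).

```shell
#!/bin/sh
# Count TCP sockets per state from `ss -ant` style output read on stdin.
# Only rows whose local address (4th column) ends in the given port are counted.
# count_states is a hypothetical helper name, not part of ss or haproxy.
count_states() {
    port="$1"
    awk -v port="$port" '
        # Skip the header line that ss prints first.
        $1 == "State" { next }
        # $4 is "local-address:port"; keep rows for the listener port only.
        $4 ~ (":" port "$") { count[$1]++ }
        END { for (s in count) print s, count[s] }
    ' | sort
}

# Usage on the haproxy host:
#   ss -ant | count_states 80
```

Seeing ESTAB and CLOSE-WAIT side by side makes it easier to tell how many of the 100 slots are held by half-closed connections.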
Take one CLOSE-WAIT connection as an example:
root@haproxy:~# date; ss -antp | grep listener-ip | grep -i close | tail -1
Fri Feb 5 15:48:00 UTC 2021
CLOSE-WAIT 126 0 listener-ip:80 load-generator-ip:59280
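In `ss` output the second column is Recv-Q, which for a non-listening socket is the number of bytes received but not yet read by the application; the line above shows 126. A rough sketch for listing every CLOSE-WAIT socket that still has unread bytes follows. The helper name close_wait_unread is hypothetical, and what the unread bytes mean for this particular issue is not established here.

```shell
#!/bin/sh
# Print "peer-address recv-q" for each CLOSE-WAIT socket with unread bytes,
# reading `ss -ant` style output on stdin.
# close_wait_unread is a hypothetical helper name used for illustration only.
close_wait_unread() {
    # $1 = state, $2 = Recv-Q, $5 = peer address:port
    awk '$1 == "CLOSE-WAIT" && $2 > 0 { print $5, $2 }'
}

# Usage on the haproxy host:
#   ss -ant | close_wait_unread
```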
When I look for this port on the load-generating server, it is no longer in use:
root@load-gen-server:~# date; ss -antp | grep 59280
Fri Feb 5 15:49:56 UTC 2021
But the connection stays on haproxy much longer, until I stop the traffic to the frontend:
root@haproxy:~# date; ss -antp | grep listener-ip | grep -i 59280
Fri Feb 5 15:51:10 UTC 2021
CLOSE-WAIT 126 0 listener-ip:80 load-generator-ip:59280
It has been in CLOSE-WAIT for almost 3 minutes.
Our understanding is that this CLOSE-WAIT connection still counts toward the maxconn limit we set, so the listener stops serving traffic until a connection slot is freed.
We tested this with haproxy 2.2.6 and 2.2.3; both behave the same way.
Could you kindly help us debug this issue?
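One way to check this from haproxy's side is to compare the frontend's current session count (scur) with its session limit (slim) in the `show stat` CSV from the stats socket. The sketch below assumes socat is installed and uses the stats socket path from the global section; the helper name frontend_usage is made up for illustration.

```shell
#!/bin/sh
# Print current sessions (scur) and the session limit (slim) for a frontend,
# reading `show stat` CSV on stdin.
# CSV columns: pxname,svname,qcur,qmax,scur,smax,slim,...
# frontend_usage is a hypothetical helper name, not a haproxy command.
frontend_usage() {
    awk -F, -v px="$1" '
        $1 == px && $2 == "FRONTEND" { print "scur=" $5, "slim=" $7 }
    '
}

# Usage on the haproxy host (assumes socat is available):
#   echo "show stat" | socat stdio /path/stats | frontend_usage lb
```

If scur stays at slim while the load generator no longer holds those ports, that would be consistent with the half-closed connections still occupying slots.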
haproxy -vv
root@haproxy:~# haproxy -vv
HA-Proxy version 2.2.3-0e58a34 2020/09/08 - https://haproxy.org/
Status: long-term supported branch - will stop receiving fixes around Q2 2025.
Known bugs: http://www.haproxy.org/bugs/bugs-2.2.3.html
Running on: Linux 4.15.0-42-generic #45-Ubuntu SMP Thu Nov 15 19:32:57 UTC 2018 x86_64
Build options :
TARGET = linux-glibc
CPU = generic
CC = gcc
CFLAGS = -O2 -g -Wall -Wextra -Wdeclaration-after-statement -fwrapv -Wno-unused-label -Wno-sign-compare -Wno-unused-parameter -Wno-clobbered -Wno-missing-field-initializers -Wno-stringop-overflow -Wtype-limits -Wshift-negative-value -Wshift-overflow=2 -Wduplicated-cond -Wnull-dereference
OPTIONS = USE_PCRE=1 USE_LINUX_TPROXY=1 USE_LINUX_SPLICE=1 USE_LIBCRYPT=1 USE_OPENSSL=1 USE_ZLIB=1 USE_SYSTEMD=1
Feature list : +EPOLL -KQUEUE +NETFILTER +PCRE -PCRE_JIT -PCRE2 -PCRE2_JIT +POLL -PRIVATE_CACHE +THREAD -PTHREAD_PSHARED +BACKTRACE -STATIC_PCRE -STATIC_PCRE2 +TPROXY +LINUX_TPROXY +LINUX_SPLICE +LIBCRYPT +CRYPT_H +GETADDRINFO +OPENSSL -LUA +FUTEX +ACCEPT4 +ZLIB -SLZ +CPU_AFFINITY +TFO +NS +DL +RT -DEVICEATLAS -51DEGREES -WURFL +SYSTEMD -OBSOLETE_LINKER +PRCTL +THREAD_DUMP -EVPORTS
Default settings :
bufsize = 16384, maxrewrite = 1024, maxpollevents = 200
Built with multi-threading support (MAX_THREADS=64, default=16).
Built with OpenSSL version : OpenSSL 1.1.0g 2 Nov 2017
Running on OpenSSL version : OpenSSL 1.1.1h 22 Sep 2020 (VERSIONS DIFFER!)
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports : TLSv1.0 TLSv1.1 TLSv1.2
Built with network namespace support.
Built with zlib version : 1.2.11
Running on zlib version : 1.2.11
Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND
Built with PCRE version : 8.39 2016-06-14
Running on PCRE version : 8.39 2016-06-14
PCRE library supports JIT : no (USE_PCRE_JIT not set)
Encrypted password support via crypt(3): yes
Built with gcc compiler version 7.3.0
Built with the Prometheus exporter as a service
Available polling systems :
epoll : pref=300, test result OK
poll : pref=200, test result OK
select : pref=150, test result OK
Total: 3 (3 usable), will use epoll.
Available multiplexer protocols :
(protocols marked as <default> cannot be specified using 'proto' keyword)
fcgi : mode=HTTP side=BE mux=FCGI
<default> : mode=HTTP side=FE|BE mux=H1
h2 : mode=HTTP side=FE|BE mux=H2
<default> : mode=TCP side=FE|BE mux=PASS
Available services :
prometheus-exporter
Available filters :
[SPOE] spoe
[COMP] compression
[TRACE] trace
[CACHE] cache
[FCGI] fcgi-app
Global
global
user haproxy
group haproxy
nbproc 1
nbthread 16
cpu-map auto:1/1-16 0-15
log /dev/log local2
log /dev/log local0 notice
chroot /path
pidfile /path/haproxy.pid
daemon
master-worker
maxconn 200000
hard-stop-after 1h
stats socket /path/stats mode 660 level admin expose-fd listeners
tune.ssl.cachesize 3000000
tune.ssl.lifetime 60000
ssl-default-bind-ciphers ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-SHA384:ECDHE-RSA-AES256-SHA384:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA256
ssl-default-bind-options ssl-min-ver TLSv1.2 ssl-max-ver TLSv1.2
server-state-file /path/states
tune.bufsize 40960
Defaults
defaults
mode http
log global
retries 3
timeout http-request 10s
timeout queue 10s
timeout connect 10s
timeout client 1m
timeout server 1m
timeout tunnel 10m
timeout client-fin 30s
timeout server-fin 30s
timeout check 10s
option httplog
option forwardfor except 127.0.0.0/8
option redispatch
load-server-state-from-file global