Maxconn causing close_wait

We have a load balancer with the below configuration:

frontend lb
        bind x.x.x.x:80 mss 1440 alpn h2,http/1.1
        maxconn 100
        mode http
        option httplog
        use_backend busy_backend if { fe_conn ge 100 } || { be_conn_free(actual_backend) le 0 }
        default_backend actual_backend

backend actual_backend
        mode  http
        option httpchk
        http-check send meth GET uri /some/path ver HTTP/1.1 hdr Host some.header.com
        http-check expect status 200
        server my-server01 x.x.x.x:80 check port 80 maxconn 10000 enabled maxqueue 1
        server my-server02 x.x.x.x:80 check port 80 maxconn 10000 enabled maxqueue 1
        errorfile 503 /etc/path/busy/busy.http

backend busy_backend
        mode http
        option httpchk
        http-check send meth GET uri /path ver HTTP/1.1
        http-check expect status 200
        server busy-server unix@socket.sock enabled backup
        errorfile 503 /etc/path/busy/busy.http

We have configured the load balancer to accept a maximum of 100 concurrent connections at a time by setting maxconn to 100.

I start hitting the LB with 120 concurrent connections. I expect HAProxy to accept 100 connections and respond to the rest with a 503.

root@load_test_server:~# ./hey_linux_amd64 -n 1000000 -c 120 -q 30 http://lb-ip/path

The connection count for this listener reaches 100 in HAProxy.

Once it does, all we see are 503s.

What we suspect is that HAProxy is not closing the connections and keeps them in CLOSE-WAIT state for a very long time.

Number of CLOSE-WAIT connections on the HAProxy host:

root@haproxy:~# date; ss -antp | grep listener-ip | grep -i close | wc -l
Fri Feb  5 15:47:46 UTC 2021
60
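
For reference, a per-state breakdown of the same sockets can be had with something like the following (a sketch, assuming the listener is bound on port 80 on this host):

# count sockets per TCP state for the listener on port 80
ss -ant '( sport = :80 )' | awk 'NR>1 {print $1}' | sort | uniq -c

# or count only the CLOSE-WAIT sockets (drop the header line before counting)
ss -nt state close-wait '( sport = :80 )' | tail -n +2 | wc -l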

Taking one CLOSE-WAIT connection as an example:

root@haproxy:~# date; ss -antp | grep listener-ip | grep -i close | tail -1
Fri Feb  5 15:48:00 UTC 2021
CLOSE-WAIT 126      0                                      listener-ip:80                                          load-generator-ip:59280

Now I search for this port on the load-generating server, and I do not see it being used:

root@load-gen-server:~# date; ss -antp | grep 59280
Fri Feb  5 15:49:56 UTC 2021
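
To double-check at the packet level that the load generator really closed its side (which is what CLOSE-WAIT on the HAProxy host implies), the teardown could be captured while reproducing. A sketch, assuming tcpdump is available on the HAProxy host; load-generator-ip is the same placeholder as above:

# show only FIN/RST segments exchanged with the load generator on port 80
tcpdump -ni any 'host load-generator-ip and port 80 and (tcp[tcpflags] & (tcp-fin|tcp-rst) != 0)'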

But this connection stays on the HAProxy side for a long time, until I stop the traffic to the frontend:

root@haproxy:~# date; ss -antp | grep listener-ip | grep -i 59280
Fri Feb  5 15:51:10 UTC 2021
CLOSE-WAIT 126      0                                      listener-ip:80                                          load-generator-ip:59280

It is in CLOSE-WAIT for almost 3 minutes.

Our understanding is that this CLOSE-WAIT connection still counts towards the maxconn limit we set, hence the listener stops serving any traffic until a connection slot is freed.
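
To check whether those sockets are actually counted against the frontend limit, the admin socket configured in the global section can be queried. A sketch, assuming socat is installed, using the stats socket path from the config shown further down:

# per-proxy session counters: scur (current sessions) vs slim (configured maxconn)
echo "show stat" | socat stdio /path/stats | cut -d, -f1,2,5,7

# process-wide connection counters
echo "show info" | socat stdio /path/stats | grep -E 'CurrConns|ConnRate'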

We tested this on HAProxy 2.2.6 and 2.2.3; both behave the same way.

Can you kindly help debug this issue?

haproxy -vv

  root@haproxy:~# haproxy -vv
  HA-Proxy version 2.2.3-0e58a34 2020/09/08 - https://haproxy.org/
  Status: long-term supported branch - will stop receiving fixes around Q2 2025.
  Known bugs: http://www.haproxy.org/bugs/bugs-2.2.3.html
  Running on: Linux 4.15.0-42-generic #45-Ubuntu SMP Thu Nov 15 19:32:57 UTC 2018 x86_64
  Build options :
    TARGET  = linux-glibc
    CPU     = generic
    CC      = gcc
    CFLAGS  = -O2 -g -Wall -Wextra -Wdeclaration-after-statement -fwrapv -Wno-unused-label -Wno-sign-compare -Wno-unused-parameter -Wno-clobbered -Wno-missing-field-initializers -Wno-stringop-overflow -Wtype-limits -Wshift-negative-value -Wshift-overflow=2 -Wduplicated-cond -Wnull-dereference
    OPTIONS = USE_PCRE=1 USE_LINUX_TPROXY=1 USE_LINUX_SPLICE=1 USE_LIBCRYPT=1 USE_OPENSSL=1 USE_ZLIB=1 USE_SYSTEMD=1

  Feature list : +EPOLL -KQUEUE +NETFILTER +PCRE -PCRE_JIT -PCRE2 -PCRE2_JIT +POLL -PRIVATE_CACHE +THREAD -PTHREAD_PSHARED +BACKTRACE -STATIC_PCRE -STATIC_PCRE2 +TPROXY +LINUX_TPROXY +LINUX_SPLICE +LIBCRYPT +CRYPT_H +GETADDRINFO +OPENSSL -LUA +FUTEX +ACCEPT4 +ZLIB -SLZ +CPU_AFFINITY +TFO +NS +DL +RT -DEVICEATLAS -51DEGREES -WURFL +SYSTEMD -OBSOLETE_LINKER +PRCTL +THREAD_DUMP -EVPORTS

  Default settings :
    bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

  Built with multi-threading support (MAX_THREADS=64, default=16).
  Built with OpenSSL version : OpenSSL 1.1.0g  2 Nov 2017
  Running on OpenSSL version : OpenSSL 1.1.1h  22 Sep 2020 (VERSIONS DIFFER!)
  OpenSSL library supports TLS extensions : yes
  OpenSSL library supports SNI : yes
  OpenSSL library supports : TLSv1.0 TLSv1.1 TLSv1.2
  Built with network namespace support.
  Built with zlib version : 1.2.11
  Running on zlib version : 1.2.11
  Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
  Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND
  Built with PCRE version : 8.39 2016-06-14
  Running on PCRE version : 8.39 2016-06-14
  PCRE library supports JIT : no (USE_PCRE_JIT not set)
  Encrypted password support via crypt(3): yes
  Built with gcc compiler version 7.3.0
  Built with the Prometheus exporter as a service

  Available polling systems :
        epoll : pref=300,  test result OK
         poll : pref=200,  test result OK
       select : pref=150,  test result OK
  Total: 3 (3 usable), will use epoll.

  Available multiplexer protocols :
  (protocols marked as <default> cannot be specified using 'proto' keyword)
              fcgi : mode=HTTP       side=BE        mux=FCGI
         <default> : mode=HTTP       side=FE|BE     mux=H1
                h2 : mode=HTTP       side=FE|BE     mux=H2
         <default> : mode=TCP        side=FE|BE     mux=PASS

  Available services :
  	prometheus-exporter

  Available filters :
  	[SPOE] spoe
  	[COMP] compression
  	[TRACE] trace
  	[CACHE] cache
  	[FCGI] fcgi-app

Global

  global

  user haproxy
  group haproxy
  nbproc 1
  nbthread 16
  cpu-map auto:1/1-16 0-15
  log /dev/log local2
  log /dev/log local0 notice
  chroot /path
  pidfile /path/haproxy.pid
  daemon
  master-worker
  maxconn 200000
  hard-stop-after 1h
  stats socket /path/stats mode 660 level admin expose-fd listeners
  tune.ssl.cachesize 3000000
  tune.ssl.lifetime 60000
  ssl-default-bind-ciphers ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-SHA384:ECDHE-RSA-AES256-SHA384:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA256
  ssl-default-bind-options ssl-min-ver TLSv1.2 ssl-max-ver TLSv1.2
  server-state-file /path/states
  tune.bufsize 40960

Defaults

defaults

  mode http
  log global
  retries 3
  timeout http-request 10s
  timeout queue 10s
  timeout connect 10s
  timeout client 1m
  timeout server 1m
  timeout tunnel 10m
  timeout client-fin 30s
  timeout server-fin 30s
  timeout check 10s
  option httplog
  option forwardfor except 127.0.0.0/8
  option redispatch
  load-server-state-from-file global

No, that is not how it works.

If you set maxconn 100 on the frontend, haproxy will stop accepting new sockets from the kernel and they will queue up on the OS side.
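
This queueing is visible on the listening socket itself: once the frontend limit is reached, haproxy stops calling accept() and the kernel's accept queue on the LISTEN socket fills up. A sketch, assuming the bind on port 80:

# on a LISTEN socket, Recv-Q is the current accept-queue length and
# Send-Q is the configured backlog for that listener
ss -ltn '( sport = :80 )'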

A 503 can only be sent over an established, accepted TCP connection. Not accepting the TCP connection means not being able to send anything to the other side.

If, on the other hand, you configure maxconn 100 for a backend server, then haproxy will queue the requests and emit the appropriate HTTP responses once timeout queue expires (if and only if the frontend maxconn is larger than that, otherwise requests are not accepted there in the first place).
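
In other words, to get the behaviour described above (around 100 requests in flight, the excess answered with the busy page), the limit belongs on the backend servers rather than on the frontend. A rough sketch of that idea, not a drop-in config: it leaves out the busy_backend routing for clarity, and the per-server maxconn of 50, the high frontend maxconn and the 1-second queue timeout are illustrative values only:

frontend lb
        bind x.x.x.x:80 mss 1440 alpn h2,http/1.1
        maxconn 10000                 # keep the frontend limit high so connections are still accepted
        mode http
        default_backend actual_backend

backend actual_backend
        mode http
        timeout queue 1s              # how long a request may wait for a free server slot
        errorfile 503 /etc/path/busy/busy.http
        server my-server01 x.x.x.x:80 check maxconn 50
        server my-server02 x.x.x.x:80 check maxconn 50

Requests beyond 100 concurrent then wait in the backend queue, and when timeout queue expires they receive a 503 whose body comes from the errorfile, instead of being left unaccepted at the TCP level.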

So it’s not a leak. Does haproxy use 100% of the CPU during this benchmark?

@lukastribus yes, thanks for the clarification.

The CPU usage is pretty normal, averaging around 14% across 16 CPUs.

With maxconn set to 100, this is the load test result:

root@load-gen-server:~# ./hey_linux_amd64 -n 1000000 -c 120 -q 50 http://listener-ip/path

Summary:
  Total:	346.4833 secs
  Slowest:	0.9997 secs
  Fastest:	0.0003 secs
  Average:	0.0041 secs
  Requests/sec:	2886.0268

  Total data:	1591823952 bytes
  Size/request:	1592 bytes

Response time histogram:
  0.000 [1]	|
  0.100 [999776]	|■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  0.200 [1]	|
  0.300 [0]	|
  0.400 [0]	|
  0.500 [0]	|
  0.600 [0]	|
  0.700 [0]	|
  0.800 [0]	|
  0.900 [0]	|
  1.000 [2]	|


Latency distribution:
  10% in 0.0016 secs
  25% in 0.0029 secs
  50% in 0.0043 secs
  75% in 0.0053 secs
  90% in 0.0062 secs
  95% in 0.0067 secs
  99% in 0.0083 secs

Details (average, fastest, slowest):
  DNS+dialup:	0.0000 secs, 0.0003 secs, 0.9997 secs
  DNS-lookup:	0.0000 secs, 0.0000 secs, 0.0000 secs
  req write:	0.0000 secs, 0.0000 secs, 0.0017 secs
  resp wait:	0.0041 secs, 0.0003 secs, 0.9996 secs
  resp read:	0.0000 secs, 0.0000 secs, 0.0030 secs

Status code distribution:
  [200]	116898 responses
  [503]	882882 responses

Less than 15% of the total requests got 200s; the rest were sent to the busy backend and received 503s.

I don’t see anything wrong with it. With this benchmark configuration you are basically measuring TCP handshakes on both sides and the results are as expected.

What is it that you would like to see?


@lukastribus all good, thank you for the explanation on this thread!