Hey all, I’m currently trying to migrate my servers from NGINX to HAProxy, but after restarting the proxies with the new configuration, the conntrack and active connection counts skyrocket to around 600k and 20k respectively. I’ve been looking at this issue for a week and have no idea how to proceed. I’ve looked at tcpdumps and other tools like ss, but I honestly don’t know what to look for. The logs don’t really show anything; I haven’t tried setting them to a verbose mode yet as they generate so much garbage. Normally conntrack hovers around 15k per server. What’s also odd is that if one haproxy reloads, the other proxies also spike to around 600k in conntrack. What on earth could be happening? Thanks for the help.
config: global daemon maxconn 50000 user haproxy group haproxy - Pastebin.com
http-response del-header Connection
You are interfering with haproxy’s connection handling. Don’t do that. I know those crazy hacks (overwriting connection handling headers) are considered normal in the nginx world, but that is definitely not the case with haproxy.
http-response set-header Connection close if exceeded_connection reset
Here too, don’t do this. If you believe you need to get crazy with connection headers later on, I will probably not be able to stop you, but please get your baseline numbers first without it. Your mileage will certainly vary if you choose to do so.
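Just to be explicit about what the baseline would look like: keep the frontend as it is and simply drop both directives, roughly like this (a sketch only; the bind line is an example, keep your existing one):
frontend site
# example bind, keep whatever you currently have
bind *:443 ssl crt /etc/ssl/private/haproxy.pem alpn h2,http/1.1
mode http
# note: no "http-response del-header Connection" and no
# "http-response set-header Connection close if exceeded_connection reset"
default_backend site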
Please provide the output of haproxy -vv
and from the ss output, try to understand if there is a pattern: for example, are most of the sockets in CLOSE_WAIT state? Are most of the sockets between haproxy and the backend servers, or between haproxy and the clients? Things like that could help narrow down the root cause.
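For example, something along these lines gives a quick overview (a rough sketch; the ports are assumptions, adjust them to your actual frontend binds and backend port):
# sockets per TCP state
ss -tan | awk 'NR>1 {print $1}' | sort | uniq -c | sort -rn
# established sockets towards the backend servers (assuming they listen on :3030)
ss -tan state established '( dport = :3030 )' | wc -l
# established sockets on the client-facing side (assuming a frontend bound to :9022)
ss -tan state established '( sport = :9022 )' | wc -l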
Hi @lukastribus, thank you for getting back to me. Sorry, I only saw this now.
Thank you for your advice on the connection header handling. This is the first haproxy instance we’re running, so we were just trying to emulate our NGINX setup. I’ll remove it and see how it performs.
I’m currently running HAProxy in Docker. Here is a version list:
Haproxy: haproxy:2.3.4-alpine
Docker: Docker version 17.04.0-ce, build 4845c56
host: Ubuntu 12.04.5 LTS
Output of haproxy -vv (run inside the image):
Status: stable branch - will stop receiving fixes around Q1 2022.
Known bugs: http://www.haproxy.org/bugs/bugs-2.3.4.html
Running on: Linux 3.13.0-117-generic #164~precise1-Ubuntu SMP Mon Apr 10 16:16:25 UTC 2017 x86_64
Build options :
TARGET = linux-musl
CPU = generic
CC = cc
CFLAGS = -O2 -g -Wall -Wextra -Wdeclaration-after-statement -fwrapv -Wno-address-of-packed-member -Wno-unused-label -Wno-sign-compare -Wno-unused-parameter -Wno-clobbered -Wno-missing-field-initializers -Wno-cast-function-type -Wtype-limits -Wshift-negative-value -Wshift-overflow=2 -Wduplicated-cond -Wnull-dereference
OPTIONS = USE_PCRE2=1 USE_PCRE2_JIT=1 USE_GETADDRINFO=1 USE_OPENSSL=1 USE_LUA=1 USE_ZLIB=1
DEBUG =
Feature list : +EPOLL -KQUEUE +NETFILTER -PCRE -PCRE_JIT +PCRE2 +PCRE2_JIT +POLL -PRIVATE_CACHE +THREAD -PTHREAD_PSHARED -BACKTRACE -STATIC_PCRE -STATIC_PCRE2 +TPROXY +LINUX_TPROXY +LINUX_SPLICE +LIBCRYPT +CRYPT_H +GETADDRINFO +OPENSSL +LUA +FUTEX +ACCEPT4 -CLOSEFROM +ZLIB -SLZ +CPU_AFFINITY +TFO +NS +DL +RT -DEVICEATLAS -51DEGREES -WURFL -SYSTEMD -OBSOLETE_LINKER +PRCTL +THREAD_DUMP -EVPORTS
Default settings :
bufsize = 16384, maxrewrite = 1024, maxpollevents = 200
Built with multi-threading support (MAX_THREADS=64, default=24).
Built with OpenSSL version : OpenSSL 1.1.1i 8 Dec 2020
Running on OpenSSL version : OpenSSL 1.1.1i 8 Dec 2020
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports : TLSv1.0 TLSv1.1 TLSv1.2 TLSv1.3
Built with Lua version : Lua 5.3.6
Built with network namespace support.
Built with the Prometheus exporter as a service
Built with zlib version : 1.2.11
Running on zlib version : 1.2.11
Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND
Built with PCRE2 version : 10.36 2020-12-04
PCRE2 library supports JIT : yes
Encrypted password support via crypt(3): yes
Built with gcc compiler version 10.2.1 20201203
Available polling systems :
epoll : pref=300, test result OK
poll : pref=200, test result OK
select : pref=150, test result OK
Total: 3 (3 usable), will use epoll.
Available multiplexer protocols :
(protocols marked as <default> cannot be specified using 'proto' keyword)
h2 : mode=HTTP side=FE|BE mux=H2
fcgi : mode=HTTP side=BE mux=FCGI
<default> : mode=HTTP side=FE|BE mux=H1
<default> : mode=TCP side=FE|BE mux=PASS
Available services : prometheus-exporter
Available filters :
[SPOE] spoe
[CACHE] cache
[FCGI] fcgi-app
[COMP] compression
[TRACE] trace
Thanks for the ss tip. I’ll look at the output when the issue occurs again. In the last couple of weeks I have noticed a pattern: if I restart the proxies, metrics like conntrack_count and active connections are low and at a reasonable level [1]. The issue pops up when a backend server crashes. After that happens, these two metrics are several orders of magnitude higher than normal (see the time prior to 6am in [1]). These large counts are always preceded by a crashing backend server, a large connection spike [2] and a bunch of TCP errors [3].
I’ll have a look again at the ss output and see what I can find there. It appears to me (though I’m not sure) that once a backend server crashes, the stale connections in haproxy are not cleaned up, although that wouldn’t explain why the conntrack count is so high on the machines.
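Next time it spikes I’ll try to capture a snapshot along these lines (a rough sketch on my part; having conntrack-tools and socat available, and being able to reach the stats socket from where I run this, are assumptions):
# current conntrack entry count on the host
cat /proc/sys/net/netfilter/nf_conntrack_count
# top destination ports among the conntrack entries
conntrack -L -p tcp 2>/dev/null | awk '{ for (i = 1; i <= NF; i++) if ($i ~ /^dport=/) { print $i; break } }' | sort | uniq -c | sort -rn | head
# haproxy's own view: pxname, svname, scur, smax per frontend/backend/server
echo "show stat" | socat stdio /var/run/haproxy.sock | cut -d, -f1,2,5,6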
Figures:
Figure 1: conntrack count and active connections over time
Figure 2: connection spike
Figure 3: TCP errors
Please share the configuration you are currently using, including the server parameters and health checks.
Also share haproxy logs at the time of the server crash.
Here is the configuration file.
global
daemon
maxconn 150000
user haproxy
group haproxy
log 127.0.0.1:514 local0 notice
stats socket /var/run/haproxy.sock expose-fd listeners
# each conn is around 200 bytes, thus we reserve 200 MB for ssl caching
# below we allow 1,000,000 connections to be cached
tune.ssl.cachesize 1000000
nbproc 1
nbthread 22
cpu-map auto:1/1-22 0-21
master-worker
defaults
log global
mode http
option httplog
option dontlognull
timeout connect 5s
timeout check 5s
timeout client 30s
timeout server 30s
timeout http-keep-alive 10s
option http-keep-alive
frontend stats
bind <%= scope.function_interface_by_tag(['public', 'address']) %>:8999
bind *:8999
mode http
stats enable
stats uri /
frontend site
maxconn 25000
bind *:9022 ssl crt /etc/ssl/private/haproxy.pem alpn h2,http/1.1
mode http
stick-table type string size 10k store gpc0
http-request set-var(sess.src_port) src_port
http-request set-var(sess.source) src,concat(:,sess.src_port)
http-request track-sc0 var(sess.source)
http-request sc-inc-gpc0
acl exceeded_connection sc0_get_gpc0 ge 10000
acl reset sc0_clr_gpc0 ge 0
http-response set-header Connection close if exceeded_connection reset
acl is_authorized hdr(Authorization) token
http-request deny if !is_authorized
default_backend site
backend site
balance roundrobin
http-reuse always
mode http
option tcp-check
option srvtcpka
srvtcpka-intvl 10s
srvtcpka-cnt 3
<%- for i in 1..36 -%>
server node-<%= i.to_s.rjust(2, '0') %> node-<%= i.to_s.rjust(2, '0') %> check port 3030 weight 100 alpn http/1.1
<%- end -%>
frontend site
maxconn 25000
bind *:9031
mode http
stick-table type string size 10k store gpc0
http-request set-var(sess.src_port) src_port
http-request set-var(sess.source) src,concat(:,sess.src_port)
http-request track-sc0 var(sess.source)
http-request sc-inc-gpc0
acl exceeded_connection sc0_get_gpc0 ge 10000
acl reset sc0_clr_gpc0 ge 0
http-response set-header Connection close if exceeded_connection reset
default_backend site
backend site
balance roundrobin
http-reuse always
mode http
option tcp-check
option srvtcpka
srvtcpka-intvl 10s
srvtcpka-cnt 3
<%- for i in 1..36 -%>
server node-<%= i.to_s.rjust(2, '0') %> node-<%= i.to_s.rjust(2, '0') %> check port 3030 weight 100 alpn http/1.1
<%- end -%>
frontend site
maxconn 40000
bind *:9042 ssl crt /etc/ssl/private/haproxy.pem
mode http
stick-table type string size 10k store gpc0
http-request set-var(sess.src_port) src_port
http-request set-var(sess.source) src,concat(:,sess.src_port)
http-request track-sc0 var(sess.source)
http-request sc-inc-gpc0
acl exceeded_connection sc0_get_gpc0 ge 10000
acl reset sc0_clr_gpc0 ge 0
http-response set-header Connection close if exceeded_connection reset
default_backend site
backend site
balance roundrobin
http-reuse always
mode http
option httpchk GET /health
http-check expect status 200
option srvtcpka
srvtcpka-intvl 10s
srvtcpka-cnt 3
<%- for i in 1..36 -%>
server node-<%= i.to_s.rjust(2, '0') %> node-<%= i.to_s.rjust(2, '0') %> check port 3030 weight 100 alpn http/1.1
<%- end -%>
frontend site
maxconn 25000
bind *:9091 ssl crt /etc/ssl/private/haproxy.pem
mode http
stick-table type string size 10k store gpc0
http-request set-var(sess.src_port) src_port
http-request set-var(sess.source) src,concat(:,sess.src_port)
http-request track-sc0 var(sess.source)
http-request sc-inc-gpc0
acl exceeded_connection sc0_get_gpc0 ge 10000
acl reset sc0_clr_gpc0 ge 0
http-response set-header Connection close if exceeded_connection reset
default_backend site
backend site
balance roundrobin
http-reuse always
mode http
option tcp-check
option srvtcpka
srvtcpka-intvl 10s
srvtcpka-cnt 3
<%- for i in 1..36 -%>
server node-<%= i.to_s.rjust(2, '0') %> node-<%= i.to_s.rjust(2, '0') %> check port 3030 weight 100 alpn http/1.1
<%- end -%>
The errors in the logs are just copies of this line:
Mar 9 07:48:02 127.0.0.1 haproxy: Proxy [frontend] reached process FD limit (maxsock=300398). Please check 'ulimit-n' and restart. {} " http:// "
(the same line is repeated over and over)
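If I read that maxsock number right (not 100% sure), it roughly lines up with maxconn 150000 from the global section, since haproxy reserves about two file descriptors per connection (one on the client side, one on the server side):
2 x 150000 (maxconn)             = 300000
+ listeners, checks, pipes, etc. ≈    398
maxsock                          ≈ 300398
So the process really does seem to run out of sockets rather than hitting some unrelated limit.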
Lastly, I should mention that there are open GitHub issues that seem to demonstrate this behaviour: https://github.com/haproxy/haproxy/issues/136 and https://github.com/haproxy/haproxy/issues/1003 (Backend connection leak after connection failures).
Here is the output of ss -a | awk '{print $1}' | sort | uniq -c
266412 ESTAB
1 FIN-WAIT-1
54 FIN-WAIT-2
21 LISTEN
1 State
2 SYN-RECV
34 SYN-SENT
1860 TIME-WAIT
The operating system is completely obsolete and the kernel is at least 4 years old. Considering that we are talking about a socket issue, the kernel may very well play a role here. I strongly suggest you upgrade the operating system to a supported one.
You are also missing 4 years of security fixes, so this is something you will have to do anyway.
I don’t think the GitHub issues are related: #136 was fixed in 2.0.5, and for the other one I’m not sure I’m seeing the same issue.
@lukastribus Unfortunately, I’m not allowed to perform an OS upgrade. However, I did solve my issue by downgrading HAProxy to 2.1. I’m not sure why this helps, but it solved my problem in the meantime. Thank you for your help.