HAProxy memory issue: old process still occupies a large amount of memory


#1

We reloaded HAProxy, and the next day we noticed that the old process was still occupying approx. 4.7 GiB of memory.

Trace logs: the top 4 processes by memory usage (RSS, in KiB) are:
root@ip-10-0-x-xx:~# ps ax -o rss,user,command | sort -nr | head -n 4
8799424 haproxy /usr/local/sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /var/run/haproxy.pid -D -sf 630
4918408 haproxy /usr/local/sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /var/run/haproxy.pid -D -sf 27615
86292 ganglia /usr/sbin/gmond --pid-file=/var/run/ganglia-monitor.pid
29628 syslog rsyslogd

root@ip-10-0-x-xx:~# ps -ef | grep ha
haproxy 630 1 19 Jul28 ? 5-11:40:10 /usr/local/sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /var/run/haproxy.pid -D -sf 27615
haproxy 13810 1 19 Aug24 ? 03:51:22 /usr/local/sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /var/run/haproxy.pid -D -sf 630

Note:
PID 630 is the old process; it occupies approx. 4.7 GiB of RAM (RSS 4918408 KiB)
PID 13810 is the latest process; it occupies approx. 8.4 GiB of RAM (RSS 8799424 KiB)
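The RSS column printed by ps is in KiB. A small sketch that converts the two haproxy lines from the ps output above into GiB; the line ending in "-sf 27615" belongs to PID 630 (the old process) and the one ending in "-sf 630" belongs to PID 13810 (the new one). The data is copied from the post; the helper name is just illustrative.

```python
# Sketch: convert the RSS column of `ps ax -o rss,user,command` (KiB) to GiB.
PS_OUTPUT = """\
8799424 haproxy /usr/local/sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /var/run/haproxy.pid -D -sf 630
4918408 haproxy /usr/local/sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /var/run/haproxy.pid -D -sf 27615
"""

KIB_PER_GIB = 1024 * 1024


def rss_gib_by_command(ps_output):
    """Map each command line to its resident set size in GiB."""
    result = {}
    for line in ps_output.splitlines():
        if not line.strip():
            continue
        # rss and user are single tokens; the command is the rest of the line.
        rss_kib, _user, command = line.split(maxsplit=2)
        result[command] = int(rss_kib) / KIB_PER_GIB
    return result


for command, gib in rss_gib_by_command(PS_OUTPUT).items():
    # The last two tokens (-sf <pid>) tell the two processes apart.
    print(f"{gib:5.2f} GiB  {' '.join(command.split()[-2:])}")
```

This prints roughly 8.39 GiB for the "-sf 630" process and 4.69 GiB for the "-sf 27615" one.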

We then checked that the old process still has a few established connections:
root@ip-10-0-x-xx:~# lsof -p 630| grep TCP |wc -l
104
Note: the 104 sockets come in pairs, one incoming (client side) and one outgoing (to the backend node) per session, so excluding the backend connections leaves half of them:
root@ip-10-0-x-xx:~# lsof -p 630 | grep -v "ap-xxxx-y.compute.internal:xxxx" | grep TCP | wc -l
52
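The grep pipeline above can be mirrored in a small Python sketch that splits the old process's TCP sockets into client-side and backend-side connections. It assumes lsof was run as `lsof -nP -a -p 630 -i TCP` (numeric hosts and ports, TCP sockets of PID 630 only); the sample rows and the backend address 192.0.2.10:8080 are hypothetical, since the real ones are redacted in the post.

```python
# Hypothetical sample of `lsof -nP -a -p 630 -i TCP` output (made-up addresses).
LSOF_OUTPUT = """\
COMMAND PID USER    FD TYPE DEVICE SIZE/OFF NODE NAME
haproxy 630 haproxy 15u IPv4 1001 0t0 TCP 10.0.0.5:443->198.51.100.7:50512 (ESTABLISHED)
haproxy 630 haproxy 16u IPv4 1002 0t0 TCP 10.0.0.5:41000->192.0.2.10:8080 (ESTABLISHED)
haproxy 630 haproxy 17u IPv4 1003 0t0 TCP 10.0.0.5:443->203.0.113.9:61022 (ESTABLISHED)
haproxy 630 haproxy 18u IPv4 1004 0t0 TCP 10.0.0.5:41002->192.0.2.10:8080 (ESTABLISHED)
"""

BACKEND = "192.0.2.10:8080"  # hypothetical backend server address


def split_sockets(lsof_output, backend):
    """Return (client_side, backend_side) lists of lsof NAME fields."""
    client, server = [], []
    for line in lsof_output.splitlines():
        fields = line.split()
        # Keep only TCP rows: NODE is column 8, NAME (local->remote) column 9.
        if len(fields) < 9 or fields[7] != "TCP":
            continue
        name = fields[8]
        _local, sep, remote = name.partition("->")
        if not sep:
            continue  # listening socket, no peer
        (server if remote == backend else client).append(name)
    return client, server


client, server = split_sockets(LSOF_OUTPUT, BACKEND)
print(len(client), len(server))  # 2 2
```

On the real output, 104 sockets splitting into 52 and 52 would match the "one client plus one backend connection per session" reading.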

Please help me understand why the old process still occupies so much memory even though it has so few connections left.
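A back-of-the-envelope check of why the numbers look puzzling. It assumes each session holds two buffers of bufsize bytes (16384, the default reported by `haproxy -vv` later in this thread) and ignores SSL and any other per-session overhead, so it is a rough sketch, not a model of HAProxy's allocator. Under that assumption, 52 live sessions account for only about 1.6 MiB, while the configured maxconn of 500000 would allow roughly 15 GiB of buffers.

```python
BUFSIZE = 16384          # bytes; default bufsize shown by `haproxy -vv`
BUFFERS_PER_SESSION = 2  # assumption: one request buffer + one response buffer

MIB = 1024 ** 2
GIB = 1024 ** 3


def buffer_bytes(sessions, bufsize=BUFSIZE, per_session=BUFFERS_PER_SESSION):
    """Rough buffer memory for `sessions` live sessions (no SSL/other overhead)."""
    return sessions * per_session * bufsize


live = buffer_bytes(52)          # client-side sessions still held by PID 630
ceiling = buffer_bytes(500000)   # the configured maxconn
observed = 4918408 * 1024        # PID 630 RSS from ps, KiB -> bytes

print(f"52 sessions:    {live / MIB:.2f} MiB")
print(f"maxconn 500000: {ceiling / GIB:.2f} GiB")
print(f"observed RSS:   {observed / GIB:.2f} GiB")
```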


Memory not getting released
#2

Please help me find the root cause (RCA).


#3

Please use a single thread for one problem, not three. Crossposting will not get you help any sooner; it will just lead to wrong advice from the wrong people.

Post the output of "haproxy -vv" and your configuration (the timeout values are especially important).


#4

ubuntu@ip-10-0-x-xx:~$ haproxy -vv
HA-Proxy version 1.5.18 2016/05/10
Copyright 2000-2016 Willy Tarreau willy@haproxy.org

Build options :
TARGET = linux2628
CPU = native
CC = gcc
CFLAGS = -O2 -march=native -g -fno-strict-aliasing
OPTIONS = DLMALLOC_SRC=…/malloc.c USE_OPENSSL=1 USE_STATIC_PCRE=1

Default settings :
maxconn = 2000, bufsize = 16384, maxrewrite = 8192, maxpollevents = 200

Encrypted password support via crypt(3): yes
Built without zlib support (USE_ZLIB not set)
Compression algorithms supported : identity
Built with OpenSSL version : OpenSSL 1.0.1f 6 Jan 2014
Running on OpenSSL version : OpenSSL 1.0.1f 6 Jan 2014
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports prefer-server-ciphers : yes
Built with PCRE version : 8.31 2012-07-06
PCRE library supports JIT : no (USE_PCRE_JIT not set)
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND

Available polling systems :
epoll : pref=300, test result OK
poll : pref=200, test result OK
select : pref=150, test result OK
Total: 3 (3 usable), will use epoll.

ubuntu@ip-10-0-x-xx:~$

ubuntu@ip-10-0-x-xx:~$ cat /etc/haproxy/haproxy.cfg
global
    maxconn 500000
    user haproxy
    group haproxy
    daemon
    tune.ssl.default-dh-param 2048
    stats socket /var/run/haproxy.socket group ganglia mode 775 level operator

defaults
    retries 3
    option redispatch
    maxconn 500000
    timeout connect 300000
    timeout client 660000
    timeout server 660000

frontend inbound
    rate-limit sessions 5000
    mode tcp
    bind *:port1
    bind *:port2
    bind *:port3
    bind *:443 ssl crt /etc/ssl-cert-path/new-x.y.z.crt.pem
    default_backend bk_servers

backend bk_servers
    balance listen
    server namexyz 10.0.x.yz:port check weight 100 inter 5000
ubuntu@ip-10-0-x-xx:~$
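The timeout values in the defaults section carry no unit, and HAProxy reads a bare number as milliseconds (other accepted suffixes are us, s, m, h and d). A small sketch of that conversion, with a hypothetical helper name, shows that "timeout connect 300000" is 5 minutes and "timeout client/server 660000" are 11 minutes each; since client and server timeouts are inactivity timeouts, a session exchanging data at least every 11 minutes is never closed by them.

```python
import re

# HAProxy time units in milliseconds; a bare number means milliseconds.
UNIT_MS = {"us": 0.001, "ms": 1, "s": 1000, "m": 60_000, "h": 3_600_000, "d": 86_400_000}


def haproxy_time_ms(value):
    """Convert an HAProxy time value such as '660000' or '11m' to milliseconds."""
    match = re.fullmatch(r"(\d+)(us|ms|s|m|h|d)?", value.strip())
    if match is None:
        raise ValueError(f"not an HAProxy time value: {value!r}")
    number, unit = match.groups()
    return int(number) * UNIT_MS[unit or "ms"]


for name, raw in [("connect", "300000"), ("client", "660000"), ("server", "660000")]:
    ms = haproxy_time_ms(raw)
    print(f"timeout {name:<8} {raw:>7} -> {ms / 1000:.0f} s ({ms / 60000:.0f} min)")
```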


#5

Don't use DLMALLOC unless you know exactly what you are doing. I have not seen a DLMALLOC build in a long time, and it certainly won't magically resolve your problems.

It seems to me the connections don't time out; I guess they keep generating traffic, so they never hit the timeout.
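A toy model of this point, assuming "timeout client" and "timeout server" behave as inactivity timers that are re-armed whenever data flows (which is how the HAProxy documentation describes them). It ignores every other timer, so it only illustrates why a chatty session can outlive the 660-second timeout indefinitely; the function name is hypothetical.

```python
def idle_timeout_fires(activity_s, timeout_s):
    """Return when an inactivity timeout would fire (seconds), or None.

    activity_s: sorted timestamps (seconds) at which the session saw traffic.
    The timer is re-armed on every event, so it only fires when the gap to
    the next event exceeds timeout_s. None means it did not fire within the
    observed window.
    """
    for prev, nxt in zip(activity_s, activity_s[1:]):
        if nxt - prev > timeout_s:
            return prev + timeout_s
    return None


TIMEOUT_CLIENT_S = 660  # `timeout client 660000` (ms) from the posted config

chatty = list(range(0, 86_400, 60))   # traffic every minute for a day
quiet = [0, 60, 120, 120 + 20 * 60]   # then silent for 20 minutes

print(idle_timeout_fires(chatty, TIMEOUT_CLIENT_S))  # None
print(idle_timeout_fires(quiet, TIMEOUT_CLIENT_S))   # 780
```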

I suggest you tcpdump those connections to see what kind of traffic is still running through the old process (pick some of the ports from the lsof output and build a tcpdump filter from them).
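One way to build such a filter is sketched below. It assumes NAME fields taken from `lsof -nP` output (for example "10.0.0.5:443->198.51.100.7:50512"; these addresses are made up) and matches each connection by its remote host and port, using the standard tcpdump "host" and "port" primitives. The helper name is hypothetical. Passing only client-side entries avoids also capturing the new process's traffic to the shared backend address.

```python
def tcpdump_filter(lsof_names, limit=5):
    """Build a tcpdump filter matching the peers of up to `limit` connections.

    lsof_names are NAME fields from `lsof -nP`, e.g.
    '10.0.0.5:443->198.51.100.7:50512'. Each connection is matched by its
    remote address and port, so traffic of other sessions is excluded.
    """
    clauses = []
    for name in lsof_names:
        _, sep, remote = name.partition("->")
        if not sep:
            continue  # listening or unconnected socket, no peer
        host, _, port = remote.rpartition(":")
        clause = f"(host {host.strip('[]')} and port {port})"
        if clause not in clauses:
            clauses.append(clause)
        if len(clauses) == limit:
            break
    return " or ".join(clauses)


names = ["10.0.0.5:443->198.51.100.7:50512", "10.0.0.5:443->203.0.113.9:61022"]
print(tcpdump_filter(names))
# (host 198.51.100.7 and port 50512) or (host 203.0.113.9 and port 61022)
```

The resulting string can then be passed in single quotes to a capture such as `tcpdump -nn -i any '<filter>'`.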