HAProxy constant downwards trend on memory usage

Hi HAProxy Community,

I have deployed HAProxy as a load balancer in front of two Bitbucket applications.
Production traffic has recently been diverted to hit HAProxy, and I am seeing a worrying downward trend in free memory.

Here is a screenshot of the memory graph; the flat line was during application and migration testing for Bitbucket. The downward trend in free memory starts very shortly after production traffic was routed to the new system:

Increasing the resources on this server is not completely out of the question, but I would prefer to get at the root cause of why so much memory is being used.

This is not an internet-facing application, so the traffic is relatively low.

Please let me know if this is actually an expected amount of memory usage or if I have misconfigured something to do with how memory is released by the application.

In the documentation I do see mention of the haproxy -m <MB> option, but I am unsure whether memory is released safely when that limit is hit. Is there any downtime incurred when the process realizes it needs to release a large amount of memory?
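If I understand correctly, the option would be passed on the command line roughly like this (the config path and the 1024 MB value are only placeholders):

    haproxy -f /etc/haproxy/haproxy.cfg -m 1024    # cap total allocatable memory at ~1 GB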

For additional information I have included the output of a couple of commands.

Results of: haproxy -vv

HA-Proxy version 2.2.4-de45672 2020/09/30 - https://haproxy.org/
Status: long-term supported branch - will stop receiving fixes around Q2 2025.
Known bugs: http://www.haproxy.org/bugs/bugs-2.2.4.html
Running on: Linux 3.10.0-1160.24.1.el7.x86_64 #1 SMP Thu Apr 8 19:51:47 UTC 2021 x86_64
Build options :
  TARGET  = linux-glibc
  CPU     = generic
  CC      = gcc
  CFLAGS  = -O2 -g -Wall -Wextra -Wdeclaration-after-statement -fwrapv -Wno-unused-label -Wno-sign-compare -Wno-unused-parameter -Wno-clobbered -Wno-missing-field-initializers -Wtype-limits
  OPTIONS = USE_PCRE=1 USE_PCRE_JIT=1 USE_THREAD=1 USE_LINUX_TPROXY=1 USE_OPENSSL=1 USE_ZLIB=1 USE_TFO=1 USE_NS=1 USE_SYSTEMD=1

Feature list : +EPOLL -KQUEUE +NETFILTER +PCRE +PCRE_JIT -PCRE2 -PCRE2_JIT +POLL -PRIVATE_CACHE +THREAD -PTHREAD_PSHARED +BACKTRACE -STATIC_PCRE -STATIC_PCRE2 +TPROXY +LINUX_TPROXY +LINUX_SPLICE +LIBCRYPT +CRYPT_H +GETADDRINFO +OPENSSL -LUA +FUTEX +ACCEPT4 +ZLIB -SLZ +CPU_AFFINITY +TFO +NS +DL +RT -DEVICEATLAS -51DEGREES -WURFL +SYSTEMD -OBSOLETE_LINKER +PRCTL +THREAD_DUMP -EVPORTS

Default settings :
  bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Built with multi-threading support (MAX_THREADS=64, default=2).
Built with OpenSSL version : OpenSSL 1.0.2k-fips  26 Jan 2017
Running on OpenSSL version : OpenSSL 1.0.2k-fips  26 Jan 2017
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports : SSLv3 TLSv1.0 TLSv1.1 TLSv1.2
Built with network namespace support.
Built with zlib version : 1.2.7
Running on zlib version : 1.2.7
Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND
Built with PCRE version : 8.32 2012-11-30
Running on PCRE version : 8.32 2012-11-30
PCRE library supports JIT : yes
Encrypted password support via crypt(3): yes
Built with gcc compiler version 4.8.5 20150623 (Red Hat 4.8.5-39)

Available polling systems :
      epoll : pref=300,  test result OK
       poll : pref=200,  test result OK
     select : pref=150,  test result OK
Total: 3 (3 usable), will use epoll.

Available multiplexer protocols :
(protocols marked as <default> cannot be specified using 'proto' keyword)
            fcgi : mode=HTTP       side=BE        mux=FCGI
       <default> : mode=HTTP       side=FE|BE     mux=H1
              h2 : mode=HTTP       side=FE|BE     mux=H2
       <default> : mode=TCP        side=FE|BE     mux=PASS

Available services : none

Available filters :
        [SPOE] spoe
        [COMP] compression
        [TRACE] trace
        [CACHE] cache
        [FCGI] fcgi-app

Results of: show pools

Dumping pools usage. Use SIGQUIT to flush them.
  - Pool comp_state (32 bytes) : 51 allocated (1632 bytes), 48 used, needed_avg 41, 0 failures, 6 users, @0x556b4c5b0b00=09 [SHARED]
  - Pool filter (64 bytes) : 1682237 allocated (107663168 bytes), 1682237 used, needed_avg 1682232, 0 failures, 9 users, @0x556b4c5b0a80=08 [SHARED]
  - Pool pendconn (96 bytes) : 0 allocated (0 bytes), 0 used, needed_avg 0, 0 failures, 4 users, @0x556b4c5b0b80=10 [SHARED]
  - Pool ssl_sock_ct (128 bytes) : 209 allocated (26752 bytes), 86 used, needed_avg 168, 0 failures, 3 users, @0x556b4c5b0680=00 [SHARED]
  - Pool h1c (160 bytes) : 206 allocated (32960 bytes), 101 used, needed_avg 165, 0 failures, 4 users, @0x556b4c5b0800=03 [SHARED]
  - Pool fcgi_strm (192 bytes) : 177 allocated (33984 bytes), 85 used, needed_avg 142, 0 failures, 5 users, @0x556b4c5b0700=01 [SHARED]
  - Pool tcpcheck_ru (224 bytes) : 74 allocated (16576 bytes), 73 used, needed_avg 63, 0 failures, 3 users, @0x556b4c5b0980=06 [SHARED]
  - Pool authority (256 bytes) : 2 allocated (512 bytes), 1 used, needed_avg 0, 0 failures, 2 users, @0x556b4c5b0d00=13 [SHARED]
  - Pool spoe_ctx (320 bytes) : 0 allocated (0 bytes), 0 used, needed_avg 0, 0 failures, 1 users, @0x556b4c5b0a00=07 [SHARED]
  - Pool dns_resolut (480 bytes) : 0 allocated (0 bytes), 0 used, needed_avg 0, 0 failures, 1 users, @0x556b4c5b0c00=11 [SHARED]
  - Pool dns_answer_ (576 bytes) : 0 allocated (0 bytes), 0 used, needed_avg 0, 0 failures, 1 users, @0x556b4c5b0c80=12 [SHARED]
  - Pool requri (1024 bytes) : 0 allocated (0 bytes), 0 used, needed_avg 0, 0 failures, 1 users, @0x556b4c5b0e00=15 [SHARED]
  - Pool stream (1088 bytes) : 105 allocated (114240 bytes), 29 used, needed_avg 93, 0 failures, 1 users, @0x556b4c5b0900=05 [SHARED]
  - Pool fcgi_conn (1216 bytes) : 0 allocated (0 bytes), 0 used, needed_avg 0, 0 failures, 1 users, @0x556b4c5b0780=02 [SHARED]
  - Pool h2c (1312 bytes) : 11 allocated (14432 bytes), 8 used, needed_avg 10, 0 failures, 1 users, @0x556b4c5b0880=04 [SHARED]
  - Pool hpack_tbl (4096 bytes) : 11 allocated (45056 bytes), 8 used, needed_avg 10, 0 failures, 1 users, @0x556b4c5b0f00=17 [SHARED]
  - Pool buffer (16384 bytes) : 66 allocated (1081344 bytes), 46 used, needed_avg 60, 0 failures, 1 users, @0x556b4c5b0e80=16 [SHARED]
  - Pool trash (16416 bytes) : 2 allocated (32832 bytes), 2 used, needed_avg 0, 0 failures, 1 users, @0x556b4c5b0f80=18
Total: 18 pools, 109063488 bytes allocated, 108586112 used.
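(For anyone wanting to reproduce this, show pools can be issued on the admin stats socket, e.g.:

    echo "show pools" | socat stdio /var/run/haproxy.sock

with the socket path being whatever is configured on the stats socket line below; the path here is just a placeholder for the redacted one.)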

HAProxy config

global
    maxconn                     4000
    stats socket                REDACTED mode 600 level admin
    log                         REDACTED
    chroot                      REDACTED
    pidfile                     REDACTED
    user                        haproxy
    group                       haproxy
    daemon
    ssl-default-bind-ciphers    REDACTED
    ssl-default-bind-options    REDACTED
    ssl-default-server-ciphers  REDACTED
    ssl-default-server-options  REDACTED
    ssl-dh-param-file           REDACTED

defaults
   option                       dontlognull
   option                       redispatch
   retries                      3
   timeout http-request         10s
   timeout queue                1m
   timeout connect              10s
   timeout client               1m
   timeout server               1m
   timeout http-keep-alive      10s
   timeout check                10s
   maxconn                      3000
   errorfile                    408 /dev/null   # Workaround for Chrome 35-36 bug.  See http://blog.haproxy.com/2014/05/26/haproxy-and-http-errors-408-in-chrome/


frontend bitbucket_frontend_http
    bind                        *:REDACTED
    bind                        *:REDACTED ssl crt REDACTED
    default_backend             bitbucket_backend_http

backend bitbucket_backend_http
    mode                        http
    balance                     roundrobin
    option                      httpchk GET /status
    option                      forwardfor
    option                      http-server-close
    cookie                      BITBUCKETSESSIONID prefix
    stick-table type            string len 52 size 5M expire 30m
    stick store-response        set-cookie(BITBUCKETSESSIONID)
    stick on                    cookie(BITBUCKETSESSIONID)
    http-request                redirect scheme https unless { ssl_fc }
    server                      REDACTED check inter 10000 rise 2 fall 5
    server                      REDACTED check inter 10000 rise 2 fall 5


frontend bitbucket_frontend_ssh
    bind                        *:REDACTED
    default_backend             bitbucket_backend_ssh
    timeout client              15m
    maxconn                     50

backend bitbucket_backend_ssh
    mode                        tcp
    balance                     roundrobin
    server                      REDACTED check port REDACTED
    server                      REDACTED check port REDACTED
    timeout server              15m

listen admin
    mode                        http
    bind                        *:REDACTED
    stats                       enable
    stats auth                  REDACTED
    stats uri                   REDACTED

Is the show pools output from before the restart, at maximum memory consumption, or from just after a restart, i.e. from a fresh HAProxy instance? It only shows about 109 MB of used memory.
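To correlate the pool totals with what the OS actually sees, it would also help to post the process RSS and uptime, e.g.:

    ps -o pid,rss,etime,cmd -C haproxy

rss is reported in KiB, and etime shows how long the process has been running, which would also answer the restart question.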

According to the bug page:
http://www.haproxy.org/bugs/bugs-2.2.4.html

release 2.2.4 is affected by 5 memory leaks. I would strongly suggest upgrading to the latest bugfix release, which is currently 2.2.14.
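Assuming a package-based install (adjust for however HAProxy was deployed), after the upgrade it is worth confirming the running version and re-validating the configuration against the new binary, e.g.:

    haproxy -v
    haproxy -c -f /etc/haproxy/haproxy.cfg    # path is the usual default, adjust if yours differs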

I advise against artificially limiting memory consumption with the -m parameter: that is a feature for testing behavior under memory pressure, and I don’t think it is a good idea to use it for production traffic. The root cause of the memory consumption should be addressed instead.

Often these issues are caused by huge timeout and maxconn values in the configuration, so connections are never released; however, that is not the case here.
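Purely for illustration, the kind of pattern that causes it looks like this (not taken from your configuration):

    defaults
        timeout client   24h       # idle clients held for a whole day
        timeout server   24h
        maxconn          1000000   # far beyond what the host can sustain

Your timeouts of 1m/10s and maxconn of 3000/4000 look sensible.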

Sorry, yes, it has just occurred to me that I recorded the show pools output after the restart, so it was a fresh instance of HAProxy.

As expected, I came in today and found that over the weekend the instance had churned through some memory:

Dumping pools usage. Use SIGQUIT to flush them.
  - Pool comp_state (32 bytes) : 72 allocated (2304 bytes), 33 used, needed_avg 66, 0 failures, 6 users, @0x556b4c5b0b00=09 [SHARED]
  - Pool filter (64 bytes) : 7032752 allocated (450096128 bytes), 7032752 used, needed_avg 3448519, 0 failures, 9 users, @0x556b4c5b0a80=08 [SHARED]
  - Pool pendconn (96 bytes) : 0 allocated (0 bytes), 0 used, needed_avg 0, 0 failures, 4 users, @0x556b4c5b0b80=10 [SHARED]
  - Pool ssl_sock_ct (128 bytes) : 204 allocated (26112 bytes), 67 used, needed_avg 164, 0 failures, 3 users, @0x556b4c5b0680=00 [SHARED]
  - Pool h1c (160 bytes) : 209 allocated (33440 bytes), 88 used, needed_avg 168, 0 failures, 4 users, @0x556b4c5b0800=03 [SHARED]
  - Pool fcgi_strm (192 bytes) : 204 allocated (39168 bytes), 72 used, needed_avg 164, 0 failures, 5 users, @0x556b4c5b0700=01 [SHARED]
  - Pool tcpcheck_ru (224 bytes) : 82 allocated (18368 bytes), 45 used, needed_avg 67, 0 failures, 3 users, @0x556b4c5b0980=06 [SHARED]
  - Pool authority (256 bytes) : 2 allocated (512 bytes), 1 used, needed_avg 0, 0 failures, 2 users, @0x556b4c5b0d00=13 [SHARED]
  - Pool spoe_ctx (320 bytes) : 0 allocated (0 bytes), 0 used, needed_avg 0, 0 failures, 1 users, @0x556b4c5b0a00=07 [SHARED]
  - Pool dns_resolut (480 bytes) : 0 allocated (0 bytes), 0 used, needed_avg 0, 0 failures, 1 users, @0x556b4c5b0c00=11 [SHARED]
  - Pool dns_answer_ (576 bytes) : 0 allocated (0 bytes), 0 used, needed_avg 0, 0 failures, 1 users, @0x556b4c5b0c80=12 [SHARED]
  - Pool requri (1024 bytes) : 0 allocated (0 bytes), 0 used, needed_avg 0, 0 failures, 1 users, @0x556b4c5b0e00=15 [SHARED]
  - Pool stream (1088 bytes) : 74 allocated (80512 bytes), 23 used, needed_avg 60, 0 failures, 1 users, @0x556b4c5b0900=05 [SHARED]
  - Pool fcgi_conn (1216 bytes) : 0 allocated (0 bytes), 0 used, needed_avg 0, 0 failures, 1 users, @0x556b4c5b0780=02 [SHARED]
  - Pool h2c (1312 bytes) : 11 allocated (14432 bytes), 7 used, needed_avg 10, 0 failures, 1 users, @0x556b4c5b0880=04 [SHARED]
  - Pool hpack_tbl (4096 bytes) : 11 allocated (45056 bytes), 7 used, needed_avg 10, 0 failures, 1 users, @0x556b4c5b0f00=17 [SHARED]
  - Pool buffer (16384 bytes) : 66 allocated (1081344 bytes), 45 used, needed_avg 54, 0 failures, 1 users, @0x556b4c5b0e80=16 [SHARED]
  - Pool trash (16416 bytes) : 2 allocated (32832 bytes), 2 used, needed_avg 0, 0 failures, 1 users, @0x556b4c5b0f80=18
Total: 18 pools, 451470208 bytes allocated, 450976992 used.

It is the filter pool that is holding onto most of the memory at the moment: 7,032,752 allocations at 64 bytes each is about 429 MiB, i.e. essentially all of the ~450 MB total allocated. If you do read through the output above, please let me know if you see anything else interesting.

Thanks for the versioning info; I will look into upgrading our instance to 2.2.14 ASAP.

Thanks!

Is there any filter functionality that you are actually using? It doesn’t look like it from the configuration you shared.

Filter functionality is used for SPOE, compression, caching, fcgi-app, trace.
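For reference, these normally show up in the configuration as explicit filter lines, or as options that register a filter implicitly; an illustrative (made-up) backend would look like:

    backend example_with_filters
        filter compression              # declare the compression filter explicitly
        compression algo gzip           # enables compression (registers the filter implicitly if not declared)
        filter cache my_cache           # requires a matching 'cache my_cache' section
        filter trace                    # debugging/trace filter

Nothing of that kind appears in the configuration you posted.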

No, as far as I am aware I am not using any filtering functionality, unless something has been configured under the hood that I am missing.