CPU usage climbs if haproxy runs for more than a few minutes

Due to a bit of a historical accident (related to the absence of the resolvers server option in older versions of haproxy), we currently reload the config, which restarts the haproxy worker process, approximately every minute. I’m looking at making better use of resolvers so that we only reload when the configuration actually changes, but I’ve run into a bit of a roadblock.

If haproxy runs for more than a few minutes without being restarted, its CPU usage climbs steadily until it is using 100% of the available CPU. Unfortunately, I haven’t been able to reproduce this in non-production environments, and for obvious reasons I want to avoid this situation in production.

I have encountered an issue like this in the past, and according to the perf results, most of the CPU time was spent on operations on an LRU cache. I think it is the pattern LRU cache.

We do have regex ACLs that match URLs containing a random ID, and I’m wondering whether that is causing the cache to fill up, so that we spend a lot of CPU adding entries to the cache and removing them again.
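
One thing that might be worth testing, assuming the pattern cache really is the culprit: as far as I can tell from the configuration manual, the pattern lookup cache is sized by the global tune.pattern.cache-size setting, and setting it to 0 disables the cache. Whether that actually removes the CPU climb here is untested; this is only a sketch of the experiment.

```haproxy
global
    # Assumption: the LRU churn seen in perf comes from the pattern lookup cache.
    # 0 disables that cache, so every regex ACL is evaluated directly.
    tune.pattern.cache-size 0
```

With URLs that contain a random ID, nearly every lookup would be a cache miss anyway, so disabling the cache should cost little in hit rate, but that is a guess until it is measured.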

Any suggestions on how to reduce CPU usage when the process runs for a long time? Having to restart the process every minute seems like a really hacky solution.

And is this something I should make a bug report for?

Could you provide the “haproxy -vv” output so we know which release version you are running?

HA-Proxy version 2.0.25-6986403 2021/09/07 - https://haproxy.org/
Build options :
  TARGET  = linux-glibc
  CPU     = generic
  CC      = gcc
  CFLAGS  = -O2 -g -fno-strict-aliasing -Wdeclaration-after-statement -fwrapv -Wno-address-of-packed-member -Wno-unused-label -Wno-sign-compare -Wno-unused-parameter -Wno-old-style-declaration -Wno-ignored-qualifiers -Wno-clobbered -Wno-missing-field-initializers -Wno-implicit-fallthrough -Wno-stringop-overflow -Wno-cast-function-type -Wtype-limits -Wshift-negative-value -Wshift-overflow=2 -Wduplicated-cond -Wnull-dereference
  OPTIONS = USE_PCRE2_JIT=1 USE_LIBCRYPT=1 USE_OPENSSL=1 USE_LUA=1 USE_ZLIB=1 USE_SYSTEMD=1

Feature list : +EPOLL -KQUEUE -MY_EPOLL -MY_SPLICE +NETFILTER -PCRE -PCRE_JIT -PCRE2 +PCRE2_JIT +POLL -PRIVATE_CACHE +THREAD -PTHREAD_PSHARED -REGPARM -STATIC_PCRE -STATIC_PCRE2 +TPROXY +LINUX_TPROXY +LINUX_SPLICE +LIBCRYPT +CRYPT_H -VSYSCALL +GETADDRINFO +OPENSSL +LUA +FUTEX +ACCEPT4 -CLOSEFROM -MY_ACCEPT4 +ZLIB -SLZ +CPU_AFFINITY +TFO +NS +DL +RT -DEVICEATLAS -51DEGREES -WURFL +SYSTEMD -OBSOLETE_LINKER +PRCTL +THREAD_DUMP -EVPORTS

Default settings :
  bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Built with multi-threading support (MAX_THREADS=64, default=4).
Built with OpenSSL version : OpenSSL 1.1.1f  31 Mar 2020
Running on OpenSSL version : OpenSSL 1.1.1f  31 Mar 2020
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports : TLSv1.0 TLSv1.1 TLSv1.2 TLSv1.3
Built with Lua version : Lua 5.3.3
Built with network namespace support.
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND
Built with zlib version : 1.2.11
Running on zlib version : 1.2.11
Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with PCRE2 version : 10.34 2019-11-21
PCRE2 library supports JIT : yes
Encrypted password support via crypt(3): yes
Built with the Prometheus exporter as a service

Available polling systems :
      epoll : pref=300,  test result OK
       poll : pref=200,  test result OK
     select : pref=150,  test result OK
Total: 3 (3 usable), will use epoll.

Available multiplexer protocols :
(protocols marked as <default> cannot be specified using 'proto' keyword)
              h2 : mode=HTX        side=FE|BE     mux=H2
              h2 : mode=HTTP       side=FE        mux=H2
       <default> : mode=HTX        side=FE|BE     mux=H1
       <default> : mode=TCP|HTTP   side=FE|BE     mux=PASS

Available services :
	prometheus-exporter

Available filters :
	[SPOE] spoe
	[COMP] compression
	[CACHE] cache
	[TRACE] trace

I had another thought on this. Assuming my hypothesis that this is related to the LRU cache is correct (I haven’t had a chance to fully verify that yet), I think what happens is this: the cache fills up, and as new requests come in and miss the cache, entries are rotated out of the LRU cache, and those rotations force the binary search tree backing the cache to be rebalanced quite frequently.

That makes me wonder whether a hash table (implemented using an array instead of a tree) would be more efficient for the LRU cache, since it doesn’t require rebalancing. And since the LRU cache has a fixed size, the backing array wouldn’t need to be resized either. I’m not familiar enough with the code to know how feasible such a switch would be, or whether there are major downsides. I also don’t have any hard data showing that such a fix would actually solve the problem.
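
To make the idea concrete, here is a rough sketch in Python of what I mean by an array-backed LRU. It is not haproxy’s code, and the class and method names (ArrayLRU, get, put) are made up for illustration. All storage is preallocated: a fixed bucket array for hash lookups, and per-slot arrays that hold both the bucket chains and the recency list, so nothing is resized or rebalanced after startup.

```python
class ArrayLRU:
    """Fixed-capacity LRU cache (illustrative sketch, not haproxy's implementation).

    Entries live in preallocated slot arrays. Lookup goes through a fixed
    bucket array with chaining; recency is a doubly linked list of slot indexes.
    """

    NIL = -1

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be >= 1")
        self.capacity = capacity
        self.nbuckets = 2 * capacity                # fixed, never resized
        self.buckets = [self.NIL] * self.nbuckets   # bucket -> first slot in chain
        self.keys = [None] * capacity
        self.values = [None] * capacity
        self.chain = [self.NIL] * capacity          # next slot in the same bucket
        self.prev = [self.NIL] * capacity           # recency list links
        self.next = [self.NIL] * capacity
        self.head = self.NIL                        # most recently used slot
        self.tail = self.NIL                        # least recently used slot
        self.used = 0

    def _bucket(self, key):
        return hash(key) % self.nbuckets

    def _find(self, key):
        slot = self.buckets[self._bucket(key)]
        while slot != self.NIL and self.keys[slot] != key:
            slot = self.chain[slot]
        return slot

    def _unlink(self, slot):
        # Remove the slot from the recency list.
        p, n = self.prev[slot], self.next[slot]
        if p != self.NIL:
            self.next[p] = n
        else:
            self.head = n
        if n != self.NIL:
            self.prev[n] = p
        else:
            self.tail = p

    def _push_front(self, slot):
        # Mark the slot as most recently used.
        self.prev[slot] = self.NIL
        self.next[slot] = self.head
        if self.head != self.NIL:
            self.prev[self.head] = slot
        else:
            self.tail = slot
        self.head = slot

    def _unchain(self, slot):
        # Remove the slot from its hash bucket chain.
        b = self._bucket(self.keys[slot])
        cur = self.buckets[b]
        if cur == slot:
            self.buckets[b] = self.chain[slot]
            return
        while self.chain[cur] != slot:
            cur = self.chain[cur]
        self.chain[cur] = self.chain[slot]

    def get(self, key, default=None):
        slot = self._find(key)
        if slot == self.NIL:
            return default
        self._unlink(slot)
        self._push_front(slot)
        return self.values[slot]

    def put(self, key, value):
        slot = self._find(key)
        if slot != self.NIL:
            # Existing key: update in place and refresh its recency.
            self.values[slot] = value
            self._unlink(slot)
            self._push_front(slot)
            return
        if self.used < self.capacity:
            slot = self.used                        # take a never-used slot
            self.used += 1
        else:
            slot = self.tail                        # evict the least recently used
            self._unlink(slot)
            self._unchain(slot)
        self.keys[slot] = key
        self.values[slot] = value
        b = self._bucket(key)
        self.chain[slot] = self.buckets[b]
        self.buckets[b] = slot
        self._push_front(slot)
```

Even in this form, a workload where almost every lookup misses still pays for one eviction and one insertion per request, so this would reduce the cost of each rotation rather than avoid the rotations.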