2.0.1 CPU usage at near 100% after upgrade from 1.5

rbrooker · July 18, 2019, 6:36pm

I’m on a 2-core machine with 4 GB of memory.
I have 11 different configs, each in its own systemd process to isolate services.
CPU never went above 30% on 1.5 (the default available in the CentOS 7 repo).
I built a 2.0.1 RPM and updated the systemd files, with no changes to the configs. Now, on start, the CPU spikes and stays there.

Should I be configuring things differently for 2.0.1, or is this just a bug and I need to install another version or patch?

uname -a
Linux proxy0 3.10.0-957.1.3.el7.x86_64 #1 SMP Thu Nov 29 14:49:43 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

HA-Proxy version 2.0.1 2019/06/26 - https://haproxy.org/
Build options :
TARGET = linux-glibc
CPU = generic
CC = gcc
CFLAGS = -O2 -g -fno-strict-aliasing -Wdeclaration-after-statement -fwrapv -Wno-unused-label -Wno-sign-compare -Wno-unused-parameter -Wno-old-style-declaration -Wno-ignored-qualifiers -Wno-clobbered -Wno-missing-field-initializers -Wtype-limits
OPTIONS = USE_PCRE=1 USE_PCRE_JIT=1 USE_THREAD=1 USE_REGPARM=1 USE_LINUX_TPROXY=1 USE_OPENSSL=1 USE_ZLIB=1 USE_TFO=1 USE_NS=1 USE_SYSTEMD=1

Feature list : +EPOLL -KQUEUE -MY_EPOLL -MY_SPLICE +NETFILTER +PCRE +PCRE_JIT -PCRE2 -PCRE2_JIT +POLL -PRIVATE_CACHE +THREAD -PTHREAD_PSHARED +REGPARM -STATIC_PCRE -STATIC_PCRE2 +TPROXY +LINUX_TPROXY +LINUX_SPLICE +LIBCRYPT +CRYPT_H -VSYSCALL +GETADDRINFO +OPENSSL -LUA +FUTEX +ACCEPT4 -MY_ACCEPT4 +ZLIB -SLZ +CPU_AFFINITY +TFO +NS +DL +RT -DEVICEATLAS -51DEGREES -WURFL +SYSTEMD -OBSOLETE_LINKER +PRCTL +THREAD_DUMP -EVPORTS

Default settings :
bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Built with multi-threading support (MAX_THREADS=64, default=2).
Built with OpenSSL version : OpenSSL 1.0.2k-fips 26 Jan 2017
Running on OpenSSL version : OpenSSL 1.0.2k-fips 26 Jan 2017
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports : SSLv3 TLSv1.0 TLSv1.1 TLSv1.2
Built with network namespace support.
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND
Built with zlib version : 1.2.7
Running on zlib version : 1.2.7
Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with PCRE version : 8.32 2012-11-30
Running on PCRE version : 8.32 2012-11-30
PCRE library supports JIT : yes
Encrypted password support via crypt(3): yes

Available polling systems :
epoll : pref=300, test result OK
poll : pref=200, test result OK
select : pref=150, test result OK
Total: 3 (3 usable), will use epoll.

Available multiplexer protocols :
(protocols marked as <default> cannot be specified using 'proto' keyword)
h2 : mode=HTX side=FE|BE mux=H2
h2 : mode=HTTP side=FE mux=H2
<default> : mode=HTX side=FE|BE mux=H1
<default> : mode=TCP|HTTP side=FE|BE mux=PASS

Available services : none

Available filters :
[SPOE] spoe
[COMP] compression
[CACHE] cache
[TRACE] trace

lukastribus · July 18, 2019, 7:54pm

Please upgrade to 2.0.2; it fixes a ton of issues, including CPU-related ones.

rbrooker · July 18, 2019, 8:49pm

I updated… no change in the CPU usage… although when I stop one of the services it drops to less than 10%, then climbs back to 99-100%.

Here is the config; perhaps I need to alter it?

global
        daemon
        user haproxy
        group haproxy

defaults
        mode http
        maxconn 10000
        timeout connect 5000
        timeout client 50000
        timeout server 50000

listen stats
        bind 10.0.0.4:1978
        stats enable
        stats realm Haproxy\ Statistics\ RabbitMQ
        stats uri /
        stats refresh 5s

# RabbitMQ
listen rabbit
        bind 10.0.0.4:5672 v4v6
        balance roundrobin
        mode tcp
        option tcp-check

        server rabbit-1  10.0.0.1:5672    check inter 2000 rise 2 fall 3 send-proxy
        server rabbit-2  10.0.0.2:5672    check inter 2000 rise 2 fall 3 send-proxy
        server rabbit-3  10.0.0.3:5672    check inter 2000 rise 2 fall 3 send-proxy

lukastribus · July 18, 2019, 10:09pm

This is most likely a bug, there is also a similar report on the mailing list:

https://www.mail-archive.com/haproxy@formilux.org/msg34558.html

Could you attach strace -tt -p<PID> to a process occupying 100% and provide a few seconds of its output (it will be large)? Are you able to reproduce this with nbthread 1 in your configuration?

CC’ing @willy

rbrooker · July 18, 2019, 11:38pm

Thank you. I ran the strace without the nbthread change. That change did drop the CPU usage a lot, though.
Either way, though, I am not able to pull up the stats page.

https://gist.github.com/cognition/29a60fdab8cdb65f8b7307610e8e3b5d

19:33:54.590028 clock_gettime(CLOCK_THREAD_CPUTIME_ID, {488, 37709594}) = 0
19:33:54.590067 clock_gettime(CLOCK_THREAD_CPUTIME_ID, {488, 37715351}) = 0
19:33:54.590105 epoll_wait(5, [{EPOLLIN, {u32=6, u64=6}}], 200, 0) = 1
19:33:54.590143 clock_gettime(CLOCK_THREAD_CPUTIME_ID, {488, 37726124}) = 0
19:33:54.590182 clock_gettime(CLOCK_THREAD_CPUTIME_ID, {488, 37731871}) = 0
19:33:54.590220 epoll_wait(5, [{EPOLLIN, {u32=6, u64=6}}], 200, 0) = 1
19:33:54.590259 clock_gettime(CLOCK_THREAD_CPUTIME_ID, {488, 37742620}) = 0
19:33:54.590298 clock_gettime(CLOCK_THREAD_CPUTIME_ID, {488, 37748558}) = 0
19:33:54.590336 epoll_wait(5, [{EPOLLIN, {u32=6, u64=6}}], 200, 0) = 1
19:33:54.590519 clock_gettime(CLOCK_THREAD_CPUTIME_ID, {488, 37761201}) = 0

(Excerpt truncated; the full trace is in the gist.)

willy · July 19, 2019, 3:11am

Excellent, thank you. So it shows that some data are announced as available, but not read. It could be a problem of a buffer full condition that is not properly handled. What is surprising is that in TCP mode the path between the fd and the upper stream is the shortest possible (we don’t even use muxes) so the reason for this must be a bit gross. Now we need to find a way to reproduce this.
It would be interesting to know if this also happens without checks so that we can tell whether it’s checks or regular traffic which is causing this.
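
The pattern described here (the same fd reported ready on every epoll_wait, with nothing reading or accepting in between) can also be checked mechanically on a longer trace. The following is a minimal sketch, not an HAProxy tool: the function name `find_spinning_fds`, the `threshold` parameter and the list of "I/O" syscalls are assumptions made for illustration, and it only understands the `strace -tt` line format shown in this thread.

```python
import re
from collections import Counter

# Captures the ready-event list of an epoll_wait line as printed by strace, e.g.
#   epoll_wait(5, [{EPOLLIN, {u32=6, u64=6}}], 200, 0) = 1
EPOLL_RE = re.compile(r"epoll_wait\(\d+, \[(.*)\], \d+, -?\d+\)\s+=\s+\d+")
FD_RE = re.compile(r"u32=(\d+)")
# Syscalls treated as "the event was handled" (illustrative, non-exhaustive).
IO_RE = re.compile(r"\b(accept4?|recv(from|msg)?|read|send(to|msg)?|write|close)\(")


def find_spinning_fds(lines, threshold=3):
    """Return {fd: run_length} for fds reported ready in at least `threshold`
    consecutive epoll_wait calls with no I/O syscall in between."""
    streak = Counter()  # current run length per fd
    worst = Counter()   # longest run seen per fd
    for line in lines:
        if IO_RE.search(line):
            streak.clear()  # something was handled: reset every run
            continue
        match = EPOLL_RE.search(line)
        if not match:
            continue  # e.g. clock_gettime: neither I/O nor polling
        ready = set(FD_RE.findall(match.group(1)))
        for fd in list(streak):
            if fd not in ready:
                del streak[fd]  # fd went quiet, its run is over
        for fd in ready:
            streak[fd] += 1
            worst[fd] = max(worst[fd], streak[fd])
    return {int(fd): n for fd, n in worst.items() if n >= threshold}


# A few lines copied from the trace posted in this thread.
sample = """\
19:33:54.590067 clock_gettime(CLOCK_THREAD_CPUTIME_ID, {488, 37715351}) = 0
19:33:54.590105 epoll_wait(5, [{EPOLLIN, {u32=6, u64=6}}], 200, 0) = 1
19:33:54.590143 clock_gettime(CLOCK_THREAD_CPUTIME_ID, {488, 37726124}) = 0
19:33:54.590220 epoll_wait(5, [{EPOLLIN, {u32=6, u64=6}}], 200, 0) = 1
19:33:54.590259 clock_gettime(CLOCK_THREAD_CPUTIME_ID, {488, 37742620}) = 0
19:33:54.590336 epoll_wait(5, [{EPOLLIN, {u32=6, u64=6}}], 200, 0) = 1
""".splitlines()

print(find_spinning_fds(sample))  # {6: 3}
```

On a full capture a healthy process should show short runs at most, because an accept or read normally follows a readiness report; a long run on a single fd would point at the fd that is never being handled.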

rbrooker · July 19, 2019, 2:23pm

Thank you for looking into this.

I just reverted to 1.8 and it’s working, which made me realize I was only getting 504 connections before; now it’s 1528… it was blocking all those connections.

ngaugler · July 25, 2019, 6:00pm

Hello,

I am having a similar problem, as documented on the mailing list here: https://www.mail-archive.com/haproxy@formilux.org/msg34605.html

I have tried removing all checks and agents and still experience this problem with 2.0.3. Here is an example server line:

server s1 10.0.2.1:8080 weight 100 source 10.0.1.10

I will try to simplify things even further (by removing every other listen/frontend section but this one entry) and report back. Please let me know if there is anything else you would prefer I try.

ngaugler · July 25, 2019, 6:16pm

Ok, so I simplified everything. I removed all other services, leaving just one simple HTTP load balancer between the front-end application and the back-end application. I removed threading from the picture and kept it to a single process. I tried to make it as basic as possible. 2.0.3 still maxes out the CPU and drops requests, whereas HAProxy 1.6 does not.

Here are the configs:

global
        log /dev/log    local0 notice
        chroot /var/lib/haproxy

        stats socket /run/haproxy/haproxy_20.sock mode 664 level admin
        stats timeout 30s

        user haproxy
        group haproxy
        daemon

        nbproc 1
        nbthread 1

        maxconn 500000


defaults
        log     global
        mode    http

        option  dontlognull
        option  dontlog-normal
        option  redispatch

        option  tcp-smart-accept
        option  tcp-smart-connect

        timeout connect 2s
        timeout client  50s
        timeout server  50s
        timeout client-fin 1s
        timeout server-fin 1s

        maxconn 150000

        errorfile 400 /etc/haproxy/errors/400.http
        errorfile 403 /etc/haproxy/errors/403.http
        errorfile 408 /etc/haproxy/errors/408.http
        errorfile 500 /etc/haproxy/errors/500.http
        errorfile 502 /etc/haproxy/errors/502.http
        errorfile 503 /etc/haproxy/errors/503.http
        errorfile 504 /etc/haproxy/errors/504.http

listen back
        bind    10.0.0.249:8080    defer-accept
        bind    10.0.0.251:8080    defer-accept
        bind    10.0.0.252:8080    defer-accept
        bind    10.0.0.253:8080    defer-accept
        bind    10.0.0.254:8080    defer-accept
        mode    http

        maxconn 65000
        fullconn 65000

        balance leastconn
        http-reuse safe

        server  s1     10.0.6.11:8080  weight 100 source 10.0.1.100
        server  s2     10.0.6.12:8080  weight 100 source 10.0.1.101
        server  s3     10.0.6.13:8080  weight 100 source 10.0.1.102
        server  s4     10.0.6.14:8080  weight 100 source 10.0.1.103
        server  s5     10.0.6.15:8080  weight 100 source 10.0.1.100
        server  s6     10.0.6.16:8080  weight 100 source 10.0.1.101
        server  s7     10.0.6.17:8080  weight 100 source 10.0.1.102
        server  s8     10.0.6.18:8080  weight 100 source 10.0.1.103
        server  s9     10.0.6.19:8080  weight 100 source 10.0.1.100
        server  s10    10.0.6.20:8080  weight 100 source 10.0.1.101
        server  s11    10.0.6.21:8080  weight 100 source 10.0.1.102
        server  s12    10.0.6.22:8080  weight 100 source 10.0.1.103
        server  s13    10.0.6.23:8080  weight 100 source 10.0.1.100
        server  s14    10.0.6.24:8080  weight 100 source 10.0.1.101

willy · July 31, 2019, 2:25pm

Given the very low fd number I strongly suspect it’s a listener that is looping like this. Now why is it looping like this? I still have no idea. I’d be fine with it reaching a limit or something, but it should disable polling, which is not done here. I’ll have another look at the accept() code to see if anything could cause one FD not to be properly disabled once a limit is reached.

Thanks!

ngaugler · July 31, 2019, 8:44pm

If there is a setting I need to increase, I can easily do so. This device only serves as a load balancer, so we can allocate whatever resources are necessary to the processes.

Were there any dramatic code changes between 1.6 and 2.0 in the area you are concerned about?

willy · August 1, 2019, 7:57am

There were many changes between 1.6 and 2.0 in these areas. Threads, layered connections with muxes, accept queues, idle connections, etc. are all possible candidates to explain a change of behaviour. But whatever the reason, if an FD is waking up your process all the time without being handled, it is a bug that needs to be addressed. At the very least the FD should be disabled for the time needed for the issue to go away. That’s what we need to figure out.

rbjorklin · August 1, 2019, 11:13pm

I’m also seeing this/a similar issue with 2.0.3.

@rbrooker and @ngaugler are you able to reproduce this issue if you set “no option http-use-htx” in the defaults? It was changed to be the new default in “2.0-dev3”.
https://cbonte.github.io/haproxy-dconv/2.0/configuration.html#4.2-option%20http-use-htx

ngaugler · August 1, 2019, 11:46pm

I quickly tested with and without ‘no option http-use-htx’ and saw 100% utilization with both. Reverting to 1.6 immediately fixed the issue. For the simplified version I am using no threading. Although code changes may have been necessary to support threading, this impacts both threaded and non-threaded setups.

If there is anything else you need me to try please let me know. It’s very easy to reproduce. I am really quite surprised that everyone else doesn’t have this problem… other than volume of traffic I am not sure what I am doing differently.

rbjorklin · August 2, 2019, 1:17am

We were using 1.9.8 without issues before upgrading. Have you seen the same problem with any version in the 1.9.X branch?

willy · August 7, 2019, 7:49am

When the problem happens it would be nice to see the socket states:
$ ss -atn | cut -f1 -d' ' | sort | uniq -c
If you see some CLOSE-WAIT sockets (ss prints the state with a hyphen), please run ss -atn | grep CLOSE-WAIT and check whether they are from the client to haproxy or from haproxy to the server.
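
For anyone collecting this repeatedly, the same tally can be scripted. The sketch below is only an illustration under stated assumptions: `summarize_ss` and its `frontend_ports` argument are hypothetical names, the caller has to supply the ports haproxy listens on, and the sample addresses are made up. It relies on `ss -atn` printing the state in the first column and the local address in the fourth, and on `ss` spelling the state `CLOSE-WAIT`.

```python
from collections import Counter


def summarize_ss(ss_output, frontend_ports):
    """Count sockets per state and split CLOSE-WAIT sockets by side.

    frontend_ports: local ports haproxy listens on (supplied by the caller).
    A CLOSE-WAIT socket whose local port is one of them is on the client
    side; any other CLOSE-WAIT socket is assumed to be haproxy -> server.
    """
    states = Counter()
    close_wait = Counter()
    for line in ss_output.splitlines():
        fields = line.split()
        if len(fields) < 5 or fields[0] == "State":
            continue  # skip blank lines and the header row
        state, local = fields[0], fields[3]
        states[state] += 1
        if state == "CLOSE-WAIT":
            port = int(local.rsplit(":", 1)[1])
            side = "client->haproxy" if port in frontend_ports else "haproxy->server"
            close_wait[side] += 1
    return states, close_wait


# Made-up sample in the shape of `ss -atn` output (not from a real system).
sample_ss = """\
State      Recv-Q Send-Q Local Address:Port   Peer Address:Port
LISTEN     0      128    10.0.0.4:5672        *:*
ESTAB      0      0      10.0.0.4:5672        10.0.0.50:41234
CLOSE-WAIT 1      0      10.0.0.4:5672        10.0.0.51:41300
CLOSE-WAIT 1      0      10.0.0.4:5672        10.0.0.52:41301
CLOSE-WAIT 0      0      10.0.0.4:51000       10.0.0.1:5672
"""

states, close_wait = summarize_ss(sample_ss, frontend_ports={5672})
print(dict(states))
print(dict(close_wait))
```

Classifying by local port is a simplification: it would misattribute sockets if haproxy's outgoing source port happened to equal a listening port, so the raw `ss` lines remain the reference.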

willy · August 7, 2019, 9:56am

Also, how do you start your haproxy process? Are you using the master-worker system? Does it fail immediately upon first startup, or does it fail after some time, after processing some traffic, or after a reload? It would be interesting to know what the fd==5 socket corresponds to; this can be done using "ss -anp | grep -w 'fd=5'" (assuming the fd is still 5).

I’m a bit bothered by this one because the only way not to accept a connection in the listeners code is to reach a configured limit, and with your config it will not happen for a while, so it must be something different.

scubadrew · December 5, 2019, 4:58pm

Have there been any updates on this? I’m seeing this in 2.1 now.

willy · December 11, 2019, 9:43am

Yep, not only were there updates, but we fixed this bug in the listeners a few days ago. I’m going to issue 2.0.11 and 2.1.1 very soon with all these fixes.

joel-l · December 16, 2019, 3:19pm

I’m seeing the same thing (I think) using 2.1.1.

At random times, Haproxy goes to 100% CPU usage and stays there. It doesn’t seem that requests are actually failing during that time.

I’ve seen this behaviour with many different haproxy versions, starting from (I think) 1.8; that is, upgrading to the latest version hasn’t solved it.

haproxy -vv

HA-Proxy version 2.1.1-1ppa1~bionic 2019/12/14 - https://haproxy.org/
Status: stable branch - will stop receiving fixes around Q1 2021.
Known bugs: http://www.haproxy.org/bugs/bugs-2.1.1.html
Build options :
  TARGET  = linux-glibc
  CPU     = generic
  CC      = gcc
  CFLAGS  = -O2 -g -O2 -fdebug-prefix-map=/build/haproxy-GwOBOb/haproxy-2.1.1=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fno-strict-aliasing -Wdeclaration-after-statement -fwrapv -Wno-unused-label -Wno-sign-compare -Wno-unused-parameter -Wno-old-style-declaration -Wno-ignored-qualifiers -Wno-clobbered -Wno-missing-field-initializers -Wno-implicit-fallthrough -Wno-stringop-overflow -Wtype-limits -Wshift-negative-value -Wshift-overflow=2 -Wduplicated-cond -Wnull-dereference
  OPTIONS = USE_PCRE2=1 USE_PCRE2_JIT=1 USE_REGPARM=1 USE_OPENSSL=1 USE_LUA=1 USE_ZLIB=1 USE_SYSTEMD=1

Feature list : +EPOLL -KQUEUE -MY_EPOLL -MY_SPLICE +NETFILTER -PCRE -PCRE_JIT +PCRE2 +PCRE2_JIT +POLL -PRIVATE_CACHE +THREAD -PTHREAD_PSHARED +REGPARM -STATIC_PCRE -STATIC_PCRE2 +TPROXY +LINUX_TPROXY +LINUX_SPLICE +LIBCRYPT +CRYPT_H -VSYSCALL +GETADDRINFO +OPENSSL +LUA +FUTEX +ACCEPT4 -MY_ACCEPT4 +ZLIB -SLZ +CPU_AFFINITY +TFO +NS +DL +RT -DEVICEATLAS -51DEGREES -WURFL +SYSTEMD -OBSOLETE_LINKER +PRCTL +THREAD_DUMP -EVPORTS

Default settings :
  bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Built with multi-threading support (MAX_THREADS=64, default=2).
Built with OpenSSL version : OpenSSL 1.1.1  11 Sep 2018
Running on OpenSSL version : OpenSSL 1.1.1  11 Sep 2018
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports : TLSv1.0 TLSv1.1 TLSv1.2 TLSv1.3
Built with Lua version : Lua 5.3.3
Built with network namespace support.
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND
Built with PCRE2 version : 10.31 2018-02-12
PCRE2 library supports JIT : yes
Encrypted password support via crypt(3): yes
Built with zlib version : 1.2.11
Running on zlib version : 1.2.11
Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with the Prometheus exporter as a service

Available polling systems :
      epoll : pref=300,  test result OK
       poll : pref=200,  test result OK
     select : pref=150,  test result OK
Total: 3 (3 usable), will use epoll.

Available multiplexer protocols :
(protocols marked as <default> cannot be specified using 'proto' keyword)
              h2 : mode=HTTP       side=FE|BE     mux=H2
            fcgi : mode=HTTP       side=BE        mux=FCGI
       <default> : mode=HTTP       side=FE|BE     mux=H1
       <default> : mode=TCP        side=FE|BE     mux=PASS

Available services :
	prometheus-exporter

Available filters :
	[SPOE] spoe
	[CACHE] cache
	[FCGI] fcgi-app
	[TRACE] trace
	[COMP] compression

strace has lots of epoll_wait & clock_gettime entries:

strace -tt

14:54:09.596398 epoll_wait(7, [{EPOLLOUT|EPOLLERR|EPOLLHUP, {u32=245, u64=245}}, {EPOLLOUT, {u32=79, u64=79}}, {EPOLLOUT, {u32=319, u64=319}}, {EPOLLOUT, {u32=351, u64=351}}, {EPOLLOUT, {u32=352, u64=352}}, {EPOLLOUT, {u32=279, u64=279}}, {EPOLLOUT, {u32=280, u64=280}}], 200, 10) = 7
14:54:09.596460 clock_gettime(CLOCK_THREAD_CPUTIME_ID, {tv_sec=6385, tv_nsec=539479278}) = 0
14:54:09.596515 clock_gettime(CLOCK_THREAD_CPUTIME_ID, {tv_sec=6385, tv_nsec=539493232}) = 0
14:54:09.596566 epoll_wait(7, [{EPOLLOUT|EPOLLERR|EPOLLHUP, {u32=245, u64=245}}, {EPOLLOUT, {u32=79, u64=79}}, {EPOLLOUT, {u32=319, u64=319}}, {EPOLLOUT, {u32=351, u64=351}}, {EPOLLOUT, {u32=352, u64=352}}, {EPOLLOUT, {u32=279, u64=279}}, {EPOLLOUT, {u32=280, u64=280}}], 200, 10) = 7
14:54:09.596629 clock_gettime(CLOCK_THREAD_CPUTIME_ID, {tv_sec=6385, tv_nsec=539514516}) = 0
14:54:09.596683 clock_gettime(CLOCK_THREAD_CPUTIME_ID, {tv_sec=6385, tv_nsec=539527943}) = 0
14:54:09.596734 epoll_wait(7, [{EPOLLOUT|EPOLLERR|EPOLLHUP, {u32=245, u64=245}}, {EPOLLOUT, {u32=79, u64=79}}, {EPOLLOUT, {u32=319, u64=319}}, {EPOLLOUT, {u32=351, u64=351}}, {EPOLLOUT, {u32=352, u64=352}}, {EPOLLOUT, {u32=279, u64=279}}, {EPOLLOUT, {u32=280, u64=280}}], 200, 9) = 7
14:54:09.596796 clock_gettime(CLOCK_THREAD_CPUTIME_ID, {tv_sec=6385, tv_nsec=539549215}) = 0
14:54:09.596851 clock_gettime(CLOCK_THREAD_CPUTIME_ID, {tv_sec=6385, tv_nsec=539562981}) = 0
14:54:09.596907 epoll_wait(7, [{EPOLLOUT|EPOLLERR|EPOLLHUP, {u32=245, u64=245}}, {EPOLLOUT, {u32=79, u64=79}}, {EPOLLOUT, {u32=319, u64=319}}, {EPOLLOUT, {u32=351, u64=351}}, {EPOLLOUT, {u32=352, u64=352}}, {EPOLLOUT, {u32=279, u64=279}}, {EPOLLOUT, {u32=280, u64=280}}], 200, 9) = 7
14:54:09.596983 clock_gettime(CLOCK_THREAD_CPUTIME_ID, {tv_sec=6385, tv_nsec=539584806}) = 0
14:54:09.597039 clock_gettime(CLOCK_THREAD_CPUTIME_ID, {tv_sec=6385, tv_nsec=539598420}) = 0
14:54:09.597090 epoll_wait(7, [{EPOLLOUT|EPOLLERR|EPOLLHUP, {u32=245, u64=245}}, {EPOLLOUT, {u32=79, u64=79}}, {EPOLLOUT, {u32=319, u64=319}}, {EPOLLOUT, {u32=351, u64=351}}, {EPOLLOUT, {u32=352, u64=352}}, {EPOLLOUT, {u32=279, u64=279}}, {EPOLLOUT, {u32=280, u64=280}}], 200, 9) = 7
14:54:09.597152 clock_gettime(CLOCK_THREAD_CPUTIME_ID, {tv_sec=6385, tv_nsec=539619711}) = 0
14:54:09.597206 clock_gettime(CLOCK_THREAD_CPUTIME_ID, {tv_sec=6385, tv_nsec=539633047}) = 0
14:54:09.597257 epoll_wait(7, [{EPOLLOUT|EPOLLERR|EPOLLHUP, {u32=245, u64=245}}, {EPOLLOUT, {u32=79, u64=79}}, {EPOLLOUT, {u32=319, u64=319}}, {EPOLLOUT, {u32=351, u64=351}}, {EPOLLOUT, {u32=352, u64=352}}, {EPOLLOUT, {u32=279, u64=279}}, {EPOLLOUT, {u32=280, u64=280}}], 200, 9) = 7
14:54:09.597320 clock_gettime(CLOCK_THREAD_CPUTIME_ID, {tv_sec=6385, tv_nsec=539654265}) = 0
14:54:09.597374 clock_gettime(CLOCK_THREAD_CPUTIME_ID, {tv_sec=6385, tv_nsec=539667936}) = 0
14:54:09.597425 epoll_wait(7, [{EPOLLOUT|EPOLLERR|EPOLLHUP, {u32=245, u64=245}}, {EPOLLOUT, {u32=79, u64=79}}, {EPOLLOUT, {u32=319, u64=319}}, {EPOLLOUT, {u32=351, u64=351}}, {EPOLLOUT, {u32=352, u64=352}}, {EPOLLOUT, {u32=279, u64=279}}, {EPOLLOUT, {u32=280, u64=280}}], 200, 9) = 7
14:54:09.597487 clock_gettime(CLOCK_THREAD_CPUTIME_ID, {tv_sec=6385, tv_nsec=539689171}) = 0
14:54:09.597542 clock_gettime(CLOCK_THREAD_CPUTIME_ID, {tv_sec=6385, tv_nsec=539702935}) = 0
14:54:09.597593 epoll_wait(7, [{EPOLLOUT|EPOLLERR|EPOLLHUP, {u32=245, u64=245}}, {EPOLLOUT, {u32=79, u64=79}}, {EPOLLOUT, {u32=319, u64=319}}, {EPOLLOUT, {u32=351, u64=351}}, {EPOLLOUT, {u32=352, u64=352}}, {EPOLLOUT, {u32=279, u64=279}}, {EPOLLOUT, {u32=280, u64=280}}], 200, 9) = 7
14:54:09.597655 clock_gettime(CLOCK_THREAD_CPUTIME_ID, {tv_sec=6385, tv_nsec=539724153}) = 0
14:54:09.597709 clock_gettime(CLOCK_THREAD_CPUTIME_ID, {tv_sec=6385, tv_nsec=539737633}) = 0
14:54:09.597760 epoll_wait(7, [{EPOLLOUT|EPOLLERR|EPOLLHUP, {u32=245, u64=245}}, {EPOLLOUT, {u32=79, u64=79}}, {EPOLLOUT, {u32=319, u64=319}}, {EPOLLOUT, {u32=351, u64=351}}, {EPOLLOUT, {u32=352, u64=352}}, {EPOLLOUT, {u32=279, u64=279}}, {EPOLLOUT, {u32=280, u64=280}}], 200, 8) = 7
14:54:09.597822 clock_gettime(CLOCK_THREAD_CPUTIME_ID, {tv_sec=6385, tv_nsec=539758879}) = 0
14:54:09.597876 clock_gettime(CLOCK_THREAD_CPUTIME_ID, {tv_sec=6385, tv_nsec=539772377}) = 0
14:54:09.597931 epoll_wait(7, [{EPOLLOUT|EPOLLERR|EPOLLHUP, {u32=245, u64=245}}, {EPOLLOUT, {u32=79, u64=79}}, {EPOLLOUT, {u32=319, u64=319}}, {EPOLLOUT, {u32=351, u64=351}}, {EPOLLOUT, {u32=352, u64=352}}, {EPOLLOUT, {u32=279, u64=279}}, {EPOLLOUT, {u32=280, u64=280}}], 200, 8) = 7
14:54:09.597994 clock_gettime(CLOCK_THREAD_CPUTIME_ID, {tv_sec=6385, tv_nsec=539793548}) = 0
14:54:09.598049 clock_gettime(CLOCK_THREAD_CPUTIME_ID, {tv_sec=6385, tv_nsec=539807298}) = 0
14:54:09.598100 epoll_wait(7, [{EPOLLOUT|EPOLLERR|EPOLLHUP, {u32=245, u64=245}}, {EPOLLOUT, {u32=79, u64=79}}, {EPOLLOUT, {u32=319, u64=319}}, {EPOLLOUT, {u32=351, u64=351}}, {EPOLLOUT, {u32=352, u64=352}}, {EPOLLOUT, {u32=279, u64=279}}, {EPOLLOUT, {u32=280, u64=280}}], 200, 8) = 7

strace -c

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 64.82    2.622832           9    292307           clock_gettime
 35.18    1.423343          10    146147           epoll_wait
  0.00    0.000146          12        12           timer_settime
  0.00    0.000134          11        12           rt_sigreturn
------ ----------- ----------- --------- --------- ----------------
100.00    4.046455                438478           total
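
A quick bit of arithmetic on the summary above makes the shape of the loop easier to see. This is just a sketch over the numbers in that table; the variable names are mine, and the interpretation in the comments is a reading of the figures, not a diagnosis.

```python
# Call counts copied from the `strace -c` summary above.
calls = {
    "clock_gettime": 292307,
    "epoll_wait": 146147,
    "timer_settime": 12,
    "rt_sigreturn": 12,
}

total = sum(calls.values())  # 438478, matching the "total" row
ratio = calls["clock_gettime"] / calls["epoll_wait"]
polling_share = (calls["clock_gettime"] + calls["epoll_wait"]) / total

# About two clock_gettime calls per epoll_wait, and no read/write/accept
# at all in the sampled window: consistent with a loop that wakes up,
# updates its clocks and goes straight back to polling.
print(f"total syscalls: {total}")
print(f"clock_gettime per epoll_wait: {ratio:.2f}")
print(f"share of polling-loop syscalls: {polling_share:.4%}")
```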

haproxy.cfg

global
        log ${LOCAL_SYSLOG}:514 local0
        # log /dev/log    local0
        # log /dev/log    local1 notice
        chroot /var/lib/haproxy
        stats socket /run/haproxy/admin.sock mode 660 level admin expose-fd listeners
        stats timeout 30s
        user haproxy
        group haproxy
        daemon

        log-send-hostname

        tune.ssl.default-dh-param 2048

        # Default SSL material locations
        ca-base /etc/ssl/certs
        crt-base /etc/ssl/private

        ssl-default-bind-ciphers ECDH+AESGCM:DH+AESGCM:ECDH+AES256:DH+AES256:ECDH+AES128:DH+AES:RSA+AESGCM:RSA+AES:!aNULL:!MD5:!DSS
        ssl-default-bind-options no-sslv3 no-tlsv10

defaults
        log     global
        mode    http
        option  httplog
        option  dontlognull
        timeout http-request 20s
        timeout http-keep-alive 4s
        timeout connect 10s
        timeout client  20s
        timeout server  10s
	timeout queue	6s
        errorfile 400 /etc/haproxy/errors/400.http
        errorfile 403 /etc/haproxy/errors/403.http
        errorfile 408 /etc/haproxy/errors/408.http
        errorfile 500 /etc/haproxy/errors/500.http
        errorfile 502 /etc/haproxy/errors/502.http
        errorfile 503 /etc/haproxy/errors/503.http
        errorfile 504 /etc/haproxy/errors/504.http

	log-format %ci:%cp\ [%t]\ %ft\ %b/%s\ %Tq/%Tw/%Tc/%Tr/%Tt\ %ST\ %B\ %CC\ %CS\ %tsc\ %ac/%fc/%bc/%sc/%rc\ %sq/%bq\ %hr\ %hs\ %{+Q}r\ REQUESTID=%ID


frontend http-in
        mode http
        bind :80

	http-response set-header Server CustomName

	unique-id-format %{+X}o\ %ci:%cp_%fi:%fp_%Ts_%rt:%pid
	unique-id-header X-Request-ID

        acl is_acme_challenge path_beg /.well-known/acme-challenge/

	acl is_valid_http_method method GET HEAD OPTIONS
        http-request deny if ! is_valid_http_method

	redirect scheme https code 301 if ! is_acme_challenge

        use_backend letsencrypt-backend if is_acme_challenge
        use_backend nothing


frontend http-in-secure
        mode http
        bind :443 name https ssl strict-sni crt /etc/haproxy/certs/domain.pem allow-0rtt alpn h2,http/1.1

	maxconn 5000

        capture request header Host len 60
        capture request header Content-Length len 10
        capture request header Referer len 100

	http-response set-header Server CustomName

	capture cookie __acookie= len 49

        unique-id-format %{+X}o\ %ci:%cp_%fi:%fp_%Ts_%rt:%pid
        unique-id-header X-Request-ID

        tcp-request inspect-delay 5s

        stick-table type ip size 500k expire 60s store http_req_rate(60s),conn_rate(60s)

        tcp-request connection reject if { src_http_req_rate gt 1200 }
	tcp-request connection reject if { src_conn_rate gt 1200 }

        http-request deny if { src_http_req_rate gt 1200 }
        tcp-request connection track-sc1 src

        acl is_too_long_url url_len gt 2000
        http-request deny if is_too_long_url

	acl is_valid_http_method method GET HEAD OPTIONS
	http-request deny if ! is_valid_http_method

        use_backend nothing if ! { hdr(host) hostname-1.example.com } ! { hdr(host) hostname-2.example.com }

        use_backend varnish


frontend internal-varnish-be
        mode http
        bind :8080
	option dontlog-normal
	no log
        use_backend mybackend-h-5


backend varnish
        server mybackend-varnish 127.0.0.1:8090 maxconn 2000


backend mybackend-h-5
        http-request set-header Host hostname-x.example.com

        server mybackend-h-5 11.11.11.11:443 ssl sni str(hostname-x.example.com) maxconn 70 verify required ca-file DST_Root_CA_X3.pem


backend nothing
        errorfile 400 /etc/haproxy/errors/400-empty.http
        http-request deny deny_status 400


backend letsencrypt-backend
        server letsencrypt 127.0.0.1:8091 maxconn 5
