HAProxy 1.8.4 at 100% CPU right after startup


#1

Hi,

I’m using HAProxy 1.8.4 on FreeBSD 11.1. Unfortunately it’s unusable, because it consumes 100% CPU a few seconds after startup. I was not able to track this down to a specific option in the configuration file.

Technically speaking is HAProxy still up and running, but in this state the connections behave very strange. For example MySQL connections may or may not work, or they abort or get stuck. Besides that I’m using this HAProxy box for mail and web traffic too. FWIW, this configuration works flawlessly with 1.7.10 (which I was forced to downgrade to).

I’ve tried nbproc 1, nbproc 8, nbthread 1, nbthread 10, no matter what I choose, it always ends up with 100% CPU usage.

The gdb output does not look very useful, or can you spot something useful?

root@host:~ # gdb --pid 58391
GNU gdb (GDB) 8.0.1 [GDB v8.0.1 for FreeBSD]
Copyright (C) 2017 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-portbld-freebsd11.1".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word".
Attaching to process 58391
Reading symbols from /usr/local/sbin/haproxy...done.
Reading symbols from /lib/libcrypt.so.5...(no debugging symbols found)...done.
Reading symbols from /lib/libz.so.6...(no debugging symbols found)...done.
Reading symbols from /lib/libthr.so.3...(no debugging symbols found)...done.
Reading symbols from /usr/lib/libssl.so.8...(no debugging symbols found)...done.
Reading symbols from /lib/libcrypto.so.8...(no debugging symbols found)...done.
Reading symbols from /usr/local/lib/liblua-5.3.so...(no debugging symbols found)...done.
Reading symbols from /lib/libm.so.5...(no debugging symbols found)...done.
Reading symbols from /lib/libc.so.7...(no debugging symbols found)...done.
Reading symbols from /libexec/ld-elf.so.1...(no debugging symbols found)...done.
[Switching to LWP 100644 of process 58391]
0x000000867b1b8fea in _kevent () from /lib/libc.so.7
(gdb) bt full
#0  0x000000867b1b8fea in _kevent () from /lib/libc.so.7
No symbol table info available.
#1  0x000000867a25fcd2 in ?? () from /lib/libthr.so.3
No symbol table info available.
#2  0x000000000041d85c in _do_poll (p=0x8ab7d0 <cur_poller>, exp=45596037) at src/ev_kqueue.c:115
        status = 2074008608
        count = 45596037
        fd = 200
        delta_ms = 5
        timeout = {tv_sec = 0, tv_nsec = 5000000}
        updt_idx = 0
        en = 0
        eo = 0
        changes = 0
#3  0x000000000050f7be in run_poll_loop () at src/haproxy.c:2424
        next = 45596037
        exp = 45596037
#4  0x000000000050cf10 in run_thread_poll_loop (data=0x867bbfe6c0) at src/haproxy.c:2460
        start_lock = 0
        ptif = 0x89e2f8 <per_thread_init_list>
        ptdf = 0x206
#5  0x0000000000509a90 in main (argc=6, argv=0x793eabda1b38) at src/haproxy.c:3049
        tids = 0x867bbfe6c0
        threads = 0x867bbfe6e0
        i = 1
        err = 0
        retry = 200
        limit = {rlim_cur = 500096, rlim_max = 500096}
        errmsg = "\000\033\332\253>y\000\000p\033\332\253>y\000\000\070\033\332\253>y\000\000\006\000\000\000\000\000\000\000\217c}\310;\"sA\240O\212\000\000\000\000\000\060\033\332\253>y\000\000p\033\332\253>y\000\000\070\033\332\253>y\000\000\006\000\000\000\000\000\000\000\320\032\332\253>y\000\000\322\005 {\206\000\000\000\001\000\000"
        pidfd = 40
(gdb) cont
Continuing.
^C
Program received signal SIGINT, Interrupt.
0x000000867b1b8fea in _kevent () from /lib/libc.so.7
(gdb) bt full
#0  0x000000867b1b8fea in _kevent () from /lib/libc.so.7
No symbol table info available.
#1  0x000000867a25fcd2 in ?? () from /lib/libthr.so.3
No symbol table info available.
#2  0x000000000041d85c in _do_poll (p=0x8ab7d0 <cur_poller>, exp=45597649) at src/ev_kqueue.c:115
        status = 2074006208
        count = 45597649
        fd = 200
        delta_ms = 134
        timeout = {tv_sec = 0, tv_nsec = 134000000}
        updt_idx = 0
        en = 0
        eo = 0
        changes = 0
#3  0x000000000050f7be in run_poll_loop () at src/haproxy.c:2424
        next = 45597649
        exp = 45597649
#4  0x000000000050cf10 in run_thread_poll_loop (data=0x867bbfe6c0) at src/haproxy.c:2460
        start_lock = 0
        ptif = 0x89e2f8 <per_thread_init_list>
        ptdf = 0x206
#5  0x0000000000509a90 in main (argc=6, argv=0x793eabda1b38) at src/haproxy.c:3049
        tids = 0x867bbfe6c0
        threads = 0x867bbfe6e0
        i = 1
        err = 0
        retry = 200
        limit = {rlim_cur = 500096, rlim_max = 500096}
        errmsg = "\000\033\332\253>y\000\000p\033\332\253>y\000\000\070\033\332\253>y\000\000\006\000\000\000\000\000\000\000\217c}\310;\"sA\240O\212\000\000\000\000\000\060\033\332\253>y\000\000p\033\332\253>y\000\000\070\033\332\253>y\000\000\006\000\000\000\000\000\000\000\320\032\332\253>y\000\000\322\005 {\206\000\000\000\001\000\000"
        pidfd = 40
(gdb) cont
Continuing.
^C
Program received signal SIGINT, Interrupt.
0x000000867b1b8fea in _kevent () from /lib/libc.so.7
(gdb) bt full
#0  0x000000867b1b8fea in _kevent () from /lib/libc.so.7
No symbol table info available.
#1  0x000000867a25fcd2 in ?? () from /lib/libthr.so.3
No symbol table info available.
#2  0x000000000041d85c in _do_poll (p=0x8ab7d0 <cur_poller>, exp=45605353) at src/ev_kqueue.c:115
        status = 2074007168
        count = 45605353
        fd = 200
        delta_ms = 32
        timeout = {tv_sec = 0, tv_nsec = 32000000}
        updt_idx = 0
        en = 0
        eo = 0
        changes = 0
#3  0x000000000050f7be in run_poll_loop () at src/haproxy.c:2424
        next = 45605353
        exp = 45605353
#4  0x000000000050cf10 in run_thread_poll_loop (data=0x867bbfe6c0) at src/haproxy.c:2460
        start_lock = 0
        ptif = 0x89e2f8 <per_thread_init_list>
        ptdf = 0x206
#5  0x0000000000509a90 in main (argc=6, argv=0x793eabda1b38) at src/haproxy.c:3049
        tids = 0x867bbfe6c0
        threads = 0x867bbfe6e0
        i = 1
        err = 0
        retry = 200
        limit = {rlim_cur = 500096, rlim_max = 500096}
        errmsg = "\000\033\332\253>y\000\000p\033\332\253>y\000\000\070\033\332\253>y\000\000\006\000\000\000\000\000\000\000\217c}\310;\"sA\240O\212\000\000\000\000\000\060\033\332\253>y\000\000p\033\332\253>y\000\000\070\033\332\253>y\000\000\006\000\000\000\000\000\000\000\320\032\332\253>y\000\000\322\005 {\206\000\000\000\001\000\000"
        pidfd = 40


root@host:~ # haproxy -vv
HA-Proxy version 1.8.4-1deb90d 2018/02/08
Copyright 2000-2018 Willy Tarreau <willy@haproxy.org>

Build options :
  TARGET  = freebsd
  CPU     = generic
  CC      = cc
  CFLAGS  = -pipe -g -fstack-protector -fno-strict-aliasing -g -fno-strict-aliasing -Wdeclaration-after-statement -fwrapv -Wno-address-of-packed-member -Wno-null-dereference -Wno-unused-label -DFREEBSD_PORTS
  OPTIONS = USE_GETADDRINFO=1 USE_ZLIB=1 USE_CPU_AFFINITY=1 USE_ACCEPT4=1 USE_REGPARM=1 USE_OPENSSL=1 USE_LUA=1 USE_STATIC_PCRE=1 USE_PCRE_JIT=1

Default settings :
  maxconn = 200000, bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Built with network namespace support.
Built with zlib version : 1.2.11
Running on zlib version : 1.2.11
Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with PCRE version : 8.40 2017-01-11
Running on PCRE version : 8.40 2017-01-11
PCRE library supports JIT : yes
Built with multi-threading support.
Encrypted password support via crypt(3): yes
Built with transparent proxy support using: IP_BINDANY IPV6_BINDANY
Built with Lua version : Lua 5.3.4
Built with OpenSSL version : OpenSSL 1.0.2k-freebsd  26 Jan 2017
Running on OpenSSL version : OpenSSL 1.0.2k-freebsd  26 Jan 2017
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports : SSLv3 TLSv1.0 TLSv1.1 TLSv1.2

Available polling systems :
     kqueue : pref=300,  test result OK
       poll : pref=200,  test result OK
     select : pref=150,  test result OK
Total: 3 (3 usable), will use kqueue.

Available filters :
	[TRACE] trace
	[COMP] compression
	[SPOE] spoe

This sounds similar to this report:


But the patch did not solve it for me. Maybe it’s totally unrelated to SSL in general.

Thanks!

  • Frank

#2

Ok, try one at the time:

  • put nokqueue into the global section
  • compile without threads (that’s different than just nbthreads 1), use the USE_THREAD= compile option
  • install truss and attached it to running haproxy process (truss -dfp <PID>)

#3

i have this bug too(FreeBSD).

1.8.4(haproxy-ss-20180227.tar.gz ):
HA-Proxy version 1.8.4-de425f6 2018/02/26
Copyright 2000-2018 Willy Tarreau willy@haproxy.org

Build options :
TARGET = freebsd
CPU = generic
CC = cc
CFLAGS = -O2 -pipe -fstack-protector -fno-strict-aliasing -fno-strict-aliasing -Wdeclaration-after-statement -fwrapv -Wno-address-of-packed-member -Wno-null-dereference -Wno-unused-label -DFREEBSD_PORTS
OPTIONS = USE_GETADDRINFO=1 USE_ZLIB=1 USE_CPU_AFFINITY=1 USE_ACCEPT4=1 USE_REGPARM=1 USE_OPENSSL=1 USE_LUA=1 USE_STATIC_PCRE=1 USE_PCRE_JIT=1

Default settings :
maxconn = 2000, bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Built with network namespace support.
Built with zlib version : 1.2.11
Running on zlib version : 1.2.11
Compression algorithms supported : identity(“identity”), deflate(“deflate”), raw-deflate(“deflate”), gzip(“gzip”)
Built with PCRE version : 8.40 2017-01-11
Running on PCRE version : 8.40 2017-01-11
PCRE library supports JIT : yes
Built with multi-threading support.
Encrypted password support via crypt(3): yes
Built with transparent proxy support using: IP_BINDANY IPV6_BINDANY
Built with Lua version : Lua 5.3.4
Built with OpenSSL version : OpenSSL 1.0.2k-freebsd 26 Jan 2017
Running on OpenSSL version : OpenSSL 1.0.2k-freebsd 26 Jan 2017
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports : SSLv3 TLSv1.0 TLSv1.1 TLSv1.2

Available polling systems :
kqueue : pref=300, test result OK
poll : pref=200, test result OK
select : pref=150, test result OK
Total: 3 (3 usable), will use kqueue.

Available filters :
[TRACE] trace
[COMP] compression
[SPOE] spoe

lua:

txn:set_var > 10000
txn:get_var > 10000

A few hours after running CPU 100%

core.register_action(“hack_get_res”, { “http-res” }, function(applet)

applet.sf:path()

Some time return nil

backend:

Some time have more 504 timeout gateway
A few hours after running

1.7.10(haproxy-ss-20180227.tar.gz)

HA-Proxy version 1.7.10-6917b73 2018/02/26
Copyright 2000-2018 Willy Tarreau willy@haproxy.org

Build options :
TARGET = freebsd
CPU = generic
CC = cc
CFLAGS = -O2 -pipe -fstack-protector -fno-strict-aliasing -fno-strict-aliasing -Wdeclaration-after-statement -fwrapv -DFREEBSD_PORTS
OPTIONS = USE_GETADDRINFO=1 USE_ZLIB=1 USE_CPU_AFFINITY=1 USE_ACCEPT4=1 USE_REGPARM=1 USE_OPENSSL=1 USE_LUA=1 USE_STATIC_PCRE=1 USE_PCRE_JIT=1

Default settings :
maxconn = 2000, bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Encrypted password support via crypt(3): yes
Built with zlib version : 1.2.11
Running on zlib version : 1.2.11
Compression algorithms supported : identity(“identity”), deflate(“deflate”), raw-deflate(“deflate”), gzip(“gzip”)
Built with OpenSSL version : OpenSSL 1.0.2k-freebsd 26 Jan 2017
Running on OpenSSL version : OpenSSL 1.0.2k-freebsd 26 Jan 2017
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports prefer-server-ciphers : yes
Built with PCRE version : 8.40 2017-01-11
Running on PCRE version : 8.40 2017-01-11
PCRE library supports JIT : yes
Built with Lua version : Lua 5.3.4
Built with transparent proxy support using: IP_BINDANY IPV6_BINDANY

Available polling systems :
kqueue : pref=300, test result OK
poll : pref=200, test result OK
select : pref=150, test result OK
Total: 3 (3 usable), will use kqueue.

Available filters :
[SPOE] spoe
[TRACE] trace
[COMP] compression

lua:

txn:set_var > 10000
txn:get_var > 10000
A few hours after running CPU 100%

core.register_action(“hack_get_res”, { “http-res” }, function(applet)

applet.sf:path()

Some time return nil


#4

Can you guys apply the following patch and see if it helps?

http://git.haproxy.org/?p=haproxy-1.8.git;a=patch;h=5149cd3c7abad68ddb19a0a5b3b604786d5f1b95


#5

Thanks, Lukas!

I was finally able to reproduce the issue on another box by using the minimal configuration from the commit message. HAProxy reaches 100% CPU load immediately when using this config.

And I’ve tested the patch and it fixes the issue, at least for the minimal configuration. Great!

I have yet to test if the issue is solved on my production box. But I have to schedule a downtime for this, not an easy task, can’t do this immediately. :frowning:


#6

@fraenki, the patch pointed by @lukastribus is probably unrelated to your problem. Because it is a problem only when you try to start more than 32 threads. Have you any chance to confirm if it is related to SSL or not ? Could you also share a minimal configuration to reproduce the bug (or your current configuration if this is not possible) ?
I also need to know if this happens only when there is some load on your HAProxy or if it could be triggered with couple of requests ?

Of course, information asked by @lukastribus (nokqueue, truss info…) could also help. And It would also help a lot to provide info from the CLI:

  • show sess all
  • show fd

Be careful, if you have some load on your server, the output for these commands can be huge. And don’t forget to sanitize it to remove all sensitive information.


#7

@fsfs, I don’t know if your problem is the same or not. Because it seems to be related to LUA. And I don’t understand if it works with HAProxy 1.7.10 or not. To be honest, the formatting of your message does not help :slight_smile:

But, your configuration could help to understand the bug. And, also check my previous reply.


#8

I’m having this same exact problem. Thankfully my metrics caught it before we went live. I had to downgrade to 1.7.4. What sections of my configuration should I post to help diagnose this issue?


#9

Maybe you should upgrade to latest 1.8 version: 1.8.9 or use the latest version for 1.8-git.