HAProxy 1.5.19 - Segfault on Stick Table - Advice/Troubleshooting Request

Hello,

We’re running the latest 1.5.19 version of HAProxy and have been seeing segfaults since we introduced a stick table to rate limit requests to our ‘login’ endpoint. Attackers are hitting this endpoint at around 1200 req/min. Whether or not an attack is in progress, haproxy crashes several times a day and logs this:
kernel: haproxy[11890]: segfault at 9695000 ip 00007fc839165701 sp 00007ffc748e6a68 error 4 in libc-2.17.so[7fc839016000+1b7000]

The stick table configuration below is the culprit. We know this because commenting these lines out stops the crashes. It seems like a pretty simple table, without much room for misconfiguration… It is located in the backend section of our config.

stick-table type ip size 200k expire 5m store gpc0,conn_cur,conn_cnt
acl block_on_path path_beg -i /path/to/login
tcp-request content track-sc1 src if block_on_path
http-request tarpit if { src_conn_cnt ge 15 } block_on_path

Whenever I look at this table, it averages around 4k entries out of the 200k table size, so I don’t think we’re filling it up. Memory and CPU are not a problem. Under an attack, the CPU load can go up to around 80%, which isn’t a concern yet. Here’s the table as I write this:

Every 1.0s: echo "show table jboss7_cluster" | sudo socat unix:/var/run/haproxy.stats -    Fri Mar 10 19:19:13 2017

table: jboss7_cluster, type: ip, size:204800, used:3127
0x9ef249c: key=1.10.199.35 use=0 exp=2045 gpc0=0 conn_cnt=1 conn_cur=0
0x3c0b89c: key=1.10.199.93 use=0 exp=271280 gpc0=0 conn_cnt=1 conn_cur=0
0x9c9c25c: key=1.32.24.54 use=0 exp=75902 gpc0=0 conn_cnt=1 conn_cur=0
0x460fcbc: key=1.33.107.79 use=0 exp=19818 gpc0=0 conn_cnt=1 conn_cur=0
0x402743c: key=1.52.1.57 use=0 exp=251825 gpc0=0 conn_cnt=1 conn_cur=0
0x2b475cc: key=1.52.240.156 use=0 exp=273221 gpc0=0 conn_cnt=1 conn_cur=0
0xab204bc: key=1.53.181.46 use=0 exp=176877 gpc0=0 conn_cnt=1 conn_cur=0
0x39dbd4c: key=1.54.218.41 use=0 exp=132587 gpc0=0 conn_cnt=1 conn_cur=0
0x2c4a1cc: key=1.55.119.221 use=0 exp=278489 gpc0=0 conn_cnt=1 conn_cur=0
0x75531dc: key=1.224.148.65 use=0 exp=24270 gpc0=0 conn_cnt=1 conn_cur=0
0x6ffb12c: key=1.226.133.132 use=0 exp=164417 gpc0=0 conn_cnt=1 conn_cur=0
0x753220c: key=1.228.23.251 use=0 exp=8600 gpc0=0 conn_cnt=1 conn_cur=0
0x8faf35c: key=1.231.28.2 use=0 exp=293120 gpc0=0 conn_cnt=1 conn_cur=0
0x97e6aac: key=1.232.77.49 use=0 exp=277764 gpc0=0 conn_cnt=1 conn_cur=0
0x3ad418c: key=1.234.141.109 use=0 exp=161023 gpc0=0 conn_cnt=1 conn_cur=0
0x5cb6a8c: key=1.234.144.6 use=0 exp=161278 gpc0=0 conn_cnt=1 conn_cur=0
0x6f0615c: key=1.239.51.214 use=0 exp=279495 gpc0=0 conn_cnt=1 conn_cur=0
0x581343c: key=1.241.19.72 use=0 exp=12058 gpc0=0 conn_cnt=1 conn_cur=0
0x3f9da1c: key=1.252.206.152 use=0 exp=128317 gpc0=0 conn_cnt=1 conn_cur=0
0x351aa3c: key=2.30.243.70 use=0 exp=275506 gpc0=0 conn_cnt=1 conn_cur=0
0xaf3e09c: key=2.32.215.93 use=0 exp=45313 gpc0=0 conn_cnt=1 conn_cur=0
0x9968b1c: key=2.50.152.32 use=0 exp=282299 gpc0=0 conn_cnt=1 conn_cur=0
0x329a0bc: key=2.50.213.122 use=0 exp=183950 gpc0=0 conn_cnt=1 conn_cur=0
0x672051c: key=2.86.52.175 use=0 exp=209063 gpc0=0 conn_cnt=1 conn_cur=0
0x9a6cb5c: key=2.88.139.155 use=0 exp=26832 gpc0=0 conn_cnt=1 conn_cur=0
0x43f2c9c: key=2.91.241.189 use=0 exp=278143 gpc0=0 conn_cnt=1 conn_cur=0
0x29f456c: key=2.95.242.245 use=0 exp=145829 gpc0=0 conn_cnt=1 conn_cur=0
0x900cbfc: key=2.98.44.25 use=0 exp=156035 gpc0=0 conn_cnt=1 conn_cur=0
0x7451ccc: key=2.186.216.19 use=0 exp=186612 gpc0=0 conn_cnt=1 conn_cur=0
0x821808c: key=2.230.145.122 use=0 exp=171511 gpc0=0 conn_cnt=1 conn_cur=0
0x82070cc: key=5.12.16.130 use=0 exp=4488 gpc0=0 conn_cnt=1 conn_cur=0
0x540e29c: key=5.15.211.19 use=0 exp=63660 gpc0=0 conn_cnt=1 conn_cur=0
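
Since every entry shown above has conn_cnt=1, it can help to filter the dump for the clients that are actually near the tarpit threshold. The following is a minimal sketch, not part of the original setup: filter_conn_cnt is a hypothetical helper name, and it assumes the "show table" output format shown above (one entry per line with key= and conn_cnt= fields).

```shell
# Hypothetical helper: read "show table" output on stdin and print the
# key and conn_cnt of every entry whose conn_cnt is >= the given threshold.
filter_conn_cnt() {
    threshold="$1"
    awk -v t="$threshold" '
        /key=/ {
            key = ""; cnt = ""
            # Pick the key= and conn_cnt= fields out of the entry line.
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^key=/)      key = substr($i, 5)
                if ($i ~ /^conn_cnt=/) cnt = substr($i, 10)
            }
            # Compare numerically; skip lines without a conn_cnt field.
            if (cnt != "" && cnt + 0 >= t + 0) print key, cnt
        }'
}

# Usage against the live socket (table name and socket path as in this post;
# 15 mirrors the threshold in the tarpit rule):
# echo "show table jboss7_cluster" | sudo socat unix:/var/run/haproxy.stats - | filter_conn_cnt 15
```

This only inspects a single snapshot of the table, so it shows who is close to the limit at that moment, not why haproxy crashes.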

I’d be happy to post any additional log info requested. Thanks in advance!

HAP

So upgrading from 1.5.18 to 1.5.19 did not fix the issue. I kind of expected that, because your description did not match any of the bugs fixed in 1.5.19.

Is this a package from the repository or did you compile it on your own?

If it’s from the repository:

If you compiled it:

  • don’t strip the symbols from the binary (meaning don’t manually strip the binary after compiling)

Then, before starting haproxy, run the command “ulimit -c unlimited”, then start haproxy. This will make sure that a core is generated when haproxy crashes the next time.

When haproxy then crashes, you will have a core file that you need to load into gdb together with the binary. Install gdb and use it like this:
gdb /path/to/haproxy/binary /path/to/haproxy/corefile
bt full
quit

This will give you the stacktrace of the crash, which should be posted along with this description to the mailing list:
haproxy@formilux.org
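
The steps above can also be run non-interactively, which makes it easier to save the backtrace to a file for the mailing list. This is a minimal sketch under stated assumptions: print_backtrace is a hypothetical helper name, and the paths are the same placeholders used above. If haproxy is started through systemd, the shell’s ulimit will likely not apply to the service, and the unit’s LimitCORE setting is the usual place to raise the limit instead.

```shell
# Hypothetical helper: print the full backtrace from a core file.
# Arguments: path to the haproxy binary, path to the core file.
print_backtrace() {
    bin="$1"
    core="$2"
    # Refuse to run gdb if either file is missing or unusable.
    if [ ! -x "$bin" ] || [ ! -r "$core" ]; then
        echo "usage: print_backtrace /path/to/haproxy/binary /path/to/haproxy/corefile" >&2
        return 1
    fi
    # -batch exits after running the commands; -ex runs "bt full".
    gdb -batch -ex "bt full" "$bin" "$core"
}

# Before starting haproxy from this shell, allow core files of any size:
#   ulimit -c unlimited
# After the next crash, save the backtrace (placeholder paths):
#   print_backtrace /path/to/haproxy/binary /path/to/haproxy/corefile > backtrace.txt 2>&1
```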

Thanks

Hello and thanks for the timely reply!

This hasn’t made it into our CentOS 7 repo yet; hopefully it will soon, as we do not like getting out of sync. So it’s a custom 1.5.19 build compiled from source:
make TARGET=linux2628 USE_LIBCRYPT=1 USE_LINUX_SPLICE=1 USE_LINUX_TPROXY=1 USE_OPENSSL=1 USE_ZLIB=1 USE_PCRE=1

If by stripping the symbols from the binary you mean taking the resulting binary and its corresponding systemd wrapper and using them directly, rather than, say, running make install, then I’ll have to revisit that. I basically did this:
cd /tmp
mv /usr/sbin/haproxy /usr/sbin/haproxy-1-5-18
mv /usr/sbin/haproxy-systemd-wrapper /usr/sbin/haproxy-systemd-wrapper-1.5.18
mv haproxy-1519-binary /usr/sbin/haproxy
mv haproxy-systemd-wrapper-1519 /usr/sbin/haproxy-systemd-wrapper
systemctl stop haproxy
systemctl start haproxy

In the meantime, I can’t do much more without first moving this traffic onto some other haproxy, so as not to affect all the other services while I mess around with this one. So it will take another week or so before I can really start this process, but I’m on it, and appreciate the help!

By stripping I mean running the “strip” command on the binary from the command line. If you don’t do that, then all is great.