HAProxy v1.9.2 crash during reload (or shortly thereafter)

Hello friends! I think I might have found a bug in HAProxy. Honestly, I never thought this would happen because HAProxy is usually rock solid, but I’m not sure what else could cause it.

We run HAProxy in master-worker mode with seamless reloads, inside Docker containers managed by Kubernetes. The Docker tag we were using when this happened was haproxy:1.9.2. We have since reverted to haproxy:1.8.14, which is what was working prior to this error.

The behavior we experienced: during a seamless reload, HAProxy crashed completely with the error message *** Error in `haproxy': free(): invalid pointer: 0x000055b420684028 ***

Here is an export of the logs of one of our HAProxy containers: (see gist)

If you are paying attention, you will notice some log lines that do not look native to HAProxy. Those come from our script that wraps HAProxy and helps manage reloading it inside a Docker container. Here is that script: (see gist)

Our configuration uses quite a lot of the features HAProxy exposes, but since the crash the same configuration has been running just fine, so I don’t believe the configuration itself is the issue. It contains a fair amount of information I would need to redact, so I will avoid pasting it here for now; I can redact and paste it later if necessary. I’m mostly curious whether the logs provide enough information to debug, because I have no idea how to read them once the crash happens.

I also tried to reproduce this by continually reloading HAProxy 1.9.2 in a lower-traffic environment, but could not reproduce it.

I did check the mailing list, but most of the crashes there seem to be due to segfaults, not invalid pointers.

Any thoughts on what could cause this and if it has been seen before?

Maybe I missed it, but I could not see the build version in the logs. Is it possible to start haproxy with the -vv flag and upload the output so we can dig further?

-vv : display the version, build options, libraries versions and usable
pollers. This output is systematically requested when filing a bug report.

I apologize. I should have known. Here it is:

root@ip-10-1-27-251:/home/ubuntu# docker exec -it k8s_haproxy_haproxy-router-qfks2_routing-external-default_f6ed0500-24d5-11e9-a3d8-0242ac110003_0 sh
# haproxy -vv
HA-Proxy version 1.9.2 2019/01/16 - https://haproxy.org/
Build options :
  TARGET  = linux2628
  CPU     = generic
  CC      = gcc
  CFLAGS  = -O2 -g -fno-strict-aliasing -Wdeclaration-after-statement -fwrapv -Wno-unused-label -Wno-sign-compare -Wno-unused-parameter -Wno-old-style-declaration -Wno-ignored-qualifiers -Wno-clobbered -Wno-missing-field-initializers -Wtype-limits -Wshift-negative-value -Wshift-overflow=2 -Wduplicated-cond -Wnull-dereference

Default settings :
  maxconn = 2000, bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Built with OpenSSL version : OpenSSL 1.1.0j  20 Nov 2018
Running on OpenSSL version : OpenSSL 1.1.0j  20 Nov 2018
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports : TLSv1.0 TLSv1.1 TLSv1.2
Built with Lua version : Lua 5.3.3
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND
Built with zlib version : 1.2.8
Running on zlib version : 1.2.8
Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with PCRE version : 8.39 2016-06-14
Running on PCRE version : 8.39 2016-06-14
PCRE library supports JIT : no (USE_PCRE_JIT not set)
Encrypted password support via crypt(3): yes
Built with multi-threading support.

Available polling systems :
      epoll : pref=300,  test result OK
       poll : pref=200,  test result OK
     select : pref=150,  test result OK
Total: 3 (3 usable), will use epoll.

Available multiplexer protocols :
(protocols marked as <default> cannot be specified using 'proto' keyword)
              h2 : mode=HTX        side=FE|BE
              h2 : mode=HTTP       side=FE
       <default> : mode=HTX        side=FE|BE
       <default> : mode=TCP|HTTP   side=FE|BE

Available filters :
	[SPOE] spoe
	[COMP] compression
	[CACHE] cache
	[TRACE] trace

The one interesting thing to note is that this output shows all three polling systems as OK, but in the runtime logs select shows as FAILED. I’m not sure whether that matters, but it definitely seems odd. Either way, epoll is used since it has the highest preference anyway.

In the runtime logs…

Available polling systems :
      epoll : pref=300,  test result OK
       poll : pref=200,  test result OK
     select : pref=150,  test result FAILED
Total: 3 (2 usable), will use epoll.

Available filters :
	[SPOE] spoe
	[COMP] compression
	[TRACE] trace

Do you have a core file for this crash? If not, please make sure it’s saved next time: how to get a core file in docker.
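For completeness, a rough sketch of what the linked guide boils down to (the core path, container name, and image tag here are illustrative, not from the original thread):

```shell
# On the host: write cores to a predictable path. kernel.core_pattern is
# a host-wide setting, inherited by containers since they share the kernel.
echo '/tmp/core.%e.%p' | sudo tee /proc/sys/kernel/core_pattern

# Start the container with an unlimited core file size limit.
docker run --ulimit core=-1 --name my-haproxy haproxy:1.9.2
```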

When you have the crash and a core file, please provide a stack trace from it (basically start gdb pointing at the executable and the core file, then issue the command bt full).
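Once a core file exists, extracting that stack trace looks something like this (the binary path matches the official Docker image; the core file name is an example):

```shell
# Non-interactive: load the executable plus core file and dump a full
# backtrace straight into a text file suitable for attaching to a report.
gdb /usr/local/sbin/haproxy /tmp/core.haproxy.1234 \
    -batch -ex 'bt full' > backtrace.txt
```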

Thank you for this information. It will likely take a while to get this all as I cannot reproduce the issue, but it was a major crash in one of our production environments. We have since rolled back to v1.8.14 but will try this all soon.

Thank you again.

If this is a critical environment I would suggest you wait for the 1.9.4 release. It will contain some important fixes (but probably not related to your issue).

Btw, how does docker reload haproxy exactly? Do you know specifics?

Gotcha, good to know. Then I will plan on waiting for that release, and during our upgrade we will make sure we can capture core files, stack traces, and so on in case it happens again. Thanks!

As for reloading HAProxy within the container, you have a few options. We do it internally via a script that handles signaling the HAProxy process: https://gist.github.com/ingshtrom/ae42191e7ca9dd751a1642eb1e2d5787#file-start_haproxy-sh

Alternatively, if running the default haproxy Docker image, you can run docker kill -s HUP my-running-haproxy and Docker will forward the SIGHUP signal to the HAProxy master process.

Unfortunately, seamless reload requires a bit more finesse and more information than just “hey process, reload”, which is why we use the script to wrap the HAProxy master process.
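For anyone curious, the shape of a seamless reload in master-worker mode (1.8+) is roughly the following; the socket and pid file paths are examples, and this assumes a stats socket declared with expose-fd listeners:

```shell
# haproxy.cfg (global section) must expose listening FDs over the socket:
#   stats socket /run/haproxy.sock mode 600 level admin expose-fd listeners
#
# With the master started via -W, sending SIGUSR2 to the master triggers
# a reload; the listening sockets are handed over to the new workers so
# no connections are dropped in the meantime.
kill -USR2 "$(cat /run/haproxy.pid)"
```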

Does that answer your question?

Ok, thanks.

Do you use unique-id in your configuration?

I have never heard of that. I assume you mean unique-id-header and unique-id-format? I checked our configs and I don’t see us using either of them anywhere.
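For reference, those directives (which we are not using) look like this inside a frontend; the format string below is the example from the HAProxy documentation:

```
frontend fe_main
    unique-id-format %{+X}o\ %ci:%cp_%fi:%fp_%Ts_%rt:%pid
    unique-id-header X-Unique-ID
```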

Ok, thanks.

Let us know if/when you have new information.

In the meantime, I filed an issue on our bug tracker, so that information from other users is more likely to end up in a single place and developers are aware of it:

By the way, if your config contains user/uid/group/gid, you either need to disable them or make sure to write 1 to /proc/sys/fs/suid_dumpable. Lastly, you need to disable any “chroot” statement in the haproxy config, otherwise the system will not dump the core either.
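Concretely, that sysctl knob is set like this (run as root on the host; the chroot path in the comment is a common default, used here only for illustration):

```shell
# Allow processes that changed uid/gid (user/group in haproxy.cfg)
# to still dump core; value 1 is the kernel's "debug" mode.
echo 1 > /proc/sys/fs/suid_dumpable

# And in haproxy.cfg, comment out any chroot line, e.g.:
#   # chroot /var/lib/haproxy
# otherwise the kernel writes the core relative to the (usually
# unwritable) chroot directory and it is silently lost.
```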


FYI, 1.9.4 has been released. Please make sure you set everything up to dump a core, and also consider what Willy mentioned earlier.