Haproxy 1.9.0 segfault at 7f141e6e3ab8 ip 00007f141e6e3ab8 sp 00007ffea3eab4b8 error 15 in libc-2.17.so[7f141e6e3000+2000]

I have a problem with the latest haproxy 1.9.0: HTTP traffic is fine, but with HTTPS traffic haproxy segfaults and crashes. The segfault errors are as below:

[ 6374.791610] haproxy[2741]: segfault at 7f141e6e3ab8 ip 00007f141e6e3ab8 sp 00007ffea3eab4b8 error 15 in libc-2.17.so[7f141e6e3000+2000]
[ 6376.080835] haproxy[2739]: segfault at 60 ip 0000000000000060 sp 00007ffea3eab4b8 error 14 in haproxy-1.9.0[400000+46a000]
[ 6385.632464] haproxy[2762]: segfault at b0 ip 00000000004cc0da sp 00007fff64bd3360 error 4 in haproxy-1.9.0[400000+46a000]
[ 6389.265346] haproxy[2764]: segfault at 0 ip           (null) sp 00007fff64bd3358 error 14 in haproxy-1.9.0[400000+46a000]
[ 6389.546879] traps: haproxy[2766] general protection ip:4cc0da sp:7fff64bd3360 error:0 in haproxy-1.9.0[400000+46a000]
[ 6389.571351] haproxy[2763]: segfault at ffffffffffffffb8 ip ffffffffffffffb8 sp 00007fff64bd3358 error 15
[ 6390.114721] traps: haproxy[2767] general protection ip:4cc0da sp:7fff64bd3360 error:0 in haproxy-1.9.0[400000+46a000]
[ 6391.928882] haproxy[2765]: segfault at ffffffffffffffb8 ip ffffffffffffffb8 sp 00007fff64bd3358 error 15

[ 7565.677404] haproxy[8910]: segfault at 96 ip 00000000004cc0da sp 00007ffcb2fdf250 error 4 in haproxy-1.9.0[400000+46a000]
[ 7566.251417] haproxy[8909]: segfault at ffffffffffffffb8 ip ffffffffffffffb8 sp 00007ffcb2fdf248 error 15
[ 7569.549036] haproxy[8912]: segfault at 0 ip           (null) sp 00007ffcb2fdf248 error 14 in haproxy-1.9.0[400000+46a000]
[ 7570.831296] haproxy[8913]: segfault at 0 ip           (null) sp 00007ffcb2fdf248 error 14 in haproxy-1.9.0[400000+46a000]
[ 7572.139128] traps: haproxy[8911] general protection ip:4cc0da sp:7ffcb2fdf250 error:0 in haproxy-1.9.0[400000+46a000]
[ 7576.601277] traps: haproxy[8908] general protection ip:4cc0da sp:7ffcb2fdf250 error:0 in haproxy-1.9.0[400000+46a000]

haproxy -vv

HA-Proxy version 1.9.0 2018/12/19 - https://haproxy.org/
Build options :
  TARGET  = linux2628
  CPU     = generic
  CC      = gcc
  CFLAGS  = -O2 -g -fno-strict-aliasing -Wdeclaration-after-statement -fwrapv -Wno-unused-label -Wno-sign-compare -Wno-unused-parameter -Wno-old-style-declaration -Wno-ignored-qualifiers -Wno-clobbered -Wno-missing-field-initializers -Wtype-limits
  OPTIONS = USE_ZLIB=1 USE_OPENSSL=1 USE_STATIC_PCRE2=1 USE_PCRE2_JIT=1 USE_TFO=1

Default settings :
  maxconn = 2000, bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Built with OpenSSL version : OpenSSL 1.1.1a  20 Nov 2018
Running on OpenSSL version : OpenSSL 1.1.1a  20 Nov 2018
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports : TLSv1.0 TLSv1.1 TLSv1.2 TLSv1.3
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND
Built with zlib version : 1.2.7
Running on zlib version : 1.2.7
Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with PCRE2 version : 10.32 2018-09-10
PCRE2 library supports JIT : yes
Encrypted password support via crypt(3): yes
Built with multi-threading support.

Available polling systems :
      epoll : pref=300,  test result OK
       poll : pref=200,  test result OK
     select : pref=150,  test result OK
Total: 3 (3 usable), will use epoll.

Available multiplexer protocols :
(protocols marked as <default> cannot be specified using 'proto' keyword)
              h2 : mode=HTX        side=FE|BE
              h2 : mode=HTTP       side=FE
       <default> : mode=HTX        side=FE|BE
       <default> : mode=TCP|HTTP   side=FE|BE

Available filters :
	[SPOE] spoe
	[COMP] compression
	[CACHE] cache
	[TRACE] trace

Configs related to SSL

    tune.maxaccept  -1
    tune.bufsize  32768
    tune.maxrewrite  8192
    tune.ssl.cachesize  2000000
    tune.ssl.lifetime  600
    tune.ssl.default-dh-param  1024
    tune.ssl.ssl-ctx-cache-size  4096
    ssl-default-bind-options no-sslv3
    ssl-default-server-options no-sslv3
    bind 0.0.0.0:443 ssl crt /path/to/domain.pem ciphers ECDHE+aRSA+AES256+GCM+SHA384:ECDHE+aRSA+AES128+GCM+SHA256:ECDHE+aRSA+AES256+SHA384:ECDHE+aRSA+AES128+SHA256:ECDHE+aRSA+RC4+SHA:ECDHE+aRSA+AES256+SHA:ECDHE+aRSA+AES128+SHA:AES256+GCM+SHA384:AES128+GCM+SHA256:AES128+SHA256:AES256+SHA256:DHE+aRSA+AES128+SHA:RC4+SHA:HIGH:!aNULL:!eNULL:!LOW:!3DES:!MD5:!EXP:!PSK:!SRP:!DSS
    .........
    http-request del-header Proxy
    http-request set-header X-Forwarded-Proto https
    http-response set-header Strict-Transport-Security max-age=0

I suggest you recompile without compiler optimizations by adding the following CFLAGS directive to your make line:

make clean && \
make TARGET=[... however you compiled previously] \
CFLAGS="-O0 -g -fno-strict-aliasing -Wdeclaration-after-statement -fwrapv -Wno-unused-label -Wno-sign-compare -Wno-unused-parameter -Wno-old-style-declaration -Wno-ignored-qualifiers -Wno-clobbered -Wno-missing-field-initializers -Wtype-limits"

In the haproxy -vv output you should then see CFLAGS begin with -O0 instead of -O2 (the rest should be the same).

Then, instead of starting haproxy via systemd or initd, start it manually, after setting the core file size to unlimited:

# whoami
root
# ulimit -c unlimited
# haproxy -f /etc/haproxy/haproxy

When haproxy crashes, you will have a corefile in the working directory. You can either run it through gdb yourself:

gdb haproxy core
(gdb) bt full

Or send the haproxy executable and the corefile in a tar to me (I can provide upload instructions via private message). Note that the corefile will contain private data such as SSL keys, IP addresses, hostnames and even parts of the transactions.

You can see the gdb trace below.

(gdb) bt

#0  0x0000000000000000 in ?? ()
#1  0x000000000051be74 in connect_server (s=0x5d20690) at src/backend.c:1252
#2  0x0000000000461949 in sess_update_stream_int (s=0x5d20690) at src/stream.c:928
#3  0x00000000004659c9 in process_stream (t=0x5483e70, context=0x5d20690, state=1025) at src/stream.c:2305
#4  0x00000000005737ab in process_runnable_tasks () at src/task.c:432
#5  0x00000000004ba236 in run_poll_loop () at src/haproxy.c:2619
#6  0x00000000004ba5b5 in run_thread_poll_loop (data=0xd8cf30) at src/haproxy.c:2684
#7  0x00000000004bbdef in main (argc=6, argv=0x7ffdc6650748) at src/haproxy.c:3313

(gdb) bt full

#0  0x0000000000000000 in ?? ()
No symbol table info available.
#1  0x000000000051be74 in connect_server (s=0x5d20690) at src/backend.c:1252
        sess = 0x63a5290
        cli_conn = 0x0
        srv_conn = 0x57cca60
        old_conn = 0x5c18d38
        srv_cs = 0x0
        srv = 0xdd5ae0
        reuse = 1
        reuse_orphan = 0
        err = 0
        i = 5
#2  0x0000000000461949 in sess_update_stream_int (s=0x5d20690) at src/stream.c:928
        conn_err = 0
        srv = 0xdd5ae0
        si = 0x5d20968
        req = 0x5d206a0
#3  0x00000000004659c9 in process_stream (t=0x5483e70, context=0x5d20690, state=1025) at src/stream.c:2305
        srv = 0xdd5ae0
        s = 0x5d20690
        sess = 0x5c18c90
        rqf_last = 209715202
        rpf_last = 2147483648
        rq_prod_last = 7
        rq_cons_last = 0
        rp_cons_last = 7
        rp_prod_last = 0
        req_ana_back = 32768
        req = 0x5d206a0
        res = 0x5d20700
        si_f = 0x5d20928
        si_b = 0x5d20968
#4  0x00000000005737ab in process_runnable_tasks () at src/task.c:432
        t = 0x5483e70
        state = 1025
        ctx = 0x5d20690
        process = 0x46321f <process_stream>
        t = 0x38e0970
        max_processed = 197
#5  0x00000000004ba236 in run_poll_loop () at src/haproxy.c:2619
        next = -368939204
        exp = -368939207
#6  0x00000000004ba5b5 in run_thread_poll_loop (data=0xd8cf30) at src/haproxy.c:2684
        ptif = 0xb49360 <per_thread_init_list>
        ptdf = 0x0
        start_lock = 0
#7  0x00000000004bbdef in main (argc=6, argv=0x7ffdc6650748) at src/haproxy.c:3313
        tids = 0xd8cf30
        threads = 0xfab490
        i = 1
        old_sig = {__val = {0, 0, 29, 140508019246944, 24, 13980944, 30208, 11586480, 140727931963264, 140727931963208, 6, 6329043, 140727931962568, 13, 2, 0}}
        blocked_sig = {__val = {18446744067199990583, 18446744073709551615 <repeats 15 times>}}
        err = 0
        retry = 200
        limit = {rlim_cur = 400091, rlim_max = 400091}
        errmsg = "\000\005e\306\375\177\000\000\000\000\000\000\000\000\000\000|", '\000' <repeats 15 times>, "|\000\000\000\000\000\000\000`G\223\222\312\177\000\000\030\000\000\000\000\000\000\000\200\346\323\000\000\000\000\000>\001\000\024\000\000\000\000\260\367\260\000\000\000\000\000`\021\325\000\000\000\000\000\254\061_\222\312\177\000\000\000\006", <incomplete sequence \306>
        pidfd = 6

Can you try the patch attached in this email:

https://www.mail-archive.com/haproxy@formilux.org/msg32195.html

It affects connect_server behavior. I'm not sure it is the root cause of your issue, but it's worth a try, since you can reproduce this so easily.

I just tried the patch but still got a coredump. I think my case is different, as it only happens with HTTPS traffic.

(gdb) bt
#0  0x0000000000000000 in ?? ()
#1  0x000000000051be74 in connect_server (s=0x5d20690) at src/backend.c:1245
#2  0x0000000000461949 in sess_update_stream_int (s=0x5d20690) at src/stream.c:923
#3  0x00000000004659c9 in process_stream (t=0x5483e70, context=0x5d20690, state=1025) at src/stream.c:2296
#4  0x00000000005737ab in process_runnable_tasks () at src/task.c:417
#5  0x00000000004ba236 in mworker_pipe_register () at src/haproxy.c:2607
#6  0x00000000004ba5b5 in run_thread_poll_loop (data=0xd8cf30) at src/haproxy.c:2678
#7  0x00000000004bbdef in main (argc=6, argv=0x7ffdc6650748) at src/haproxy.c:3303
(gdb) bt full
#0  0x0000000000000000 in ?? ()
No symbol table info available.
#1  0x000000000051be74 in connect_server (s=0x5d20690) at src/backend.c:1245
        cli_conn = 0x0
        srv_conn = 0x57cca60
        old_conn = 0x5c18d38
        srv_cs = 0x0
        srv = 0xdd5ae0
        reuse = 1
        reuse_orphan = 0
        err = 0
        i = 5
#2  0x0000000000461949 in sess_update_stream_int (s=0x5d20690) at src/stream.c:923
        conn_err = 0
        srv = 0xdd5ae0
        si = 0x5d20968
        req = 0x5d206a0
#3  0x00000000004659c9 in process_stream (t=0x5483e70, context=0x5d20690, state=1025) at src/stream.c:2296
        srv = 0xdd5ae0
        s = 0x5d20690
        sess = 0x5c18c90
        rqf_last = 209715202
        rpf_last = 2147483648
        rq_prod_last = 7
        rq_cons_last = 0
        rp_cons_last = 7
        rp_prod_last = 0
        req_ana_back = 32768
        req = 0x5d206a0
        res = 0x5d20700
        si_f = 0x5d20928
        si_b = 0x5d20968
#4  0x00000000005737ab in process_runnable_tasks () at src/task.c:417
        t = 0x5483e70
        state = 1025
        ctx = 0x5d20690
        process = 0x46321f <process_store_rules+1339>
        t = 0x38e0970
        max_processed = 197
#5  0x00000000004ba236 in mworker_pipe_register () at src/haproxy.c:2607
No locals.
#6  0x00000000004ba5b5 in run_thread_poll_loop (data=0xd8cf30) at src/haproxy.c:2678
        __pl_r = 0
        ptif = 0xb49360 <per_thread_init_list>
        ptdf = 0x0
        start_lock = 0
#7  0x00000000004bbdef in main (argc=6, argv=0x7ffdc6650748) at src/haproxy.c:3303
        cpuset = {__bits = {11206411826737, 0, 18446603345777588961, 2, 0, 0, 390842023984, 140727931962656, 0, 0, 511101108334, 0, 140727931962655, 140727931962704, 0, 0}}
        j = 0
        cpu_map = 6967993
        tids = 0xd8cf30
        threads = 0xfab490
        i = 1
        old_sig = {__val = {0, 0, 29, 140508019246944, 24, 13980944, 30208, 11586480, 140727931963264, 140727931963208, 6, 6329043, 140727931962568, 13, 2, 0}}
        blocked_sig = {__val = {18446744067199990583, 18446744073709551615 <repeats 15 times>}}
        err = 0
        retry = 200
        limit = {rlim_cur = 400091, rlim_max = 400091}
        errmsg = "\000\005e\306\375\177\000\000\000\000\000\000\000\000\000\000|", '\000' <repeats 15 times>, "|\000\000\000\000\000\000\000`G\223\222\312\177\000\000\030\000\000\000\000\000\000\000\200\346\323\000\000\000\000\000>\001\000\024\000\000\000\000\260\367\260\000\000\000\000\000`\021\325\000\000\000\000\000\254\061_\222\312\177\000\000\000\006", <incomplete sequence \306>
        pidfd = 6

Ok. In order to reproduce the issue, can you provide a minimal set of configuration required to hit this crash? I understand you see it with SSL, but there are a lot of other variables in play and just enabling SSL is probably not enough to reproduce the issue.

Also, can you show the entire gdb output, not only the output of the backtrace, please? We only have the call trace, but the actual reason for the crash will be above it.

Could you try the latest master development tree? We would like to see if this is something that has already been fixed in -dev.

You would just git clone the tree and then you have everything in the haproxy folder:

git clone http://git.haproxy.org/git/haproxy.git/

It takes a few minutes to clone the tree.

Meanwhile I forwarded your coredump to Willy and Olivier, who are going to take a look at this.

I’ve tried the latest clone from master, but it is still crashing:

[ 2379.431666] haproxy[12856]: segfault at 0 ip (null) sp 00007ffec804aa38 error 14 in haproxy[400000+186000]

Does it crash immediately with the first SSL connection or does it work for some time and crash then?

Can you guys please apply the 4 patches in the following tarball (to either 1.9 or master) and retry:

https://dts.ltri.eu/d.php/e742fb25967bb68db8a6cfcaaa796c4d/19-safaricrash.tar

I still got a coredump after applying the 4 patches (patch 0002 is empty).

(gdb) bt full
#0  0x0000000000000000 in ?? ()
No symbol table info available.
#1  0x000000000051bf1a in connect_server (s=0x32b7480) at src/backend.c:1258
        sess = 0x309ac20
        cli_conn = 0x0
        srv_conn = 0x376ec20
        old_conn = 0x38a8788
        srv_cs = 0x0
        srv = 0x293ca40
        reuse = 1
        reuse_orphan = 0
        err = 0
#2  0x0000000000461a57 in sess_update_stream_int (s=0x32b7480) at src/stream.c:928
        conn_err = 0
        srv = 0x293ca40
        si = 0x32b7758
        req = 0x32b7490
#3  0x0000000000465ad7 in process_stream (t=0x32b7870, context=0x32b7480, state=1025) at src/stream.c:2305
        srv = 0x293ca40
        s = 0x32b7480
        sess = 0x2d6cd60
        rqf_last = 209715202
        rpf_last = 2147483648
        rq_prod_last = 7
        rq_cons_last = 0
        rp_cons_last = 7
        rp_prod_last = 0
        req_ana_back = 32768
        req = 0x32b7490
        res = 0x32b74f0
        si_f = 0x32b7718
        si_b = 0x32b7758
#4  0x0000000000573a4c in process_runnable_tasks () at src/task.c:432
        t = 0x32b7870
        state = 1025
        ctx = 0x32b7480
        process = 0x46332d <process_stream>
        t = 0x2e457e0
        max_processed = 200
#5  0x00000000004ba385 in run_poll_loop () at src/haproxy.c:2619
        next = -226266230
        exp = -226266296
#6  0x00000000004ba704 in run_thread_poll_loop (data=0x2931f40) at src/haproxy.c:2684
        ptif = 0xb49360 <per_thread_init_list>
        ptdf = 0x0
        start_lock = 0
#7  0x00000000004bbf3e in main (argc=6, argv=0x7ffd52f2a6f8) at src/haproxy.c:3313
        tids = 0x2931f40
        threads = 0x2b504a0
        i = 1
        old_sig = {__val = {0, 139834890709042, 29, 139834894120800, 24, 42968336, 30208, 11586480, 2, 0, 0, 0, 0, 0, 0, 0}}
        blocked_sig = {__val = {18446744067199990583, 18446744073709551615 <repeats 15 times>}}
        err = 0
        retry = 200
        limit = {rlim_cur = 400091, rlim_max = 400091}
        errmsg = "\000\244\362R\375\177\000\000\000\000\000\000\000\000\000\000|", '\000' <repeats 15 times>, "|\000\000\000\000\000\000\000`\367\060\331-\177\000\000\030\000\000\000\000\000\000\000\200\066\216\002\000\000\000\000>\001\000\024\000\000\000\000\260\367\260\000\000\000\000\000`a\217\002\000\000\000\000\254\341\374\330-\177\000\000\260\245\362R"
        pidfd = 6

Something went wrong with the tarball; patch 2 must not be empty.

I reuploaded the tarball and double checked that all patches are there:

https://dts.ltri.eu/d.php/f0c88d2e04bf1f2ef8dfbbe84c3027ee/safaripatches2.tar

I now got all 4 patches, but there is still a coredump, as below.

(gdb) bt full
#0  0x0000000000000000 in ?? ()
No symbol table info available.
#1  0x000000000051bf1a in connect_server (s=0x2c61600) at src/backend.c:1258
        sess = 0x2c001f0
        cli_conn = 0x0
        srv_conn = 0x2bc5f80
        old_conn = 0x24ee408
        srv_cs = 0x0
        srv = 0x1c59460
        reuse = 1
        reuse_orphan = 0
        err = 0
#2  0x0000000000461a57 in sess_update_stream_int (s=0x2c61600) at src/stream.c:928
        conn_err = 0
        srv = 0x1c59460
        si = 0x2c618d8
        req = 0x2c61610
#3  0x0000000000465ad7 in process_stream (t=0x2c619f0, context=0x2c61600, state=1025) at src/stream.c:2305
        srv = 0x1c59460
        s = 0x2c61600
        sess = 0x223fc80
        rqf_last = 209715202
        rpf_last = 2147483648
        rq_prod_last = 7
        rq_cons_last = 0
        rp_cons_last = 7
        rp_prod_last = 0
        req_ana_back = 32768
        req = 0x2c61610
        res = 0x2c61670
        si_f = 0x2c61898
        si_b = 0x2c618d8
#4  0x0000000000573a4c in process_runnable_tasks () at src/task.c:432
        t = 0x2c619f0
        state = 1025
        ctx = 0x2c61600
        process = 0x46332d <process_stream>
        t = 0x2f62cb0
        max_processed = 195
#5  0x00000000004ba385 in run_poll_loop () at src/haproxy.c:2619
        next = -200494007
        exp = -200494064
#6  0x00000000004ba704 in run_thread_poll_loop (data=0x1bf5f40) at src/haproxy.c:2684
        ptif = 0xb49360 <per_thread_init_list>
        ptdf = 0x0
        start_lock = 0
#7  0x00000000004bbf3e in main (argc=6, argv=0x7ffd72f92b48) at src/haproxy.c:3313
        tids = 0x1bf5f40
        threads = 0x1be9c20
        i = 1
        old_sig = {__val = {0, 0, 29, 139802521089888, 24, 29091088, 30208, 11586480, 140726532385664, 140726532385608, 6, 6329987, 140726532384968, 13, 2, 0}}
        blocked_sig = {__val = {18446744067199990583, 18446744073709551615 <repeats 15 times>}}
        err = 0
        retry = 200
        limit = {rlim_cur = 400091, rlim_max = 400091}
        errmsg = "\000)\371r\375\177\000\000\000\000\000\000\000\000\000\000|", '\000' <repeats 15 times>, "|\000\000\000\000\000\000\000`\267\233O&\177\000\000\030\000\000\000\000\000\000\000\200v\272\001\000\000\000\000>\001\000\024\000\000\000\000\260\367\260\000\000\000\000\000`\241\273\001\000\000\000\000\254\241gO&\177\000\000\000*\371r"
        pidfd = 6

Nope. It works for a while and crashes a few moments later.

Please apply the following patch on top of the 4 patches above (but not the patch I sent to @safari privately), so first the 4 patches in the tarball and then the following patch:

diff --git a/src/backend.c b/src/backend.c
index 39b40587..4be61585 100644
--- a/src/backend.c
+++ b/src/backend.c
@@ -1158,7 +1158,7 @@ int connect_server(struct stream *s)
 				srv_list = LIST_ELEM(s->sess->srv_list.n,
 						struct sess_srv_list *, srv_list);
 				if (!LIST_ISEMPTY(&srv_list->srv_list))
-					srv_conn = LIST_ELEM(srv_list->srv_list.n,
+					srv_conn = LIST_ELEM(srv_list->conn_list.n,
 						struct connection *, session_list);
 			}
 		}
-- 
2.14.4

This should fix the issue.

@lukastribus, unfortunately, I still got the coredump.

Ok, just to confirm, you are doing a clean rebuild with make clean before recompiling, right? Just trying to make 100% sure that we are drawing solid conclusions; sorry if it’s a stupid question.

What I did was: remove the haproxy folder, unzip the haproxy tarball, apply the 4 patches, then the latest patch, then make and run.

Do you have any hints about what we should try to reproduce your issue? How many servers would we need, SSL or not to the server, HTX or not, H1 or H2 to the servers, any particular setting of http-reuse, etc.? Every such detail would be really helpful. Also, if you have any hint about the rough number of requests the process handles before crashing, it would help us figure out what type of test to focus on (e.g. if it crashes on the second request, there is no need to run 1 million requests through each config).
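For example, even a stripped-down skeleton along these lines, filled in to match your real setup, would already narrow things down a lot (the addresses, ports and certificate path below are placeholders, not taken from your config):

```
global
    maxconn 2000

defaults
    mode http
    timeout connect 5s
    timeout client  30s
    timeout server  30s

frontend fe_https
    bind 0.0.0.0:443 ssl crt /path/to/domain.pem
    default_backend be_app

backend be_app
    http-reuse safe
    server app1 192.0.2.10:8080
    server app2 192.0.2.11:8080
```

The interesting parts for us are the bind/ssl line, the number of servers, and the http-reuse setting, since the crash is in connection reuse inside connect_server.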

Thanks!