I have a problem with the latest haproxy 1.9.0. HTTP traffic is fine, but with HTTPS traffic haproxy segfaults and crashes. The segfault errors are below:
[ 6374.791610] haproxy[2741]: segfault at 7f141e6e3ab8 ip 00007f141e6e3ab8 sp 00007ffea3eab4b8 error 15 in libc-2.17.so[7f141e6e3000+2000]
[ 6376.080835] haproxy[2739]: segfault at 60 ip 0000000000000060 sp 00007ffea3eab4b8 error 14 in haproxy-1.9.0[400000+46a000]
[ 6385.632464] haproxy[2762]: segfault at b0 ip 00000000004cc0da sp 00007fff64bd3360 error 4 in haproxy-1.9.0[400000+46a000]
[ 6389.265346] haproxy[2764]: segfault at 0 ip (null) sp 00007fff64bd3358 error 14 in haproxy-1.9.0[400000+46a000]
[ 6389.546879] traps: haproxy[2766] general protection ip:4cc0da sp:7fff64bd3360 error:0 in haproxy-1.9.0[400000+46a000]
[ 6389.571351] haproxy[2763]: segfault at ffffffffffffffb8 ip ffffffffffffffb8 sp 00007fff64bd3358 error 15
[ 6390.114721] traps: haproxy[2767] general protection ip:4cc0da sp:7fff64bd3360 error:0 in haproxy-1.9.0[400000+46a000]
[ 6391.928882] haproxy[2765]: segfault at ffffffffffffffb8 ip ffffffffffffffb8 sp 00007fff64bd3358 error 15
[ 7565.677404] haproxy[8910]: segfault at 96 ip 00000000004cc0da sp 00007ffcb2fdf250 error 4 in haproxy-1.9.0[400000+46a000]
[ 7566.251417] haproxy[8909]: segfault at ffffffffffffffb8 ip ffffffffffffffb8 sp 00007ffcb2fdf248 error 15
[ 7569.549036] haproxy[8912]: segfault at 0 ip (null) sp 00007ffcb2fdf248 error 14 in haproxy-1.9.0[400000+46a000]
[ 7570.831296] haproxy[8913]: segfault at 0 ip (null) sp 00007ffcb2fdf248 error 14 in haproxy-1.9.0[400000+46a000]
[ 7572.139128] traps: haproxy[8911] general protection ip:4cc0da sp:7ffcb2fdf250 error:0 in haproxy-1.9.0[400000+46a000]
[ 7576.601277] traps: haproxy[8908] general protection ip:4cc0da sp:7ffcb2fdf250 error:0 in haproxy-1.9.0[400000+46a000]
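For anyone reading the kernel lines above: the `error` field is the x86 page-fault error code, and as far as I know the Linux kernel prints it in hexadecimal, so `error 15` is 0x15 and not decimal 15. Here is a minimal Python sketch that decodes it. The function names `decode_error` and `parse_segfault` are made up for illustration; only the bit meanings (bit 0 = page present, bit 1 = write, bit 2 = user mode, bit 4 = instruction fetch) come from the x86 architecture.

```python
import re

# Matches the "segfault at ... ip ... sp ... error ..." part of a dmesg line.
# All numeric fields are hexadecimal; a NULL ip is printed as "(null)".
SEGFAULT_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+)\s+ip\s+(?P<ip>\(null\)|[0-9a-f]+)\s+"
    r"sp\s+(?P<sp>[0-9a-f]+)\s+error\s+(?P<err>[0-9a-f]+)"
)


def decode_error(code):
    """Decode an x86 page-fault error code into readable fields."""
    if code & 0x10:
        access = "instruction fetch"
    elif code & 0x02:
        access = "write"
    else:
        access = "read"
    return {
        "present": bool(code & 0x01),    # False: page not mapped at all
        "access": access,
        "user_mode": bool(code & 0x04),
    }


def parse_segfault(line):
    """Return fault address, ip and decoded error, or None if no match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    ip = m.group("ip")
    info = {
        "addr": int(m.group("addr"), 16),
        "ip": 0 if ip == "(null)" else int(ip, 16),
    }
    # Assumption: the kernel prints the error code in hex.
    info.update(decode_error(int(m.group("err"), 16)))
    return info
```

Read this way, `error 14` and `error 15` are instruction fetches, and in those lines the faulting address equals `ip` (0, 0x60, 0xffffffffffffffb8). That would be consistent with a call through a NULL or corrupted function pointer, while `error 4` at ip 4cc0da is a plain user-mode read of an unmapped address. This is only a reading of the log, not a diagnosis.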
haproxy -vv
HA-Proxy version 1.9.0 2018/12/19 - https://haproxy.org/
Build options :
TARGET = linux2628
CPU = generic
CC = gcc
CFLAGS = -O2 -g -fno-strict-aliasing -Wdeclaration-after-statement -fwrapv -Wno-unused-label -Wno-sign-compare -Wno-unused-parameter -Wno-old-style-declaration -Wno-ignored-qualifiers -Wno-clobbered -Wno-missing-field-initializers -Wtype-limits
OPTIONS = USE_ZLIB=1 USE_OPENSSL=1 USE_STATIC_PCRE2=1 USE_PCRE2_JIT=1 USE_TFO=1
Default settings :
maxconn = 2000, bufsize = 16384, maxrewrite = 1024, maxpollevents = 200
Built with OpenSSL version : OpenSSL 1.1.1a 20 Nov 2018
Running on OpenSSL version : OpenSSL 1.1.1a 20 Nov 2018
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports : TLSv1.0 TLSv1.1 TLSv1.2 TLSv1.3
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND
Built with zlib version : 1.2.7
Running on zlib version : 1.2.7
Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with PCRE2 version : 10.32 2018-09-10
PCRE2 library supports JIT : yes
Encrypted password support via crypt(3): yes
Built with multi-threading support.
Available polling systems :
epoll : pref=300, test result OK
poll : pref=200, test result OK
select : pref=150, test result OK
Total: 3 (3 usable), will use epoll.
Available multiplexer protocols :
(protocols marked as <default> cannot be specified using 'proto' keyword)
h2 : mode=HTX side=FE|BE
h2 : mode=HTTP side=FE
<default> : mode=HTX side=FE|BE
<default> : mode=TCP|HTTP side=FE|BE
Available filters :
[SPOE] spoe
[COMP] compression
[CACHE] cache
[TRACE] trace
When haproxy crashes, you will have a corefile in the working directory. You can either run it through gdb yourself:
gdb haproxy core
(gdb)$ bt full
Or send me the haproxy executable and the core file in a tar archive (I can provide private upload instructions via private message). Note that the core file will contain private data such as SSL keys, IP addresses, hostnames and even parts of the transactions.
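If the interactive session is awkward (the pager prompts tend to end up in the paste), gdb can also be run non-interactively. Below is a minimal Python sketch that wraps the same `bt full` command using gdb's `-batch` and `-ex` options and returns everything gdb printed, not just the frames. The helper names and the `haproxy` / `core` paths are placeholders, not anything specific to your setup.

```python
import subprocess


def gdb_backtrace_cmd(binary, core):
    """Build a non-interactive gdb command line for a full backtrace."""
    return [
        "gdb",
        "-batch",                      # run the commands, then exit (no pager)
        "-ex", "bt full",              # crashing thread, with local variables
        "-ex", "thread apply all bt",  # remaining threads, frames only
        binary,
        core,
    ]


def collect_backtrace(binary, core):
    """Run gdb and return its complete output (stdout followed by stderr)."""
    result = subprocess.run(
        gdb_backtrace_cmd(binary, core),
        capture_output=True,
        text=True,
        check=False,  # gdb may exit non-zero; still return what it printed
    )
    return result.stdout + result.stderr


if __name__ == "__main__":
    # Placeholder paths: point these at your actual binary and core file.
    print(collect_backtrace("haproxy", "core"))
```

Keep in mind that the output can include the same kind of private data as the core file itself, so look it over before posting it publicly.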
#0 0x0000000000000000 in ?? ()
#1 0x000000000051be74 in connect_server (s=0x5d20690) at src/backend.c:1252
#2 0x0000000000461949 in sess_update_stream_int (s=0x5d20690) at src/stream.c:928
#3 0x00000000004659c9 in process_stream (t=0x5483e70, context=0x5d20690, state=1025) at src/stream.c:2305
#4 0x00000000005737ab in process_runnable_tasks () at src/task.c:432
#5 0x00000000004ba236 in run_poll_loop () at src/haproxy.c:2619
#6 0x00000000004ba5b5 in run_thread_poll_loop (data=0xd8cf30) at src/haproxy.c:2684
#7 0x00000000004bbdef in main (argc=6, argv=0x7ffdc6650748) at src/haproxy.c:3313
(gdb) bt full
#0 0x0000000000000000 in ?? ()
No symbol table info available.
#1 0x000000000051be74 in connect_server (s=0x5d20690) at src/backend.c:1252
sess = 0x63a5290
cli_conn = 0x0
srv_conn = 0x57cca60
old_conn = 0x5c18d38
srv_cs = 0x0
srv = 0xdd5ae0
reuse = 1
reuse_orphan = 0
err = 0
i = 5
#2 0x0000000000461949 in sess_update_stream_int (s=0x5d20690) at src/stream.c:928
conn_err = 0
srv = 0xdd5ae0
si = 0x5d20968
req = 0x5d206a0
#3 0x00000000004659c9 in process_stream (t=0x5483e70, context=0x5d20690, state=1025) at src/stream.c:2305
srv = 0xdd5ae0
s = 0x5d20690
sess = 0x5c18c90
rqf_last = 209715202
rpf_last = 2147483648
rq_prod_last = 7
rq_cons_last = 0
rp_cons_last = 7
rp_prod_last = 0
req_ana_back = 32768
req = 0x5d206a0
res = 0x5d20700
si_f = 0x5d20928
si_b = 0x5d20968
#4 0x00000000005737ab in process_runnable_tasks () at src/task.c:432
t = 0x5483e70
state = 1025
ctx = 0x5d20690
process = 0x46321f <process_stream>
t = 0x38e0970
max_processed = 197
#5 0x00000000004ba236 in run_poll_loop () at src/haproxy.c:2619
next = -368939204
exp = -368939207
#6 0x00000000004ba5b5 in run_thread_poll_loop (data=0xd8cf30) at src/haproxy.c:2684
ptif = 0xb49360 <per_thread_init_list>
ptdf = 0x0
start_lock = 0
#7 0x00000000004bbdef in main (argc=6, argv=0x7ffdc6650748) at src/haproxy.c:3313
tids = 0xd8cf30
threads = 0xfab490
i = 1
old_sig = {__val = {0, 0, 29, 140508019246944, 24, 13980944, 30208, 11586480, 140727931963264, 140727931963208, 6, 6329043, 140727931962568, 13, 2, 0}}
blocked_sig = {__val = {18446744067199990583, 18446744073709551615 <repeats 15 times>}}
err = 0
retry = 200
limit = {rlim_cur = 400091, rlim_max = 400091}
errmsg = "\000\005e\306\375\177\000\000\000\000\000\000\000\000\000\000|", '\000' <repeats 15 times>, "|\000\000\000\000\000\000\000`G\223\222\312\177\000\000\030\000\000\000\000\000\000\000\200\346\323\000\000\000\000\000>\001\000\024\000\000\000\000\260\367\260\000\000\000\000\000`\021\325\000\000\000\000\000\254\061_\222\312\177\000\000\000\006", <incomplete sequence \306>
pidfd = 6
It affects connect_server behavior. I’m not sure it is the root cause of your issue, but it’s worth a try, since you can reproduce this so easily.
Ok. In order to reproduce the issue, can you provide a minimal set of configuration required to hit this crash? I understand you see it with SSL, but there are a lot of other variables in play and just enabling SSL is probably not enough to reproduce the issue.
Also, can you show the entire gdb output, not only the output of the backtrace please? We only have the calltrace, but the actual reason for the crash will be above that.
Please apply the following patch on top of the 4 patches above (but not the patch I sent to @safari privately), so the 4 patches in the tarball and then the following patch:
Ok, just to confirm, you are doing a clean rebuild with make clean before recompiling, right? I just want to make 100% sure that we are drawing solid conclusions; sorry if it’s a stupid question.
Do you have any hints about what we should try in order to reproduce your issue? How many servers do we need, SSL or not to the server, htx or not, H1 or H2 to servers, any particular setting of http-reuse, etc. Every such detail would be really helpful. Also, if you have any idea of roughly how many requests the process handles before crashing, it would help us figure out what type of test to focus on (i.e. if it crashes from the second request, there is no need to run 1 million requests through each config).