After deploying the new HAProxy version (the previous was 1.9.5), the access.log is flooding with messages like:
Jun 21 11:08:04 172.30.1.153:4594 [21/Jun/2019:11:08:04.294] lb-useast/lb-useast_frontend: SSL handshake failure
Jun 21 11:08:04 172.30.1.30:38852 [21/Jun/2019:11:08:04.410] lb-useast/lb-useast_frontend: SSL handshake failure
Jun 21 11:08:04 172.30.2.55:45926 [21/Jun/2019:11:08:04.670] lb-useast/lb-useast_frontend: SSL handshake failure
Jun 21 11:08:14 172.30.1.153:4602 [21/Jun/2019:11:08:14.295] lb-useast/lb-useast_frontend: SSL handshake failure
Jun 21 11:08:14 172.30.1.30:38950 [21/Jun/2019:11:08:14.409] lb-useast/lb-useast_frontend: SSL handshake failure
Jun 21 11:08:14 172.30.2.55:45982 [21/Jun/2019:11:08:14.670] lb-useast/lb-useast_frontend: SSL handshake failure
The HAProxy instances is located behind AWS Elastic Load Balancer (in classic mode). After a little investigation, I’ve come up that those errors are caused by AWS ELB TCP health checks.
tcpdump pcap is here https://www.dropbox.com/s/bwnadkmbkn6fgx6/elbhc.pcap?dl=0
frontend lb-useast
mode http
bind *:3080 accept-proxy name lb-useast_frontend_http
bind *:3443 accept-proxy name lb-useast_frontend ssl ...
Is there a way to hide such errors? Notice that version 1.9.5 does not show them.
Can you provide the output of haproxy -vv of both your new and your old deployment? This could also depend on the OpenSSL version.
However I think it’s more likely that in 2.0 we have fixed some logging bugs, so that those handshake failure actually make it to the syslog.
What I would suggest is that you configure better health checks in AWS. You should be able to set them to HTTPS or SSL, so that an actual, full handshake is done.
# haproxy -vv
HA-Proxy version 2.0.0 2019/06/16 - https://haproxy.org/
Build options :
TARGET = linux-glibc
CPU = generic
CC = gcc
CFLAGS = -O2 -g -fno-strict-aliasing -Wdeclaration-after-statement -fwrapv -Wno-unused-label -Wno-sign-compare -Wno-unused-parameter -Wno-old-style-declaration -Wno-ignored-qualifiers -Wno-clobbered -Wno-missing-field-initializers -Wtype-limits
OPTIONS = USE_PCRE=1 USE_THREAD=1 USE_REGPARM=1 USE_LINUX_TPROXY=1 USE_OPENSSL=1 USE_ZLIB=1 USE_TFO=1 USE_NS=
Feature list : +EPOLL -KQUEUE -MY_EPOLL -MY_SPLICE +NETFILTER +PCRE -PCRE_JIT -PCRE2 -PCRE2_JIT +POLL -PRIVATE_CACHE +THREAD -PTHREAD_PSHARED +REGPARM -STATIC_PCRE -STATIC_PCRE2 +TPROXY +LINUX_TPROXY +LINUX_SPLICE +LIBCRYPT +CRYPT_H -VSYSCALL +GETADDRINFO +OPENSSL -LUA +FUTEX +ACCEPT4 -MY_ACCEPT4 +ZLIB -SLZ +CPU_AFFINITY +TFO -NS +DL +RT -DEVICEATLAS -51DEGREES -WURFL -SYSTEMD -OBSOLETE_LINKER +PRCTL +THREAD_DUMP -EVPORTS
Default settings :
bufsize = 16384, maxrewrite = 1024, maxpollevents = 200
Built with multi-threading support (MAX_THREADS=64, default=1).
Built with OpenSSL version : OpenSSL 1.0.1e-fips 11 Feb 2013
Running on OpenSSL version : OpenSSL 1.0.1e-fips 11 Feb 2013
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports : SSLv3 TLSv1.0 TLSv1.1 TLSv1.2
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND
Built with zlib version : 1.2.3
Running on zlib version : 1.2.8
Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with PCRE version : 7.8 2008-09-05
Running on PCRE version : 8.21 2011-12-12
PCRE library supports JIT : no (USE_PCRE_JIT not set)
Encrypted password support via crypt(3): yes
Available polling systems :
epoll : pref=300, test result OK
poll : pref=200, test result OK
select : pref=150, test result OK
Total: 3 (3 usable), will use epoll.
Available multiplexer protocols :
(protocols marked as <default> cannot be specified using 'proto' keyword)
h2 : mode=HTX side=FE|BE mux=H2
h2 : mode=HTTP side=FE mux=H2
<default> : mode=HTX side=FE|BE mux=H1
<default> : mode=TCP|HTTP side=FE|BE mux=PASS
Available services : none
Available filters :
[SPOE] spoe
[COMP] compression
[CACHE] cache
[TRACE] trace
# haproxy -vv
HA-Proxy version 1.9.5 2019/03/19 - https://haproxy.org/
Build options :
TARGET = linux2628
CPU = generic
CC = gcc
CFLAGS = -O2 -g -fno-strict-aliasing -Wdeclaration-after-statement -fwrapv -Wno-unused-label -Wno-sign-compare -Wno-unused-parameter -Wno-old-style-declaration -Wno-ignored-qualifiers -Wno-clobbered -Wno-missing-field-initializers -Wtype-limits -DTCP_USER_TIMEOUT=18
OPTIONS = USE_LINUX_TPROXY=1 USE_ZLIB=1 USE_REGPARM=1 USE_OPENSSL=1 USE_PCRE=1
Default settings :
maxconn = 2000, bufsize = 16384, maxrewrite = 1024, maxpollevents = 200
Built with OpenSSL version : OpenSSL 1.0.1e-fips 11 Feb 2013
Running on OpenSSL version : OpenSSL 1.0.1e-fips 11 Feb 2013
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports : SSLv3 TLSv1.0 TLSv1.1 TLSv1.2
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND
Built with zlib version : 1.2.3
Running on zlib version : 1.2.8
Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with PCRE version : 7.8 2008-09-05
Running on PCRE version : 8.21 2011-12-12
PCRE library supports JIT : no (USE_PCRE_JIT not set)
Encrypted password support via crypt(3): yes
Built with multi-threading support.
Available polling systems :
epoll : pref=300, test result OK
poll : pref=200, test result OK
select : pref=150, test result OK
Total: 3 (3 usable), will use epoll.
Available multiplexer protocols :
(protocols marked as <default> cannot be specified using 'proto' keyword)
h2 : mode=HTX side=FE|BE
h2 : mode=HTTP side=FE
<default> : mode=HTX side=FE|BE
<default> : mode=TCP|HTTP side=FE|BE
Available filters :
[SPOE] spoe
[COMP] compression
[CACHE] cache
[TRACE] trace
I tried to tweak AWS health checks but there’s nothing to fix. The ELB is set to proxy TCP traffic, so we can’t set HTTPS or SSL as a health check.
It seems to me that 1.9.5 behaves more correctly than 2.0.0 in this case, b/c there were no attempts to set up an SSL connection (just port opened and then closed), so no error should be displayed, since there’re no errors actually.
The only difference between 1.9.5 and 2.0.0 behavior is that due to a different error path, in 1.9.5 the error is:
Jun 24 22:45:09 haproxy[20998]: 127.0.0.1:34194 [24/Jun/2019:22:45:09.870] my-https-frontend/1: Connection error during SSL handshake
While in 2.0.0 it’s: Jun 24 22:43:53 haproxy[20417]: 127.0.0.1:34042 [24/Jun/2019:22:43:53.788] my-https-frontend/1: SSL handshake failure
So I can’t confirm that 1.9.5 does not show this error. It’s just a different error message. This is with haproxy 1.9.5 and openssl 1.0.1e just like in your environment.
Can you confirm that you indeed see this error in 1.9.5?
Yes, I set the bind instance to accept-proxy and the health-checking instance sends the proxy-protocol.
But there is a difference in behavior, the health check simulation here pushes the proxy protocol with the final ACK of the 3 way handshake and also closes the TCP session with an RST, while your capture from ELB closes it with a FIN.
I assume that is causing 1.9.5 to behave differently.
Whatever, I disagree that haproxy should hide those errors. We are expecting an SSL sessions here, if there is none, we absolutely not hide that error.
On ELB, the health check configuration should be independent of the actual traffic, even if you just forward TCP, you should still be able to configure a HTTPS health check:
Glad you were able to narrow down the issue! What’s wrong with closing tcp connection with FIN rather than with RST?
Regarding the SSL errors - I understand your point. However, it does not seem to me relevant. As an admin of haproxy instances I’d like to know when something goes wrong (this is what error reporting is for), right? But there’s nothing wrong if somebody opens a tcp connection and then closes it w/o sending any data. I’d like to see real SSL errors (like protocol mismatch). Hope you’ll change your mind and make haproxy behave in such situations like it did before. Thank you!
Regarding HTTPS health checks - yeah, it’s possible. But I’ll need to create one more listener just to serve ELB healthchecks, b/c the current listeners expect PROXY protocol only. Not sure this is a good solution.
Nothing is wrong with that. I’m just saying that’s what probably triggers a different behavior.
It’s a log message with informational severity.
If you don’t want to send that, cap your syslogs at notice or warning level:
log 1.2.3.4 syslog notice
That’s your specific point of view. Others may need to troubleshoot this exact behavior. You don’t know you are looking for when your start troubleshooting.
I strongly disagree to map different SSL error codes to different syslog severity levels. Everybody probably has a different opinions about each of those, requirements may even be different from one project to another.
Those specific requirements need to be addressed on the syslog server side.
I agree with you that there may be different opinions and different projects. But the bottom line is: there were a lot of haproxy releases before 2.0 and nobody (at least i didn’t see) argued that the current error logging should be improved in that particular part to report an SSL error on tcp connection open/close. However, starting from 2.0 you did change that.
And, again, why do you consider such thing as an SSL error? There were no SSL handshake at all, so no errors can be registered. I wouldn’t mind to report that as “empty connection” or something like that. But it’s definitely not an SSL error.
It’s undefined behavior in all releases. It behaves differently in 1.9 and 2.0 because the error handling of ssl changed. But it’s still undefined behavior.
Handling it requires writing additional code. I disagree that this is important. There are a ton of debug and informational messages that we can argue back and forth whether everyone likes the wording or not.
Like I said you can limit syslogging to certain levels if you don’t want to see those informational messages.
If you don’t like it, I suggest you open a thread on the mailing list for further discussion with others.
I tried to reproduce the behavior your are seeing in 1.9.5, which a health check identical to the one that ELB produces.
I was able to make a simulated health check which is indeed identical to what ELB does according to the traffic capture you provided, by simply using nc:
Haproxy 1.9.5 with OpenSSL 1.0.1e behaves differently then, with yet another message:
[...] my-https-frontend/1: Connection closed during SSL handshake
It’s not a Connection error or a handshake failure, as my previous repro suggested.
But it’s also not what your are seeing, because you are saying there is no log at all, which I absolutely cannot explain at this point (because the health check is identical now).
Are you positive your syslog doesn’t filter and shows you only messages with “error” or “failure” in the text? Because you should definitely see Connection closed during SSL handshake syslogs for those health checks in Haproxy 1.9.5.
Jun 27 07:40:25 127.0.0.1:56274 [27/Jun/2019:07:40:25.637] lb-useast/lb-useast_frontend: Received something which does not look like a PROXY protocol header
and no records for the 2nd command.
One probably interesting thing is that this command did not produce any records in the log as well:
I assume you are checking both access.log and status.log?
I don’t have a clue then why you don’t see this message, I’d suggest one last attempt at understanding it, by capturing the syslog traffic haproxy sends:
tcpdump -i lo -pnvvvvs0 host 127.0.0.1 and udp port 514
I’d also like you to try haproxy 2.0 (latest stable) based on a newer OpenSSL release, at least 1.1.0 or newer. Because the change I was referring to only impacts the error handling in OpenSSL releases that are older than 1.1.0 and I’d like to know what you are seeing in that case.