HAProxy 2.0 SSL handshake failure

Hi,

After deploying the new HAProxy version (the previous was 1.9.5), the access.log is flooding with messages like:

Jun 21 11:08:04 172.30.1.153:4594 [21/Jun/2019:11:08:04.294] lb-useast/lb-useast_frontend: SSL handshake failure
Jun 21 11:08:04 172.30.1.30:38852 [21/Jun/2019:11:08:04.410] lb-useast/lb-useast_frontend: SSL handshake failure
Jun 21 11:08:04 172.30.2.55:45926 [21/Jun/2019:11:08:04.670] lb-useast/lb-useast_frontend: SSL handshake failure
Jun 21 11:08:14 172.30.1.153:4602 [21/Jun/2019:11:08:14.295] lb-useast/lb-useast_frontend: SSL handshake failure
Jun 21 11:08:14 172.30.1.30:38950 [21/Jun/2019:11:08:14.409] lb-useast/lb-useast_frontend: SSL handshake failure
Jun 21 11:08:14 172.30.2.55:45982 [21/Jun/2019:11:08:14.670] lb-useast/lb-useast_frontend: SSL handshake failure

The HAProxy instances is located behind AWS Elastic Load Balancer (in classic mode). After a little investigation, I’ve come up that those errors are caused by AWS ELB TCP health checks.
tcpdump pcap is here https://www.dropbox.com/s/bwnadkmbkn6fgx6/elbhc.pcap?dl=0

frontend lb-useast
  mode http
  bind *:3080 accept-proxy name lb-useast_frontend_http
  bind *:3443 accept-proxy name lb-useast_frontend ssl ...

Is there a way to hide such errors? Notice that version 1.9.5 does not show them.

Thanks in advance!

Can you provide the output of haproxy -vv of both your new and your old deployment? This could also depend on the OpenSSL version.

However I think it’s more likely that in 2.0 we have fixed some logging bugs, so that those handshake failure actually make it to the syslog.

What I would suggest is that you configure better health checks in AWS. You should be able to set them to HTTPS or SSL, so that an actual, full handshake is done.

Hi Lukas and thanks for your prompt reply!

# haproxy -vv
HA-Proxy version 2.0.0 2019/06/16 - https://haproxy.org/
Build options :
  TARGET  = linux-glibc
  CPU     = generic
  CC      = gcc
  CFLAGS  = -O2 -g -fno-strict-aliasing -Wdeclaration-after-statement -fwrapv -Wno-unused-label -Wno-sign-compare -Wno-unused-parameter -Wno-old-style-declaration -Wno-ignored-qualifiers -Wno-clobbered -Wno-missing-field-initializers -Wtype-limits
  OPTIONS = USE_PCRE=1 USE_THREAD=1 USE_REGPARM=1 USE_LINUX_TPROXY=1 USE_OPENSSL=1 USE_ZLIB=1 USE_TFO=1 USE_NS=

Feature list : +EPOLL -KQUEUE -MY_EPOLL -MY_SPLICE +NETFILTER +PCRE -PCRE_JIT -PCRE2 -PCRE2_JIT +POLL -PRIVATE_CACHE +THREAD -PTHREAD_PSHARED +REGPARM -STATIC_PCRE -STATIC_PCRE2 +TPROXY +LINUX_TPROXY +LINUX_SPLICE +LIBCRYPT +CRYPT_H -VSYSCALL +GETADDRINFO +OPENSSL -LUA +FUTEX +ACCEPT4 -MY_ACCEPT4 +ZLIB -SLZ +CPU_AFFINITY +TFO -NS +DL +RT -DEVICEATLAS -51DEGREES -WURFL -SYSTEMD -OBSOLETE_LINKER +PRCTL +THREAD_DUMP -EVPORTS

Default settings :
  bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Built with multi-threading support (MAX_THREADS=64, default=1).
Built with OpenSSL version : OpenSSL 1.0.1e-fips 11 Feb 2013
Running on OpenSSL version : OpenSSL 1.0.1e-fips 11 Feb 2013
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports : SSLv3 TLSv1.0 TLSv1.1 TLSv1.2
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND
Built with zlib version : 1.2.3
Running on zlib version : 1.2.8
Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with PCRE version : 7.8 2008-09-05
Running on PCRE version : 8.21 2011-12-12
PCRE library supports JIT : no (USE_PCRE_JIT not set)
Encrypted password support via crypt(3): yes

Available polling systems :
      epoll : pref=300,  test result OK
       poll : pref=200,  test result OK
     select : pref=150,  test result OK
Total: 3 (3 usable), will use epoll.

Available multiplexer protocols :
(protocols marked as <default> cannot be specified using 'proto' keyword)
              h2 : mode=HTX        side=FE|BE     mux=H2
              h2 : mode=HTTP       side=FE        mux=H2
       <default> : mode=HTX        side=FE|BE     mux=H1
       <default> : mode=TCP|HTTP   side=FE|BE     mux=PASS

Available services : none

Available filters :
        [SPOE] spoe
        [COMP] compression
        [CACHE] cache
        [TRACE] trace
# haproxy -vv
HA-Proxy version 1.9.5 2019/03/19 - https://haproxy.org/
Build options :
  TARGET  = linux2628
  CPU     = generic
  CC      = gcc
  CFLAGS  = -O2 -g -fno-strict-aliasing -Wdeclaration-after-statement -fwrapv -Wno-unused-label -Wno-sign-compare -Wno-unused-parameter -Wno-old-style-declaration -Wno-ignored-qualifiers -Wno-clobbered -Wno-missing-field-initializers -Wtype-limits -DTCP_USER_TIMEOUT=18
  OPTIONS = USE_LINUX_TPROXY=1 USE_ZLIB=1 USE_REGPARM=1 USE_OPENSSL=1 USE_PCRE=1

Default settings :
  maxconn = 2000, bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Built with OpenSSL version : OpenSSL 1.0.1e-fips 11 Feb 2013
Running on OpenSSL version : OpenSSL 1.0.1e-fips 11 Feb 2013
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports : SSLv3 TLSv1.0 TLSv1.1 TLSv1.2
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND
Built with zlib version : 1.2.3
Running on zlib version : 1.2.8
Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with PCRE version : 7.8 2008-09-05
Running on PCRE version : 8.21 2011-12-12
PCRE library supports JIT : no (USE_PCRE_JIT not set)
Encrypted password support via crypt(3): yes
Built with multi-threading support.

Available polling systems :
      epoll : pref=300,  test result OK
       poll : pref=200,  test result OK
     select : pref=150,  test result OK
Total: 3 (3 usable), will use epoll.

Available multiplexer protocols :
(protocols marked as <default> cannot be specified using 'proto' keyword)
              h2 : mode=HTX        side=FE|BE
              h2 : mode=HTTP       side=FE
       <default> : mode=HTX        side=FE|BE
       <default> : mode=TCP|HTTP   side=FE|BE

Available filters :
        [SPOE] spoe
        [COMP] compression
        [CACHE] cache
        [TRACE] trace

I tried to tweak AWS health checks but there’s nothing to fix. The ELB is set to proxy TCP traffic, so we can’t set HTTPS or SSL as a health check.

It seems to me that 1.9.5 behaves more correctly than 2.0.0 in this case, b/c there were no attempts to set up an SSL connection (just port opened and then closed), so no error should be displayed, since there’re no errors actually.

The only difference between 1.9.5 and 2.0.0 behavior is that due to a different error path, in 1.9.5 the error is:

Jun 24 22:45:09 haproxy[20998]: 127.0.0.1:34194 [24/Jun/2019:22:45:09.870] my-https-frontend/1: Connection error during SSL handshake

While in 2.0.0 it’s:
Jun 24 22:43:53 haproxy[20417]: 127.0.0.1:34042 [24/Jun/2019:22:43:53.788] my-https-frontend/1: SSL handshake failure

So I can’t confirm that 1.9.5 does not show this error. It’s just a different error message. This is with haproxy 1.9.5 and openssl 1.0.1e just like in your environment.

Can you confirm that you indeed see this error in 1.9.5?

As I told before - there’re no such errors in 1.9.5. Are you sure you tested with accept-proxy config option in place?

Yes, I set the bind instance to accept-proxy and the health-checking instance sends the proxy-protocol.

But there is a difference in behavior, the health check simulation here pushes the proxy protocol with the final ACK of the 3 way handshake and also closes the TCP session with an RST, while your capture from ELB closes it with a FIN.

I assume that is causing 1.9.5 to behave differently.

Whatever, I disagree that haproxy should hide those errors. We are expecting an SSL sessions here, if there is none, we absolutely not hide that error.

On ELB, the health check configuration should be independent of the actual traffic, even if you just forward TCP, you should still be able to configure a HTTPS health check:

https://docs.aws.amazon.com/elasticloadbalancing/latest/classic/elb-healthchecks.html

Glad you were able to narrow down the issue! What’s wrong with closing tcp connection with FIN rather than with RST?

Regarding the SSL errors - I understand your point. However, it does not seem to me relevant. As an admin of haproxy instances I’d like to know when something goes wrong (this is what error reporting is for), right? But there’s nothing wrong if somebody opens a tcp connection and then closes it w/o sending any data. I’d like to see real SSL errors (like protocol mismatch). Hope you’ll change your mind and make haproxy behave in such situations like it did before. Thank you!

Regarding HTTPS health checks - yeah, it’s possible. But I’ll need to create one more listener just to serve ELB healthchecks, b/c the current listeners expect PROXY protocol only. Not sure this is a good solution.

Nothing is wrong with that. I’m just saying that’s what probably triggers a different behavior.

It’s a log message with informational severity.

If you don’t want to send that, cap your syslogs at notice or warning level:

log 1.2.3.4 syslog notice

That’s your specific point of view. Others may need to troubleshoot this exact behavior. You don’t know you are looking for when your start troubleshooting.

I strongly disagree to map different SSL error codes to different syslog severity levels. Everybody probably has a different opinions about each of those, requirements may even be different from one project to another.

Those specific requirements need to be addressed on the syslog server side.

I agree with you that there may be different opinions and different projects. But the bottom line is: there were a lot of haproxy releases before 2.0 and nobody (at least i didn’t see) argued that the current error logging should be improved in that particular part to report an SSL error on tcp connection open/close. However, starting from 2.0 you did change that.
And, again, why do you consider such thing as an SSL error? There were no SSL handshake at all, so no errors can be registered. I wouldn’t mind to report that as “empty connection” or something like that. But it’s definitely not an SSL error.

It’s undefined behavior in all releases. It behaves differently in 1.9 and 2.0 because the error handling of ssl changed. But it’s still undefined behavior.

Handling it requires writing additional code. I disagree that this is important. There are a ton of debug and informational messages that we can argue back and forth whether everyone likes the wording or not.

Like I said you can limit syslogging to certain levels if you don’t want to see those informational messages.

If you don’t like it, I suggest you open a thread on the mailing list for further discussion with others.

I will definitely refine my logging policy as you suggested.

Anyway, I do appreciate your prompt feedback on this issue - thank you!

Hope, one day you will change your mind on wording non-errors as errors. Cheers!

For later reference, as my local notes are probably gonna get lost sooner or later:

The behavior change appears to come from commit 54832b97c65c (“BUILD: enable several LibreSSL hacks, including”).

1 Like

I tried to reproduce the behavior your are seeing in 1.9.5, which a health check identical to the one that ELB produces.

I was able to make a simulated health check which is indeed identical to what ELB does according to the traffic capture you provided, by simply using nc:

echo -ne "PROXY TCP4 172.30.1.153 172.30.1.153 3052 3443\r\n" | nc 127.0.0.1 443

Haproxy 1.9.5 with OpenSSL 1.0.1e behaves differently then, with yet another message:

[...] my-https-frontend/1: Connection closed during SSL handshake

It’s not a Connection error or a handshake failure, as my previous repro suggested.

But it’s also not what your are seeing, because you are saying there is no log at all, which I absolutely cannot explain at this point (because the health check is identical now).

Are you positive your syslog doesn’t filter and shows you only messages with “error” or “failure” in the text? Because you should definitely see Connection closed during SSL handshake syslogs for those health checks in Haproxy 1.9.5.

Thanks for continuing the investigation, Lukas!

I do confirm there’re no any SSL handshake notifications like that in the logs. Here are my settings:
HAProxy:

global
    log 127.0.0.1:514 len 2048 local0
...
defaults
  log global
  option dontlognull
  option splice-auto
  option httplog
  option http-server-close
  option redispatch
  option contstats
  ...

rsyslog:

$ModLoad imudp
$UDPServerAddress 127.0.0.1
$UDPServerRun 514
$MaxMessageSize 64k
$FileCreateMode 0600

$template HAProxy,"%timegenerated%%msg%\n"
$template TraditionalFormatWithPRI,"%pri-text%: %timegenerated% %msg:::drop-last-lf%\n"

local0.=info @monitor
local0.=info -/var/log/haproxy/access.log;HAProxy
local0.notice -/var/log/haproxy/status.log;HAProxy
local0.error -/var/log/haproxy/error.log;TraditionalFormatWithPRI
### keep logs in localhost ##
local0.* ~

Just to confirm, I’ve issued two commands:

echo -ne "GET / HTTP/1.0\r\n" | nc 127.0.0.1 3443
echo -ne "PROXY TCP4 172.30.1.15 172.30.1.153 3052 3443\r\n" | nc 127.0.0.1 3443

The log record for the 1st command was:

Jun 27 07:40:25 127.0.0.1:56274 [27/Jun/2019:07:40:25.637] lb-useast/lb-useast_frontend: Received something which does not look like a PROXY protocol header

and no records for the 2nd command.

One probably interesting thing is that this command did not produce any records in the log as well:

echo -ne "PROXY TCP4 172.30.1.15 172.30.1.153 3052 3443\r\nGET / HTTP/1.0\r\n" | nc 127.0.0.1 3443

I assume you are checking both access.log and status.log?

I don’t have a clue then why you don’t see this message, I’d suggest one last attempt at understanding it, by capturing the syslog traffic haproxy sends:

tcpdump -i lo -pnvvvvs0 host 127.0.0.1 and udp port 514

I’d also like you to try haproxy 2.0 (latest stable) based on a newer OpenSSL release, at least 1.1.0 or newer. Because the change I was referring to only impacts the error handling in OpenSSL releases that are older than 1.1.0 and I’d like to know what you are seeing in that case.

Yeah, checked all logs (access, error and status).

Just tried to sniff the syslog traffic - nothing is sent to syslog when the echo with PROXY sent out.

I’ll give 2.0 another try with newer openssl, but that will take some time. Will let you know.

The change impacting openssl release 1.0.2 and older will be rolled back for 2.0.2.

I’m still not sure how this will affect you; as I cannot replicate what you are seeing (no logs).

HAProxy version 2.0.3:

Jul 30 11:10:47 172.30.1.30:42360 [30/Jul/2019:11:10:47.011] lb-useast/lb-useast_frontend: SSL handshake failure after heartbeat
Jul 30 11:11:00 172.30.2.55:21366 [30/Jul/2019:11:11:00.705] lb-useast/lb-useast_frontend: SSL handshake failure after heartbeat

haproxy -vv output please.

# haproxy -vv
HA-Proxy version 2.0.3 2019/07/23 - https://haproxy.org/
Build options :
  TARGET  = linux-glibc
  CPU     = generic
  CC      = gcc
  CFLAGS  = -O2 -g -fno-strict-aliasing -Wdeclaration-after-statement -fwrapv -Wno-unused-label -Wno-sign-compare -Wno-unused-parameter -Wno-old-style-declaration -Wno-ignored-qualifiers -Wno-clobbered -Wno-missing-field-initializers -Wtype-limits
  OPTIONS = USE_PCRE=1 USE_THREAD=1 USE_REGPARM=1 USE_LINUX_TPROXY=1 USE_OPENSSL=1 USE_ZLIB=1 USE_TFO=1 USE_NS=

Feature list : +EPOLL -KQUEUE -MY_EPOLL -MY_SPLICE +NETFILTER +PCRE -PCRE_JIT -PCRE2 -PCRE2_JIT +POLL -PRIVATE_CACHE +THREAD -PTHREAD_PSHARED +REGPARM -STATIC_PCRE -STATIC_PCRE2 +TPROXY +LINUX_TPROXY +LINUX_SPLICE +LIBCRYPT +CRYPT_H -VSYSCALL +GETADDRINFO +OPENSSL -LUA +FUTEX +ACCEPT4 -MY_ACCEPT4 +ZLIB -SLZ +CPU_AFFINITY +TFO -NS +DL +RT -DEVICEATLAS -51DEGREES -WURFL -SYSTEMD -OBSOLETE_LINKER +PRCTL +THREAD_DUMP -EVPORTS

Default settings :
  bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Built with multi-threading support (MAX_THREADS=64, default=1).
Built with OpenSSL version : OpenSSL 1.0.1e-fips 11 Feb 2013
Running on OpenSSL version : OpenSSL 1.0.1e-fips 11 Feb 2013
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports : SSLv3 TLSv1.0 TLSv1.1 TLSv1.2
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND
Built with zlib version : 1.2.3
Running on zlib version : 1.2.8
Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with PCRE version : 7.8 2008-09-05
Running on PCRE version : 8.21 2011-12-12
PCRE library supports JIT : no (USE_PCRE_JIT not set)
Encrypted password support via crypt(3): yes

Available polling systems :
      epoll : pref=300,  test result OK
       poll : pref=200,  test result OK
     select : pref=150,  test result OK
Total: 3 (3 usable), will use epoll.

Available multiplexer protocols :
(protocols marked as <default> cannot be specified using 'proto' keyword)
              h2 : mode=HTX        side=FE|BE     mux=H2
              h2 : mode=HTTP       side=FE        mux=H2
       <default> : mode=HTX        side=FE|BE     mux=H1
       <default> : mode=TCP|HTTP   side=FE|BE     mux=PASS

Available services : none

Available filters :
        [SPOE] spoe
        [COMP] compression
        [CACHE] cache
        [TRACE] trace

Wondering why it shows “running on openssl version” 1.0.1e, when the real openssl version is:

# rpm -q openssl
openssl-1.0.2k-16.150.amzn1.x86_64