Haproxy sometimes sends SSL Handshake Errors

I see SSL handshake failures in my HAProxy logs. Sometimes they occur when the web application is saving data or receiving a large POST, and the web application throws an error at the same time.

Environment notes: we are running RHEL 7, with HAProxy in front of two active web backend servers and one backup web backend server. The web application runs on CherryPy.

Here is an example error in the cherrypy logs:

server02 [4.xxx] [INF] (web/services/web/site-packages/cherrypy/_cplogging.py) 10.111.233.123 - - "GET /~api/entity/tMatch%22%3Atrue%7D%5D HTTP/1.1" 200 41 "https://www.example.com/" "Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Firefox/78.0"
  File "/web/services/web/site-packages/OpenSSL/SSL.py", line 1545, in _raise_ssl_error
    raise ZeroReturnError()
  File "/web/services/web/site-packages/OpenSSL/SSL.py", line 1734, in recv_into
    self._raise_ssl_error(self._ssl, result)
  File "/web/services/web/site-packages/OpenSSL/SSL.py", line 1545, in _raise_ssl_error
    raise ZeroReturnError()
OpenSSL.SSL.ZeroReturnError

And here is the corresponding entry in the HAProxy log:

Apr 26 13:27:38 localhost haproxy[2487]: 192.200.5.43:50600 [26/Apr/2021:13:27:38.136] www.example.com:80/2: SSL handshake failure
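Because this entry names the client address and the frontend rather than a backend server, it appears to describe the client-facing handshake. To see how often it happens and for which clients, the failure lines can be counted straight from the log. A minimal Python sketch, assuming the line format shown above. The regex, the function name and the reading of the token after the frontend name (`2`) as the bind (listener) identifier are my own assumptions.

```python
import re
import sys
from collections import Counter

# Matches lines like the one above; the token after "/" is assumed to be the
# listener (bind) identifier HAProxy prints for unnamed bind lines.
HANDSHAKE_RE = re.compile(
    r"haproxy\[\d+\]: "
    r"(?P<ip>\S+):(?P<port>\d+) "
    r"\[(?P<date>[^\]]+)\] "
    r"(?P<frontend>\S+)/(?P<bind>[^\s/:]+): SSL handshake failure"
)


def count_handshake_failures(lines):
    """Count SSL handshake failures per (client IP, frontend, bind)."""
    counts = Counter()
    for line in lines:
        m = HANDSHAKE_RE.search(line)
        if m:
            counts[(m.group("ip"), m.group("frontend"), m.group("bind"))] += 1
    return counts


if __name__ == "__main__":
    # Usage: python count_failures.py < /var/log/haproxy.log
    for (ip, frontend, bind), n in count_handshake_failures(sys.stdin).most_common():
        print(n, ip, frontend, bind)
```

If most failures come from a handful of client addresses, that points at the client-to-HAProxy leg rather than the re-encrypted backend leg.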

Here is the Haproxy build info:
HA-Proxy version 2.0.14 2020/04/02 - https://haproxy.org/
Build options :
TARGET = linux-glibc
CPU = generic
CC = gcc
CFLAGS = -O2 -g -fno-strict-aliasing -Wdeclaration-after-statement -fwrapv -Wno-unused-label -Wno-sign-compare -Wno-unused-parameter -Wno-old-style-declaration -Wno-ignored-qualifiers -Wno-clobbered -Wno-missing-field-initializers -Wtype-limits
OPTIONS = USE_PCRE=1 USE_THREAD=1 USE_LIBCRYPT=1 USE_OPENSSL=1 USE_LUA=1 USE_ZLIB=1 USE_TFO=1 USE_NS=1 USE_SYSTEMD=1

Feature list : +EPOLL -KQUEUE -MY_EPOLL -MY_SPLICE +NETFILTER +PCRE -PCRE_JIT -PCRE2 -PCRE2_JIT +POLL -PRIVATE_CACHE +THREAD -PTHREAD_PSHARED -REGPARM -STATIC_PCRE -STATIC_PCRE2 +TPROXY +LINUX_TPROXY +LINUX_SPLICE +LIBCRYPT +CRYPT_H -VSYSCALL +GETADDRINFO +OPENSSL +LUA +FUTEX +ACCEPT4 -MY_ACCEPT4 +ZLIB -SLZ +CPU_AFFINITY +TFO +NS +DL +RT -DEVICEATLAS -51DEGREES -WURFL +SYSTEMD -OBSOLETE_LINKER +PRCTL +THREAD_DUMP -EVPORTS

Default settings :
  bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Built with multi-threading support (MAX_THREADS=64, default=1).
Built with OpenSSL version : OpenSSL 1.0.2k-fips  26 Jan 2017
Running on OpenSSL version : OpenSSL 1.0.2k-fips  26 Jan 2017
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports : SSLv3 TLSv1.0 TLSv1.1 TLSv1.2
Built with Lua version : Lua 5.3.5
Built with network namespace support.
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND
Built with zlib version : 1.2.7
Running on zlib version : 1.2.7
Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with PCRE version : 8.32 2012-11-30
Running on PCRE version : 8.32 2012-11-30
PCRE library supports JIT : no (USE_PCRE_JIT not set)
Encrypted password support via crypt(3): yes

Available polling systems :
      epoll : pref=300,  test result OK
       poll : pref=200,  test result OK
     select : pref=150,  test result OK
Total: 3 (3 usable), will use epoll.

Available multiplexer protocols :
(protocols marked as <default> cannot be specified using 'proto' keyword)
              h2 : mode=HTX        side=FE|BE     mux=H2
              h2 : mode=HTTP       side=FE        mux=H2
       <default> : mode=HTX        side=FE|BE     mux=H1
       <default> : mode=TCP|HTTP   side=FE|BE     mux=PASS

Available services : none

Available filters :
	[SPOE] spoe
	[COMP] compression
	[CACHE] cache
	[TRACE] trace

Here is my Haproxy Configuration file:

#---------------------------------------------------------------------
# Global settings
#---------------------------------------------------------------------
global
    # to have these messages end up in /var/log/haproxy.log you will
    # need to:
    #
    # 1) configure syslog to accept network log events.  This is done
    #    by adding the '-r' option to the SYSLOGD_OPTIONS in
    #    /etc/sysconfig/syslog
    #
    # 2) configure local2 events to go to the /var/log/haproxy.log
    #   file. A line like the following can be added to
    #   /etc/sysconfig/syslog
    #
    #    local2.*                       /var/log/haproxy.log
    #
    log         127.0.0.1 local2 info
    chroot      /var/lib/haproxy
    pidfile     /var/run/haproxy.pid
    maxconn     4000
    user        haproxy
    group       haproxy
    daemon
    tune.ssl.default-dh-param 2048
    ssl-server-verify required
    ssl-default-bind-options  no-sslv3 no-tlsv10 no-tlsv11 no-tls-tickets
    ssl-default-bind-ciphers  ECDH+AESGCM:DH+AESGCM:ECDH+AES256:DH+AES256:ECDH+AES128:DH+AES:RSA+AESGCM:RSA+AES:!aNULL:!MD5:!DSS
    # turn on stats unix socket
    stats socket /var/run/haproxy/info.sock mode 660 level user user report group zabbix
    stats timeout 2m # Wait up to 2 minutes for input

#---------------------------------------------------------------------
# common defaults that all the 'listen' and 'backend' sections will
# use if not designated in their block
#---------------------------------------------------------------------
defaults
    mode                    http
    log                     global
    option                  httplog
    option                  dontlognull
    option http-server-close
    option                  redispatch
    retries                 3
    timeout http-request    10s
    timeout queue           1m
    timeout connect         10s
    timeout client          5m
    timeout server          5m
    timeout http-keep-alive 10s
    timeout check           10s
    maxconn                 300
#---------------------------------------------------------------------
# main frontend which proxys to the backends
#---------------------------------------------------------------------
frontend  example.com:80
    bind *:80
    bind *:443 ssl crt /etc/ssl/certs/web.pem
    log  global
    option httplog
    http-request set-header Forwarded for=%[src]
    mode http
    option forwardfor
    default_backend             web_servers
#---------------------------------------------------------------------
# round robin balancing between the various backends
#---------------------------------------------------------------------
backend web_servers
    balance roundrobin
    mode http
    option httpchk GET /#
    cookie SERVERUSED insert indirect nocache
    redirect scheme https code 301 if !{ ssl_fc }
    redirect prefix https://www.example.com code 301 if { hdr(host) -i old.example.com }
    #List all the web servers below
    timeout queue 10s
    server  backendweb1 10.111.233.122:8443 check ssl verify required ca-file /etc/ssl/certs/web.pem cookie web1
    server  backendweb2 10.111.233.123:8443 check ssl verify required ca-file /etc/ssl/certs/web.pem cookie web2
    server  backendweb3 10.170.218.123:8443 check ssl verify required ca-file /etc/ssl/certs/web.pem cookie web3 backup

listen stats # Define a listen section called "stats"
 bind :::8404 v4v6 ssl crt /etc/ssl/certs/web.pem # Listen on localhost 8404
 mode http
 stats enable  # Enable stats page
 stats realm Haproxy\ Statistics  # Title text for popup window
 stats uri /stats  # Stats URI
 stats refresh 10s
 stats auth xxxxx:xxxxxxxxxxx  #Authentication credential
 stats admin if LOCALHOST

A quick note: this only happens when traffic passes through HAProxy, and I can reproduce it with a large POST. If I point the client directly at the application and send the same large POST, it succeeds.
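To make that comparison repeatable, the same large POST can be sent once through HAProxy and once straight to a backend, timing each. A minimal sketch using only the Python standard library. The endpoint path `/~api/upload`, the 20 MiB body size and the use of `web.pem` as the trusted CA file are assumptions for illustration, not the application's real API.

```python
import http.client
import ssl
import time
from urllib.parse import urlsplit


def split_target(url):
    """Return (host, port, request-target) for an https:// URL."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return parts.hostname, parts.port or 443, target


def timed_post(url, body, context, timeout=300.0):
    """POST `body` to `url`; return (HTTP status, elapsed seconds)."""
    host, port, target = split_target(url)
    conn = http.client.HTTPSConnection(host, port, timeout=timeout, context=context)
    start = time.monotonic()
    try:
        conn.request("POST", target, body=body,
                     headers={"Content-Type": "application/octet-stream"})
        resp = conn.getresponse()
        resp.read()  # drain the body so the whole exchange is timed
        return resp.status, time.monotonic() - start
    finally:
        conn.close()


if __name__ == "__main__":
    # Hypothetical endpoints: replace with a real upload URL of the application.
    VIA_HAPROXY = "https://www.example.com/~api/upload"
    DIRECT = "https://10.111.233.122:8443/~api/upload"
    ctx = ssl.create_default_context(cafile="/etc/ssl/certs/web.pem")
    ctx.check_hostname = False  # the direct IP will not match the certificate name
    body = b"x" * (20 * 1024 * 1024)  # 20 MiB, an assumed "large" body
    for url in (VIA_HAPROXY, DIRECT):
        try:
            print(url, timed_post(url, body, ctx))
        except (OSError, http.client.HTTPException) as exc:  # ssl errors are OSErrors
            print(url, "failed:", repr(exc))
```

The 300-second timeout mirrors `timeout server 5m` from the config. If the HAProxy path fails or takes much longer than the direct one for the same body, that points toward the proxy-to-backend leg.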

More information: it appears to be related to SSL on the backend connections.

If I remove the SSL re-encryption (terminating SSL at HAProxy only), it works. It also works in TCP mode, but fails with the configuration above. We would like to keep re-encrypting the traffic between HAProxy and the web backends.

Can you post the configuration of your web server? Which server are you using? Does it work if you disable SSL on the backend? If you can reproduce it, how long does the large request take to finish? ZeroReturnError means the peer closed the TLS connection, so it might also indicate a timeout …

  • We are using the CherryPy web server (CherryPy 18.6.0).

  • It does work if I disable SSL on the backend, and the request finishes quickly. I’ve been trying to troubleshoot SSL between the server and HAProxy, but so far I don’t know how to approach it.

  • It may be a timeout, but I don’t know why it would only happen when traffic is re-encrypted to the backends.
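Since a timeout is one suspect, these are the CherryPy server settings that bound how long reading a request may stall and how large a body may be. A sketch of where they live in CherryPy's global config, for comparison with the real one. The values, the `0.0.0.0` host and the certificate paths are assumptions, and the `pyopenssl` module choice is inferred only from the `OpenSSL/SSL.py` frames in the traceback above.

```python
def backend_server_config(timeout_s=60, max_body_mb=100):
    """CherryPy global settings for a TLS backend behind HAProxy.

    All values and paths here are illustrative assumptions.
    """
    return {
        "server.socket_host": "0.0.0.0",
        "server.socket_port": 8443,  # the port the HAProxy server lines use
        # The traceback shows pyOpenSSL, so the pyopenssl adapter is assumed.
        "server.ssl_module": "pyopenssl",
        "server.ssl_certificate": "/path/to/backend-cert.pem",  # hypothetical
        "server.ssl_private_key": "/path/to/backend-key.pem",   # hypothetical
        # Socket timeout in seconds (CherryPy's default is 10); a read that
        # blocks longer than this ends the connection.
        "server.socket_timeout": timeout_s,
        # Upper bound on request body size (CherryPy's default is 100 MB).
        "server.max_request_body_size": max_body_mb * 1024 * 1024,
    }


if __name__ == "__main__":
    import cherrypy  # CherryPy 18.6.0 in this thread

    cherrypy.config.update(backend_server_config())
    # ...then mount and start the application as usual.
```

The point is less the exact numbers than checking whether the real `server.socket_timeout` is still at its 10-second default while uploads through HAProxy take longer than that.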

I guess I have to apologize: I was a bit sloppy in reviewing the information provided. After some thought, I’m afraid the problem is not necessarily caused by HAProxy. From my perspective, HAProxy might just be the bearer of bad news here, especially since the request works through HAProxy when SSL is disabled on the backend.

You say you can reproduce the error; that will be the key to solving the riddle:
Does the error also occur when you send the request directly to the backend, without HAProxy? (It is important that the direct request also uses SSL.)

Otherwise, in my opinion, the next step would be to run tcpdump on your backend and compare both handshakes: one made by HAProxy and one made directly against the app.
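A capture on the backend that keeps whole packets, rather than a text summary, is what makes the two handshakes comparable. A small helper that builds and runs the `tcpdump` command. Port 8443 comes from the `server` lines above; the interface, output file name and optional peer filter are assumptions. It needs root on the backend.

```python
import shlex
import subprocess
from typing import List, Optional


def tcpdump_cmd(outfile: str, port: int = 8443, iface: str = "any",
                peer: Optional[str] = None) -> List[str]:
    """Build a tcpdump argv that writes full, raw packets to `outfile`."""
    bpf = "tcp port {}".format(port)
    if peer:
        bpf += " and host {}".format(peer)  # e.g. limit to the HAProxy host
    # -s 0: capture whole packets; -w: write raw pcap instead of text output
    return ["tcpdump", "-i", iface, "-s", "0", "-w", outfile, bpf]


if __name__ == "__main__":
    cmd = tcpdump_cmd("backend-8443.pcap")  # hypothetical file name
    print(" ".join(shlex.quote(a) for a in cmd))
    # Stop with Ctrl+C once the large POST has been reproduced.
    try:
        subprocess.run(cmd, check=True)
    except KeyboardInterrupt:
        pass
```

Running it once while reproducing the failure through HAProxy and once with a direct request gives the two handshakes to compare.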

Thanks, IhrName.

No, when I target the application directly and bypass HAProxy, I do not get the error. It is only when we pass through the proxy…

To clarify my comments above: we decrypt at the frontend and then re-encrypt to the backends. We are not terminating SSL only at HAProxy; the data is re-encrypted between the proxy and the backends as well.
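To exercise only the re-encrypted leg, one can open a TLS connection to each backend roughly the way the `server` lines above do. As I read the HAProxy documentation for `verify required`, when neither `sni` nor `verifyhost` is set the chain is checked against the CA file but the certificate's names are ignored, and no SNI is sent. A minimal sketch under that assumption; the function names are hypothetical.

```python
import socket
import ssl


def haproxy_like_context(cafile=None):
    """Client context approximating `ssl verify required ca-file <file>`."""
    ctx = ssl.create_default_context(cafile=cafile)
    ctx.check_hostname = False           # no `verifyhost`/`sni` on the server lines
    ctx.verify_mode = ssl.CERT_REQUIRED  # the chain must still validate
    return ctx


def probe_backend(host, port, ctx, timeout=10.0):
    """Handshake with a backend; return (TLS version, cipher name)."""
    with socket.create_connection((host, port), timeout=timeout) as raw:
        # No server_hostname, so no SNI is sent (like a server line without `sni`).
        with ctx.wrap_socket(raw) as tls:
            return tls.version(), tls.cipher()[0]


if __name__ == "__main__":
    ctx = haproxy_like_context("/etc/ssl/certs/web.pem")
    for host in ("10.111.233.122", "10.111.233.123", "10.170.218.123"):
        try:
            print(host, probe_backend(host, 8443, ctx))
        except OSError as exc:  # ssl.SSLError is a subclass of OSError
            print(host, "handshake failed:", repr(exc))
```

This only checks that the handshake itself succeeds and which protocol and cipher get negotiated; the failure reported here shows up with large POSTs, so a clean probe does not rule the backend leg out.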

Then I think your next step should be to run tcpdump on your backend and compare both handshakes: one made by HAProxy and one made directly against the app. I know this is a pain, sorry.

No problem. I’ll get that test done shortly.

Direct connection:

18:00:21.449736 IP (tos 0x0, ttl 62, id 0, offset 0, flags [DF], proto TCP (6), length 64)
10.122.188.33.63974 > 10.172.185.30.pcsync-https: Flags [SEW], cksum 0x4bf7 (correct), seq 4231487197, win 65535, options [mss 1287,nop,wscale 6,nop,nop,TS val 1049397620 ecr 0,sackOK,eol], length 0
18:00:21.449831 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
10.172.185.30.pcsync-https > 10.122.188.33.63974: Flags [S.E], cksum 0xad28 (incorrect -> 0x564d), seq 3487592686, ack 4231487198, win 28960, options [mss 1460,sackOK,TS val 4080558407 ecr 1049397620,nop,wscale 7], length 0
18:00:21.485855 IP (tos 0x0, ttl 62, id 0, offset 0, flags [DF], proto TCP (6), length 64)
10.122.188.33.63976 > 10.172.185.30.pcsync-https: Flags [SEW], cksum 0x578b (correct), seq 3522985101, win 65535, options [mss 1287,nop,wscale 6,nop,nop,TS val 1049397865 ecr 0,sackOK,eol], length 0
18:00:21.485955 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
10.172.185.30.pcsync-https > 10.122.188.33.63976: Flags [S.E], cksum 0xad28 (incorrect -> 0x1047), seq 829907150, ack 3522985102, win 28960, options [mss 1460,sackOK,TS val 4080558443 ecr 1049397865,nop,wscale 7], length 0
18:00:21.486007 IP (tos 0x0, ttl 62, id 0, offset 0, flags [DF], proto TCP (6), length 64)
10.122.188.33.63974 > 10.172.185.30.pcsync-https: Flags [S], cksum 0x4ad4 (correct), seq 4231487197, win 65535, options [mss 1287,nop,wscale 6,nop,nop,TS val 1049398103 ecr 0,sackOK,eol], length 0
18:00:21.486025 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
10.172.185.30.pcsync-https > 10.122.188.33.63974: Flags [S.E], cksum 0xad28 (incorrect -> 0x5629), seq 3487592686, ack 4231487198, win 28960, options [mss 1460,sackOK,TS val 4080558443 ecr 1049397620,nop,wscale 7], length 0
18:00:21.528492 IP (tos 0x0, ttl 62, id 0, offset 0, flags [DF], proto TCP (6), length 52)
10.122.188.33.63974 > 10.172.185.30.pcsync-https: Flags [.], cksum 0xec33 (correct), seq 1, ack 1, win 2051, options [nop,nop,TS val 1049398199 ecr 4080558407], length 0
18:00:21.564802 IP (tos 0x0, ttl 62, id 0, offset 0, flags [DF], proto TCP (6), length 647)
10.122.188.33.63974 > 10.172.185.30.pcsync-https: Flags [P.], cksum 0x7b93 (correct), seq 1:596, ack 1, win 2051, options [nop,nop,TS val 1049398201 ecr 4080558407], length 595
18:00:21.564870 IP (tos 0x0, ttl 64, id 4474, offset 0, flags [DF], proto TCP (6), length 52)
10.172.185.30.pcsync-https > 10.122.188.33.63974: Flags [.], cksum 0xad20 (incorrect -> 0xf082), seq 1, ack 596, win 236, options [nop,nop,TS val 4080558522 ecr 1049398201], length 0
18:00:21.570792 IP (tos 0x2,ECT(0), ttl 64, id 4475, offset 0, flags [DF], proto TCP (6), length 310)

HAProxy connection:

18:01:36.991825 IP (tos 0x0, ttl 62, id 14940, offset 0, flags [none], proto TCP (6), length 40)
10.122.188.33.63976 > 10.172.185.30.pcsync-https: Flags [.], cksum 0xe2e5 (correct), seq 3522987209, ack 829912275, win 2048, length 0
18:01:36.991879 IP (tos 0x0, ttl 64, id 290, offset 0, flags [DF], proto TCP (6), length 52)
10.172.185.30.pcsync-https > 10.122.188.33.63976: Flags [.], cksum 0xad20 (incorrect -> 0x66b3), seq 1, ack 1, win 264, options [nop,nop,TS val 4080633949 ecr 1049399278], length 0
18:01:37.030522 IP (tos 0x0, ttl 62, id 11220, offset 0, flags [none], proto TCP (6), length 40)
10.122.188.33.63974 > 10.172.185.30.pcsync-https: Flags [.], cksum 0xf11f (correct), seq 4231488880, ack 3487612239, win 2048, length 0
18:01:37.030584 IP (tos 0x0, ttl 64, id 4503, offset 0, flags [DF], proto TCP (6), length 52)
10.172.185.30.pcsync-https > 10.122.188.33.63974: Flags [.], cksum 0xad20 (incorrect -> 0x71dc), seq 1, ack 1, win 255, options [nop,nop,TS val 4080633987 ecr 1049400034], length 0
18:01:40.970231 IP (tos 0x0, ttl 62, id 59418, offset 0, flags [none], proto TCP (6), length 40)
10.122.188.33.63978 > 10.172.185.30.pcsync-https: Flags [.], cksum 0x4347 (correct), seq 3158332663, ack 2835866221, win 2048, length 0
18:01:40.970294 IP (tos 0x0, ttl 64, id 42554, offset 0, flags [DF], proto TCP (6), length 52)
10.172.185.30.pcsync-https > 10.122.188.33.63978: Flags [.], cksum 0xad20 (incorrect -> 0xa844), seq 1, ack 1, win 255, options [nop,nop,TS val 4080637927 ecr 1049403197], length 0
18:01:47.673675 IP (tos 0x0, ttl 62, id 46420, offset 0, flags [none], proto TCP (6), length 40)
10.122.188.33.63976 > 10.172.185.30.pcsync-https: Flags [.], cksum 0xe2e5 (correct), seq 0, ack 1, win 2048, length 0
18:01:47.673740 IP (tos 0x0, ttl 64, id 291, offset 0, flags [DF], proto TCP (6), length 52)
10.172.185.30.pcsync-https > 10.122.188.33.63976: Flags [.], cksum 0xad20 (incorrect -> 0x3cfa), seq 1, ack 1, win 264, options [nop,nop,TS val 4080644630 ecr 1049399278], length 0
18:01:47.719677 IP (tos 0x0, ttl 62, id 28044, offset 0, flags [none], proto TCP (6), length 40)
10.122.188.33.63974 > 10.172.185.30.pcsync-https: Flags [.], cksum 0xf11f (correct), seq 0, ack 1, win 2048, length 0
18:01:47.719717 IP (tos 0x0, ttl 64, id 4504, offset 0, flags [DF], proto TCP (6), length 52)
10.172.185.30.pcsync-https > 10.122.188.33.63974: Flags [.], cksum 0xad20 (incorrect -> 0x481b), seq 1, ack 1, win 255, options [nop,nop,TS val 4080644676 ecr 1049400034], length 0
18:01:51.254372 IP (tos 0x0, ttl 62, id 30718, offset 0, flags [none], proto TCP (6), length 40)
10.122.188.33.63978 > 10.172.185.30.pcsync-https: Flags [.], cksum 0x4347 (correct), seq 0, ack 1, win 2048, length 0
18:01:51.254433 IP (tos 0x0, ttl 64, id 42555, offset 0, flags [DF], proto TCP (6), length 52)
10.172.185.30.pcsync-https > 10.122.188.33.63978: Flags [.], cksum 0xad20 (incorrect -> 0x8018), seq 1, ack 1, win 255, options [nop,nop,TS val 4080648211 ecr 1049403197], length 0
18:01:57.918621 IP (tos 0x0, ttl 62, id 28870, offset 0, flags [none], proto TCP (6), length 40)
10.122.188.33.63976 > 10.172.185.30.pcsync-https: Flags [.], cksum 0xe2e5 (correct), seq 0, ack 1, win 2048, length 0
18:01:57.918662 IP (tos 0x0, ttl 64, id 292, offset 0, flags [DF], proto TCP (6), length 52)
10.172.185.30.pcsync-https > 10.122.188.33.63976: Flags [.], cksum 0xad20 (incorrect -> 0x14f5), seq 1, ack 1, win 264, options [nop,nop,TS val 4080654875 ecr 1049399278], length 0
18:01:58.037367 IP (tos 0x0, ttl 62, id 27197, offset 0, flags [none], proto TCP (6), length 40)
10.122.188.33.63974 > 10.172.185.30.pcsync-https: Flags [.], cksum 0xf11f (correct), seq 0, ack 1, win 2048, length 0
18:01:58.037426 IP (tos 0x0, ttl 64, id 4505, offset 0, flags [DF], proto TCP (6), length 52)
10.172.185.30.pcsync-https > 10.122.188.33.63974: Flags [.], cksum 0xad20 (incorrect -> 0x1fcd), seq 1, ack 1, win 255, options [nop,nop,TS val 4080654994 ecr 1049400034], length 0

Hi. Thanks for the information, but could you please provide the raw pcap file that tcpdump generates? The text summary is not very helpful from a technical point of view.
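A raw capture can come from tcpdump (`-w`, classic pcap) or from Wireshark, which saves pcapng by default; both keep the full packets. A stdlib-only sketch that tells the two apart by their leading magic number, using the values from the pcap and pcapng format descriptions. The function name is hypothetical.

```python
import struct
import sys

# First four bytes of each capture format (in the file's own byte order).
PCAP_MAGICS = {
    0xA1B2C3D4: "pcap (microsecond timestamps)",
    0xA1B23C4D: "pcap (nanosecond timestamps)",
}
PCAPNG_SHB = 0x0A0D0D0A  # pcapng Section Header Block type (byte-order neutral)


def capture_format(header: bytes) -> str:
    """Classify a capture file from its first four bytes."""
    if len(header) < 4:
        return "unknown"
    little = struct.unpack("<I", header[:4])[0]
    big = struct.unpack(">I", header[:4])[0]
    if little == PCAPNG_SHB:
        return "pcapng"
    for value in (little, big):
        if value in PCAP_MAGICS:
            return PCAP_MAGICS[value]
    return "unknown"


if __name__ == "__main__":
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            print(path, capture_format(f.read(4)))
```

Anything reported as "unknown" is probably a text export rather than a packet capture.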

HAProxy: haproxysynack.pcapng (Dropbox link)

Direct connection: webapp.only.pcapng (Dropbox link)

Hello,

Thanks for the captures. Let me check that I understand the setup:
client: 10.11.11.23
haproxy: 10.1.112.245
webapp: 10.1.112.75
Right?

If so, the captures contain:
  • client to HAProxy, where HAProxy terminates the connection during the handshake
  • client to webapp, where the connection is established

I thought the problem was between HAProxy and the app?
Has the connection ever worked with HAProxy between the client and the app?
If the problem IS between HAProxy and the app, why did you capture on the client side and not on the webapp side?

Yes, it works between the client and the application. Sometimes it disconnects or reports an SSL handshake failure.

I took the captures as follows:
  • a capture of a direct connection to the web application
  • a capture on the web application of traffic that has passed through HAProxy

To clarify, I used the Wireshark GUI rather than tcpdump for these captures. Please advise if I need to capture from the console with tcpdump.

We generally see issues when uploading larger POST payloads: either the proxy or the web application stops responding. I do not see the same issue when connected directly to the web application. When this happens, the web application throws the SSL error shown in my first post.

Also, thank you for all the help you have provided so far - I really appreciate it.

You’re welcome. The problem is, you need to do the capture on the WebApp server.

The capture software doesn’t matter, though. If the server has a GUI installed, you can use Wireshark; on the console you can use Wireshark’s command-line tool (tshark) or tcpdump, which is installed on most Linux systems anyway.

Especially with HAProxy in between, it would be good to have one capture where the connection works and one where it drops.
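With one capture from the webapp server where the upload works and one where it drops, a quick first pass is to see how each TCP connection on port 8443 ends: with FIN (orderly close) or RST (abort), and from which side. A stdlib-only sketch, assuming classic pcap input (for example from `tcpdump -w`; a pcapng file would first need re-saving as pcap in Wireshark), IPv4, and Ethernet or Linux "cooked" (`-i any`) framing. It is a triage aid, not a replacement for reading the handshake in Wireshark; the names are hypothetical.

```python
import socket
import struct
import sys
from collections import defaultdict

LINKTYPE_ETHERNET = 1
LINKTYPE_LINUX_SLL = 113  # what `tcpdump -i any` writes on Linux
FLAG_NAMES = [(0x02, "SYN"), (0x01, "FIN"), (0x04, "RST")]


def _ipv4_offset(linktype, frame):
    """Offset of the IPv4 header inside a link-layer frame, or None."""
    if linktype == LINKTYPE_ETHERNET and frame[12:14] == b"\x08\x00":
        return 14
    if linktype == LINKTYPE_LINUX_SLL and frame[14:16] == b"\x08\x00":
        return 16
    return None


def connection_events(data, port=8443):
    """Map each TCP connection using `port` to the SYN/FIN/RST packets seen.

    Keys are the two (ip, port) endpoints in sorted order; values are
    (flag, sender_ip, sender_port) tuples in capture order.
    """
    if len(data) < 24:
        raise ValueError("too short for a pcap file")
    magic = struct.unpack("<I", data[:4])[0]
    if magic in (0xA1B2C3D4, 0xA1B23C4D):
        endian = "<"
    elif magic in (0xD4C3B2A1, 0x4D3CB2A1):
        endian = ">"
    else:
        raise ValueError("not a classic pcap file (pcapng?)")
    linktype = struct.unpack(endian + "I", data[20:24])[0]
    events = defaultdict(list)
    pos = 24
    while pos + 16 <= len(data):
        incl_len = struct.unpack(endian + "I", data[pos + 8:pos + 12])[0]
        frame = data[pos + 16:pos + 16 + incl_len]
        pos += 16 + incl_len
        ip = _ipv4_offset(linktype, frame)
        if ip is None or len(frame) < ip + 20 or frame[ip + 9] != 6:  # 6 = TCP
            continue
        tcp = ip + (frame[ip] & 0x0F) * 4
        if len(frame) < tcp + 14:
            continue  # truncated TCP header
        src = socket.inet_ntoa(frame[ip + 12:ip + 16])
        dst = socket.inet_ntoa(frame[ip + 16:ip + 20])
        sport, dport = struct.unpack("!HH", frame[tcp:tcp + 4])
        if port not in (sport, dport):
            continue
        key = tuple(sorted([(src, sport), (dst, dport)]))
        for bit, name in FLAG_NAMES:
            if frame[tcp + 13] & bit:
                events[key].append((name, src, sport))
    return dict(events)


if __name__ == "__main__":
    with open(sys.argv[1], "rb") as f:
        for conn, evs in connection_events(f.read()).items():
            print(conn, evs)
```

Comparing the two captures, a connection that ends in RST instead of FIN, together with the address that sent it, may show whether HAProxy or CherryPy is the one aborting.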