Rather odd issue.. Layer 6 timeouts. I am scratching my head

Hi friends,

I have a setup with HAProxy as a reverse proxy on the host, in front of a containerized Apache server and an Nginx server. We have been getting DDoS-type attacks, during which our site starts showing 429 Too Many Requests errors. Until now, restarting Apache would bring the site back up. But recently we got hit so hard that the maxconn limit in HAProxy was reached. Since then, despite restarting Apache, HAProxy and Nginx, and even rebooting the server, HAProxy fails for this site with a Layer 6 timeout error and shows the backend as DOWN.

I believe Layer 6 indicates an issue with SSL. We are using the PROXY protocol and SSL passthrough mode. I tried creating another containerized Apache server as a backup, but it just won’t work :frowning:
HAProxy is version 2.8.3. I did try downgrading HAProxy (i.e. installing lower versions), but it still won’t work. We are using Let’s Encrypt certificates, so I even renewed them to see if that fixes it, but no luck.

I have a couple more containerized Apaches running on the same host which work perfectly fine but they do not get attacked as much.

Wondering how I can go about fixing this issue. Any help will be greatly appreciated.

Thanks in advance.

Output of haproxy -vv:
HAProxy version 2.8.3-86e043a 2023/09/07 - https://haproxy.org/
Status: long-term supported branch - will stop receiving fixes around Q2 2028.
Known bugs: http://www.haproxy.org/bugs/bugs-2.8.3.html
Running on: FreeBSD 13.2-RELEASE-p3 FreeBSD 13.2-RELEASE-p3 GENERIC amd64
Build options :
TARGET = freebsd
CPU = generic
CC = cc
CFLAGS = -O2 -pipe -fstack-protector-strong -fno-strict-aliasing -Wall -Wextra -Wundef -Wdeclaration-after-statement -Wfatal-errors -Wtype-limits -Wshift-negative-value -Wnull-dereference -fwrapv -Wno-unknown-warning-option -Wno-address-of-packed-member -Wno-unused-label -Wno-sign-compare -Wno-unused-parameter -Wno-clobbered -Wno-missing-field-initializers -Wno-cast-function-type -Wno-string-plus-int -Wno-atomic-alignment -DFREEBSD_PORTS
OPTIONS = USE_GETADDRINFO=1 USE_OPENSSL=1 USE_ACCEPT4=1 USE_ZLIB=1 USE_CPU_AFFINITY=1 USE_PCRE2=1 USE_PCRE2_JIT=1
DEBUG = -DDEBUG_STRICT -DDEBUG_MEMORY_POOLS

Feature list : -51DEGREES +ACCEPT4 -BACKTRACE +CLOSEFROM +CPU_AFFINITY -CRYPT_H -DEVICEATLAS -DL -ENGINE -EPOLL -EVPORTS +GETADDRINFO +KQUEUE -LIBATOMIC +LIBCRYPT -LINUX_CAP -LINUX_SPLICE -LINUX_TPROXY -LUA -MATH -MEMORY_PROFILING -NETFILTER -NS -OBSOLETE_LINKER +OPENSSL -OPENSSL_WOLFSSL -OT -PCRE +PCRE2 +PCRE2_JIT -PCRE_JIT +POLL -PRCTL +PROCCTL -PROMEX -PTHREAD_EMULATION -QUIC -RT +SHM_OPEN -SLZ +SSL -STATIC_PCRE -STATIC_PCRE2 -SYSTEMD -TFO +THREAD -THREAD_DUMP +TPROXY -WURFL +ZLIB

Default settings :
bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Built with multi-threading support (MAX_TGROUPS=16, MAX_THREADS=256, default=8).
Built with OpenSSL version : OpenSSL 1.1.1w 11 Sep 2023
Running on OpenSSL version : OpenSSL 1.1.1w 11 Sep 2023
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports : SSLv3 TLSv1.0 TLSv1.1 TLSv1.2 TLSv1.3
Built with zlib version : 1.2.13
Running on zlib version : 1.2.13
Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with transparent proxy support using: IP_BINDANY IPV6_BINDANY
Built with PCRE2 version : 10.42 2022-12-11
PCRE2 library supports JIT : yes
Encrypted password support via crypt(3): yes
Built with clang compiler version 14.0.5 (https://github.com/llvm/llvm-project.git llvmorg-14.0.5-0-gc12386ae247c)

Available polling systems :
kqueue : pref=300, test result OK
poll : pref=200, test result OK
select : pref=150, test result OK
Total: 3 (3 usable), will use kqueue.

Available multiplexer protocols :
(protocols marked as <default> cannot be specified using 'proto' keyword)
h2 : mode=HTTP side=FE|BE mux=H2 flags=HTX|HOL_RISK|NO_UPG
fcgi : mode=HTTP side=BE mux=FCGI flags=HTX|HOL_RISK|NO_UPG
h1 : mode=HTTP side=FE|BE mux=H1 flags=HTX|NO_UPG
<default> : mode=HTTP side=FE|BE mux=H1 flags=HTX
none : mode=TCP side=FE|BE mux=PASS flags=NO_UPG
<default> : mode=TCP side=FE|BE mux=PASS flags=

Available services : none

Available filters :
[BWLIM] bwlim-in
[BWLIM] bwlim-out
[CACHE] cache
[COMP] compression
[FCGI] fcgi-app
[SPOE] spoe
[TRACE] trace

Relevant Config:

global
daemon
maxconn 60000
log /dev/log local2 notice
stats socket /run/haproxy/admin.sock mode 600 level admin

defaults
mode tcp
option dontlognull
timeout http-request 10s
timeout queue 1m
timeout connect 5s
timeout client 10s
timeout server 30s
timeout http-keep-alive 10s
timeout check 10s
timeout tarpit 1m
backlog 10000

frontend http
bind *:80
mode http
redirect scheme https if !{ ssl_fc }

frontend https
bind *:443
mode tcp
tcp-request inspect-delay 5s
option http-server-close
maxconn 20000
tcp-request connection track-sc1 src
tcp-request connection reject if { src_get_gpc0 gt 0 }
stick-table type ip size 200k expire 30s store gpc0
acl source_is_abuser src_get_gpc0 gt 0
tcp-request connection track-sc0 src if !source_is_abuser
tcp-request content accept if { req_ssl_hello_type 1 }
use_backend bak_429 if source_is_abuser
acl cluster1 req.ssl_sni -i XXX.com
acl cluster1 req.ssl_sni -i www.XXX.com
acl cluster2 req.ssl_sni -i abc.XXX.com
use_backend cluster1_bak if cluster1
use_backend cluster2_bak if cluster2
default_backend cluster1_bak

backend cluster1_bak
mode tcp
option tcplog
option log-health-checks
tcp-request inspect-delay 5s
tcp-request content accept if { req_ssl_hello_type 1 }
balance roundrobin
option ssl-hello-chk
http-check connect ssl alpn h2,http/1.1
#server webserver 10.10.3.2:443 send-proxy check inter 2000 rise 2 fall 5
server apacheserver01 10.10.3.2:443 send-proxy check ssl verify none force-tlsv13
server apacheserver02 10.10.3.19:443 send-proxy check ssl verify none force-tlsv13 weight 2

backend cluster2_bak
mode tcp
tcp-request inspect-delay 5s
tcp-request content accept if { req_ssl_hello_type 1 }
option ssl-hello-chk
server nginxserver01 10.10.3.7:4443 send-proxy check verify none inter 2000 rise 2 fall 5

backend bak_429
mode http
timeout tarpit 2s
http-request tarpit deny_status 429

It sounds like some erroneous configuration was applied to the cluster1_bak backend servers during the DDoS.

`ssl verify none force-tlsv13`: none of this is correct when using SSL passthrough.

SSL passthrough means connecting one TCP socket with another one; on the HAProxy side it has nothing to do with SSL. When you enable SSL on the backend of an already encrypted TCP connection, the traffic gets encrypted twice. Of course your Apache backends will not decrypt SSL twice.

If and only if Apache supports the PROXY protocol and has it enabled, you can leave send-proxy in there.

I’d suggest you check your backups from before the DDoS and restore that configuration, if you have doubts.
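
To make the passthrough point concrete, here is a minimal sketch of what the cluster1_bak backend could look like without backend-side SSL. It reuses the server names and addresses from the posted config and assumes Apache really is set up to accept the PROXY protocol on port 443. The check timings are copied from the commented-out server line in that config and are not a recommendation.

```haproxy
backend cluster1_bak
    mode tcp
    option tcplog
    option log-health-checks
    balance roundrobin
    # Passthrough: HAProxy only relays the client's TLS bytes, so there is
    # no "ssl", "verify none" or "force-tlsv13" on the server lines.
    # Keep send-proxy only if Apache expects the PROXY protocol header.
    server apacheserver01 10.10.3.2:443 send-proxy check inter 2000 rise 2 fall 5
    server apacheserver02 10.10.3.19:443 send-proxy check inter 2000 rise 2 fall 5 weight 2
```

With this layout the TLS handshake happens end to end between the client and Apache, which is why the certificates live on the Apache side only.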

Thanks, Lukas, for your prompt response. It was just `send-proxy check` prior to the attack. I have been changing it since then, after googling around, to see what is going wrong.

I will share the older version with you tomorrow, as I am not near the server now.

Hi Lukas,

My apologies, I couldn’t get the older version of the config yesterday. I do understand it’s a weekend, and I expect a response from you only by Monday.

Here is the older version. Note that we used to pass both HTTP and HTTPS to the Apache server, which handled redirecting HTTP to HTTPS. Now we are doing that in HAProxy itself. The HAProxy log wasn’t enabled/configured properly by us at that time.

global
    daemon
    maxconn 4096
    log /var/run/log local0 notice
#    stats socket haproxy.stats level admin

defaults
    mode tcp
    option  dontlognull
    timeout http-request    10s
    timeout queue           1m
    timeout connect 5s
    timeout client 10s
    timeout server 30s
    timeout http-keep-alive 10s
    timeout check           10s
    timeout tarpit 1m
    backlog 10000


#listen stats
#  bind 0.0.0.0:8880
#  stats enable
#  stats hide-version
#  stats uri     /
#  stats realm   HAProxy Statistics
#  stats auth    admin:admin


frontend http
   bind *:80
   mode http
   log global
   option httplog
   timeout client 25s
   maxconn 10000

   stick-table type ip size 1m expire 1m store gpc0,http_req_rate(10s),http_err_rate(10s)
   tcp-request connection track-sc1 src
   tcp-request connection reject if { src_get_gpc0 gt 0 }

   option http-server-close
   option forwardfor

   http-request deny if HTTP_1.0
   http-request add-header X-Forwarded-Proto http
   acl pathsniffers path -i -m sub -f <path>/sniffpatterns.txt
   http-request silent-drop if pathsniffers
   acl badbot hdr_sub(User-Agent) -i -f <path>/badbots.txt
   http-request silent-drop if badbot
   acl static_file path_end .css .js .jpg .jpeg .gif .ico .png .webp .avif .woff .woff2 .eot .pdf
   http-request track-sc0 src table throttle_sticktable if !static_file
   acl fast_client sc0_gpc0_rate gt 10
   acl max_connections sc0_conn_cur gt 20
   acl VALID_DOMAIN hdr_sub(host) XXX.com ABC.com 
   http-request silent-drop if !VALID_DOMAIN
   use_backend bak_429 if max_connections
   use_backend bk_http_slow if fast_client
   acl cluster1 req.ssl_sni -i XXX.com
   acl cluster1 req.ssl_sni -i www.XXX.com
   acl cluster2 req.ssl_sni -i abc.XXX.com
   use_backend cluster1_bak_http if cluster1
   use_backend cluster2_bak_http if cluster2
   default_backend cluster1_bak_http

   
   
frontend https
    bind *:443
    mode tcp
    tcp-request inspect-delay 5s
    option http-server-close
   
    tcp-request connection track-sc1 src
    tcp-request connection reject if { src_get_gpc0 gt 0 }

    stick-table type ip size 200k expire 30s store gpc0
    acl source_is_abuser src_get_gpc0 gt 0
    tcp-request connection track-sc0 src if !source_is_abuser
    tcp-request content accept if { req_ssl_hello_type 1 }
    acl cluster1_sec req.ssl_sni -i XXX.com
    acl cluster1_sec req.ssl_sni -i www.XXX.com
    acl cluster2_sec req.ssl_sni -i abc.XXX.com
    use_backend cluster1_bak_https if cluster1_sec
    use_backend cluster2_bak_https if cluster2_sec
    default_backend cluster1_bak_https

backend cluster1_bak_http
    mode http
    option forwardfor
    server apacheserver_http 10.10.3.2:80 check

backend cluster1_bak_https
    mode tcp
    tcp-request inspect-delay 5s
    tcp-request content accept if { req_ssl_hello_type 1 }
    option ssl-hello-chk
    server apacheserver_https 10.10.3.2:443 send-proxy check inter 2000 rise 2 fall 5

backend cluster2_bak_http
    mode http
    option forwardfor
    server nginxserver_http 10.10.3.7:8001 check

backend cluster2_bak_https
    mode tcp
    tcp-request inspect-delay 5s
    tcp-request content accept if { req_ssl_hello_type 1 }
    option ssl-hello-chk
    server nginxserver_https 10.10.3.7:4443  send-proxy check inter 2000 rise 2 fall 5

backend bak_429
    mode http
    timeout tarpit 2s
    http-request tarpit deny_status 429

backend bk_http_slow
   mode http
   timeout tarpit 20s
   http-request tarpit

I already explained to you why the configuration doesn’t work.

I didn’t need your old configuration, that was just a suggestion as an additional help to you.

OK, thanks Lukas. However, it fails even if I remove:

`ssl verify none force-tlsv13`

We didn’t make any changes to the configs of either HAProxy or Apache until HAProxy stopped connecting to the backend. Everything was working perfectly fine for many months. And it’s happening only for the default backend; all other backends are working. It seems as though the default backend has gone down permanently. Not sure how we can revive it…

It was showing some error regarding maxconn and so I cleared the tables and counters using socat. Now it shows Layer 6 timeout consistently.

Wondering if we have stumbled over some bug… ?

Let’s start with you sharing the haproxy logs.

Also try disabling health checks, just to see if there is an actual problem with transit traffic or “just” a health check issue.

Remove:

option ssl-hello-chk
http-check connect ssl alpn h2,http/1.1

and most importantly remove the check keyword from the servers.

If it works, use tcp checks as opposed to SSL checks.

If this doesn’t help, you will have to share the actual haproxy log messages, without modifications.
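
As a rough sketch of the two steps above, assuming the server names and addresses from the posted cluster1_bak backend: first run with no health checks at all, then, if transit traffic flows, bring back a plain TCP connect check.

```haproxy
backend cluster1_bak
    mode tcp
    balance roundrobin
    # Step 1: no "option ssl-hello-chk", no "http-check" line and no "check"
    # keyword, so the servers are never marked DOWN and only real traffic is tested.
    server apacheserver01 10.10.3.2:443 send-proxy
    server apacheserver02 10.10.3.19:443 send-proxy weight 2

    # Step 2 (only if step 1 works): swap the two lines above for these.
    # A bare "check" with no check option set is a plain TCP connect check.
    #server apacheserver01 10.10.3.2:443 send-proxy check inter 2000 rise 2 fall 5
    #server apacheserver02 10.10.3.19:443 send-proxy check inter 2000 rise 2 fall 5 weight 2
```

If step 1 works but step 2 fails, the problem is limited to the health check rather than the traffic path.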

After some head scratching, I managed to fix the issue. It’s working with check as well. So it’s not a bug after all :slight_smile:
I had to change the port of the default backend to get it to work. Not sure how it had been working all these months without any issue.
Thanks for all your help Lukas.
