Problem with backend selection - after a few successful hits suddenly wrong backend is chosen with no config change

Hi experts!

I have been using HAProxy for quite some time now, and with most of the applications I run through it I have no problems at all. There are two sites, however, that give me a lot of headaches. When testing in single-user mode (just me on HAProxy and the web server) I can reproducibly run into a situation where the server just “stops answering”. The first few clicks work, then Chrome is stuck at “(pending)”. What I see in the log files is that the wrong backend is selected for those requests. There is no configuration change, and on the firewall I don’t see any packets going from HAProxy to the actual web server.

Here is the log:

working:
2023-04-21T09:53:53.998735+02:00 xxxxxxx haproxy[16677]: ::ffff:10.x.x.6:52986 [21/Apr/2023:09:53:53.996] fe_generic_ssl_termination~ be_sdr/xxhsdr01_80 0/0/1/1/2 200 6318 - - ---- 16/6/0/0/0 0/0 {sdr.xxxx.xxxx.xx|Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/112.0.0.0 Sa} "GET https://sdr.xxxx.xxxx.xx/yyyyyyyyyy/zzzzzzzzzz.uuu HTTP/2.0"


not working:
2023-04-21T10:58:54.190458+02:00 xxxxxxx haproxy[16677]: ::ffff:10.x.x.6:54556 [21/Apr/2023:10:58:14.185] fe_generic_ssl_termination~ be_default_https/dummy 0/30003/-1/-1/40004 503 0 - - sC-- 8/2/0/0/3 0/0 {sdr.xxxx.xxxx.xxxx|Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/112.0.0.0 Sa} "GET https://sdr.xxxx.xxxx.xx/yyyyyyyyyy/zzzzzzzzzz.uuu HTTP/2.0"

I tried various timeout settings, but I always come back to the same problem: it just stops working after a few clicks. The timeout most likely comes from the nonexistent backend that I use to deter connection attempts with invalid hostnames.

Here is a sanitized config covering the full path through to this backend:

defaults
    mode                    http
    log                     global
    option                  httplog
    option                  redispatch
    no option httpclose
    retries                 3
    maxconn                 10000

    timeout connect 10s
    timeout client 30s
    timeout server 30s

frontend ssl_frontend
    bind :::443 v4v6
    mode tcp

    option tcplog
    log global

    timeout client 6h
    tcp-request inspect-delay 2s
    tcp-request content accept if { req_ssl_hello_type 1 }

    acl client_attempts_ssh payload(0,7) -m bin 5353482d322e30
    use_backend xxxxxxx_ssh if client_attempts_ssh
    use_backend openvpn if !{ req.ssl_hello_type 1 } !{ req.len 0 }
    use_backend be_xxxxx_vpn if { req.ssl_sni -m end vpn.xxxx.xxxx.xx }
    use_backend be_rdp_tsc if { req.ssl_sni -m end rdgateway.xxxx.xx }
    default_backend be_generic_ssl_termination

backend be_generic_ssl_termination
    mode tcp
    server loopback abns@fe_generic_ssl_termination send-proxy-v2


frontend fe_generic_ssl_termination
    bind abns@fe_generic_ssl_termination accept-proxy ssl crt-list /etc/haproxy/crt-list.conf ca-file xxxxxxxxxx.pem alpn h2,http/1.1
    mode http

    option forwardfor       except 127.0.0.0/8

    capture request header Host len 32
    capture request header User-Agent len 100

    log global

    # Use letsencrypt backend for certificate validation
    acl is_well_known path -m reg ^/.well-known/acme-challenge/
    use_backend be_letsencrypt if is_well_known

    use_backend be_service1      if { ssl_fc_has_crt } { ssl_fc_sni -i service1.xxxx.xxxx.xx }
    use_backend be_service2      if { ssl_fc_has_crt } { ssl_fc_sni -i service2.xxxx.xxxx.xx }
    use_backend be_service3      if { ssl_fc_has_crt } { ssl_fc_sni -i service3.xxxx.xxxx.xx }
    use_backend be_service4      if { ssl_fc_has_crt } { ssl_fc_sni -i service4.xxxx.xxxx.xx }
    use_backend be_service6      if { ssl_fc_sni -i service6.xxxx.xxxx.xx }
    use_backend be_sdr           if { ssl_fc_has_crt } { ssl_fc_sni -i sdr.xxxx.xxxx.xx }
    use_backend be_service5      if { ssl_fc_has_crt } { ssl_fc_sni -i service5.xxxx.xxxx.xx }
    
    default_backend be_default_https

backend be_default_https
    server dummy 10.0.0.1:80

backend be_sdr
    balance source
    mode http
    server xxhsdr01_80 xxhsdr01.xxxx.xxxx.xx:80 verify none no-check maxconn 100

Could anyone help me by pointing out obvious configuration errors, or a way to debug the backend selection process? In the bad cases HAProxy always chooses be_default_https/dummy, although the be_sdr backend is available, has 0 of 100 connections in use, and all health checking is disabled by now.

Thanks + best regards

Michael

You need to show /etc/haproxy/crt-list.conf too.

In frontend fe_generic_ssl_termination:

  • it’s not clear what { ssl_fc_has_crt } is supposed to do here; are you verifying client certificates? How come there is no verify keyword?
  • ssl_fc_sni is wrong here; you need to use hdr(host), as per the HAProxy version 2.6.12-2 Configuration Manual, so that overlapping certificates work

In frontend ssl_frontend:

  • you can’t use certificates that match both vpn.xxxx.xxxx.xx and rdgateway.xxxx.xx, because SNI-based routing only works at the beginning of the handshake (SSL client_hello), and when you access a different resource over the same connection the routing decision can’t be repeated.

Thank you very much for the hints.

crt-list.conf:

dummy.pem [verify none]
default.xxxx.xxxxx.xx.pem [verify optional]
sdr.xxxx.xxxxx.xx.pem [verify optional]
service1.xxxx.xxxx.xx.pem [verify optional]
service2.xxxx.xxxx.xx.pem [verify none]
service3.xxxx.xxxx.xx.pem [verify optional]

To be honest, the config was copied together over the course of several years from internet sources, so I already expected there to be some stupid approaches in there.

ssl_fc_has_crt seems to work for me. The verify keyword is in the crt-list, as I have some services verified and some not. From the docs I now realize that I should rather switch to ssl_c_used.

ssl_fc_sni will be replaced by hdr(host).

These two points could very reasonably be the cause, as the problems only happen on subsequent requests, where the SNI might not be populated (correctly) or the connection could have been resumed, which according to the docs is not captured by ssl_fc_has_crt.
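
One way to check this hypothesis would be to log the SNI next to the Host header that the frontend already captures. This is only a minimal sketch, assuming the fe_generic_ssl_termination frontend from the config above; the capture length of 64 is an arbitrary choice and not taken from the original config.

```haproxy
frontend fe_generic_ssl_termination
    # Existing bind, option, capture and use_backend lines stay unchanged.

    # Assumption: one additional capture slot, so that the SNI negotiated for
    # the TLS connection is logged alongside the Host header of each request.
    http-request capture ssl_fc_sni len 64
```

With option httplog the extra value should show up in the braces block of each log line, next to the captured Host and User-Agent. If a misrouted request logs an SNI that is empty or differs from the Host value, the use_backend rules are matching on the wrong piece of information.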

The last point about the SNI-based routing I do not really understand. This part of the config seems to work, because in order to reach the HTTP frontend at all, traffic has to fall through there. The VPN also seems to work without problems. Could you elaborate on the concern some more?

Thank you very much for your quick reply. I will make the changes and come back here with the result after some testing.

best regards

Michael

Ok, so the verify optional part comes from the crt-list, this makes sense now.

Replacing ssl_fc_sni with hdr(host) will likely fix the issue.
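
As a rough sketch of what that change could look like, here are two of the rules from the posted frontend rewritten to match on the Host header. The backend names and hostnames are taken from the config above; swapping ssl_fc_has_crt for ssl_c_used follows the idea mentioned earlier in the thread and is an assumption about the intended behaviour, not a tested configuration.

```haproxy
frontend fe_generic_ssl_termination
    # Existing bind, option and capture lines stay unchanged.

    # The Host header is evaluated for every request, whereas the SNI is
    # fixed once per TLS connection and can go stale on a reused connection.
    # Assumption: ssl_c_used replaces ssl_fc_has_crt so that resumed TLS
    # sessions still count as "client certificate present".
    use_backend be_sdr      if { ssl_c_used } { hdr(host) -i sdr.xxxx.xxxx.xx }
    use_backend be_service6 if { hdr(host) -i service6.xxxx.xxxx.xx }

    default_backend be_default_https
```

The remaining use_backend lines would follow the same pattern. Note that a Host header can carry a port suffix (for example `:443`), in which case an exact match like the one above would not hit; whether that matters depends on the clients involved.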

If there are no issues with routing in frontend ssl_frontend, that’s good; I just mentioned it because it could have the same problem if the certificates overlap and browsers reuse the connection to access the other host.

Hi!

Thanks again for the help. I just wanted to follow up: so far, even with extensive testing, the issue has not reappeared! The config errors also served as a reminder to look at some of the mess in my config file, and I found lots of things that deserve changing :)

best regards

Michael
