Haproxy reusing wrong backend when using tcp mode

Hi, I have a weird problem with my Haproxy setup.

I have several haproxies stacked on top of each other (to route first by subdomain, yyy.mydomain.com, and then by sub-subdomain, xxx.yyy.mydomain.com), and I have the following problem (only in tcp mode):

When I open one site (a subdomain) in my browser, say https://backoffice.mydomain.com, and then go to another site on a different subdomain, for example https://chat.mydomain.com, I still get the first site (backoffice in my example). It is not a redirect: the URL is the correct one, but the backend is the wrong one.

What’s weird is that every attempt to reach any subdomain now lands on backoffice. If I try again some time later, I can finally reach the correct backend (so https://chat.mydomain.com correctly shows chat this time). But then every other subdomain only gets me to chat, and so on…

Another thing is that the problem does NOT happen when using a sub-subdomain (for example, service1.backoffice.mydomain.com gets me to the correct service every time, even when all the subdomains are “stuck” on a single backend).

I don’t know if I’m clear, I can try to rewrite the description if need be (I have a hard time describing it simply).

Here is some of my conf:

The global and defaults are the same on every haproxy:

global
    log stdout format raw local0 notice
    maxconn 32000
    ulimit-n 65536
    pidfile /var/run/haproxy.pid
    uid 33
    gid 33
    daemon
    quiet
    nbproc       1

defaults
    log     global
    mode   http
    option  httplog
    option  dontlognull
    option  forwardfor
    retries 3
    option redispatch
    maxconn 20000
    timeout connect 10s
    timeout client  50s
    timeout server  60m
    timeout tunnel  60m
    option http-server-close
    balance roundrobin

Here is the conf in the “top” haproxy (routing by subdomains):

frontend tcp
  bind *:446
  mode tcp

  tcp-request inspect-delay 5s
  tcp-request content accept if { req_ssl_hello_type 1 }

  # routing to other haproxies via req_ssl_sni...
  acl mysubdomain req_ssl_sni -m sub .mysubdomain.
  use_backend mysubdomain if mysubdomain
  # etc

Here is the conf for the subdomain haproxies (running in swarm stacks):

resolvers docker
  nameserver dns 127.0.0.11:53

frontend https
  bind *:443 ssl crt /usr/local/etc/haproxy/certs alpn h2,http/1.1

  # routing to services
  acl myservice hdr_beg(host) myservice.
  use_backend myservice if myservice
  # etc

  # Defaulting on the "most important" service
  default_backend mydefaultbackend

This is because you are routing based on SNI. SNI is in the SSL client_hello, the initial packet of the SSL handshake. Once that initial packet is sent and haproxy has made a routing decision (based on the unencrypted SNI value), that specific TCP connection stays on that backend. Haproxy becomes a TCP tunnel.

But because on the backend for backoffice.mydomain.com you are serving a certificate that is also valid for chat.mydomain.com (due to wildcard or multi-SAN certificates), the browser will use the same TCP/SSL connection for that, even though this is not what you expect.

The sub-subdomains are not affected because service1.backoffice.mydomain.com is not a valid hostname in the certificate that the backend for backoffice.mydomain.com is serving.

For example a wildcard certificate *.mydomain.com is INVALID for service1.backoffice.mydomain.com, so the browser does not reuse the connection, because it already knows that the backend server is not authoritative for this.

Solutions: either find a way to load-balance not based on SNI (for example, by terminating SSL on the first haproxy layer) and looking at the host header, or you need to make sure that one certificate is not valid for a different service (not using wildcard certificates, don’t add SANs for other services into your multi SAN certificate).
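
A rough sketch of the first option, assuming the top haproxy can hold the certificates and terminate SSL itself. The frontend name, the backend names and the certificate path below are placeholders, not taken from your setup:

frontend https_top
  # SSL is terminated here, so this frontend runs in http mode
  bind *:446 ssl crt /usr/local/etc/haproxy/certs alpn h2,http/1.1
  mode http

  # Route on the Host header of every request instead of the SNI
  # of the connection (hdr_beg also tolerates a trailing :port)
  acl host_backoffice hdr_beg(host) -i backoffice.mydomain.com
  acl host_chat       hdr_beg(host) -i chat.mydomain.com
  use_backend backoffice if host_backoffice
  use_backend chat       if host_chat

Because the decision is taken per request, a browser reusing one connection for several hostnames still ends up on the right backend. The backends behind this frontend then receive decrypted HTTP, so the lower haproxies would either have to accept plain HTTP or the server lines would need ssl to re-encrypt.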


Hi !

Thank you very much for your (very) clear and in-depth explanation, and for the possible solutions !

I do this with a loop termination setup (I also have wildcard certs):

A TCP frontend on 443 sends traffic to a backend that loops back into another HTTPS frontend (x443), which then uses ssl_fc_sni -i to decide which real backend should be used. From there the traffic goes unencrypted behind HAProxy.
My traffic is DDP websockets, so this is the only way I got it working.
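
Roughly like this, as a simplified sketch and not my exact config. The local port 8443, the proxy-protocol options, the certificate path and all names are only there for illustration:

frontend tcp_in
  bind *:443
  mode tcp
  default_backend loop_to_https

backend loop_to_https
  mode tcp
  # hand the raw connection back to a second frontend in the same haproxy
  server local_https 127.0.0.1:8443 send-proxy-v2

frontend https_in
  # SSL is terminated here; accept-proxy keeps the original client address
  bind 127.0.0.1:8443 accept-proxy ssl crt /usr/local/etc/haproxy/certs
  mode http
  use_backend chat       if { ssl_fc_sni -i chat.mydomain.com }
  use_backend backoffice if { ssl_fc_sni -i backoffice.mydomain.com }

Note that ssl_fc_sni is still a property of the connection and not of the request, so a client that reuses one connection for several hostnames would presumably still land on the first backend.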

Thank you for the detailed explanation! We were running into this exact same issue and were struggling to identify how to articulate let alone resolve it. As we were tinkering with our haproxy config we discovered that it seems to get fixed when we remove http2 support from the frontend, e.g.,

bind *:443 ssl crt /usr/local/etc/haproxy/certs alpn http/1.1

I assume this isn’t the right way to solve this but I’m not sure I understand how to deal with this with multiple haproxy layers…

Disabling h2 may help with some browsers. It’s not the correct fix.

I mentioned 2 possible solutions: stop relying on SNI for your routing matters if possible (but it often isn’t), or stop using overlapping certificates.
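
For the second option, a minimal sketch of what the bind line on the backoffice haproxy could look like. The file name is hypothetical, and it assumes a certificate whose names cover only backoffice.mydomain.com and *.backoffice.mydomain.com:

frontend https
  # this certificate must not be valid for chat.mydomain.com or any other service
  bind *:443 ssl crt /usr/local/etc/haproxy/certs/backoffice.mydomain.com.pem alpn h2,http/1.1

With one such certificate per service, the browser has no reason to consider an existing connection to backoffice usable for chat, so it opens a new connection and sends a new SNI.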

Thanks for the fast reply! I switched from ssl_fc_sni to hdr(host) -i which fixed things.
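
In case it helps someone else, the change was along these lines (the hostname and backend name are placeholders, not our real ones):

  # before: decided once per connection, from the SNI of the SSL handshake
  # use_backend chat if { ssl_fc_sni -i chat.mydomain.com }

  # after: decided for every request, from the Host header
  use_backend chat if { hdr(host) -i chat.mydomain.com }

This only works on a frontend that terminates SSL, since haproxy cannot see the Host header in plain tcp mode.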