Problems enabling Keep-Alive to a high-latency backend

The goal I am trying to achieve is to keep a few SSL connections open from HAProxy on a replica in the EU to our primary server in the US. Due to latency and TLS handshakes, API calls cost 600-700ms when a connection has to be established for every call. Right now I keep a connection pool using an nginx backend (a local nginx proxy), which works well and brings request times down to 200ms, which is much more bearable. However, we are experiencing random failures between nginx and the AWS ELB in the US. I couldn't figure out exactly why, but some requests fail under high load.
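
For context, the nginx side is roughly the following shape (the upstream name, port, Host header and ELB hostname are placeholders, not our real values):

upstream primary_api {
    # a small pool of idle connections stays open to the US endpoint,
    # so most requests skip the TCP + TLS handshake
    server internal-xxxx.elb.amazonaws.com:443;
    keepalive 8;
}

server {
    listen 127.0.0.1:8080;

    location / {
        proxy_pass https://primary_api;
        proxy_http_version 1.1;           # keep-alive to the upstream needs HTTP/1.1
        proxy_set_header Connection "";   # don't forward the client's Connection header
        proxy_set_header Host example.com;
        proxy_ssl_server_name on;         # send SNI to the upstream
        proxy_ssl_name example.com;       # ...using this name, not the upstream block name
    }
}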

Hence I thought I would try HAProxy to replace nginx, but I hit a few problems:

  • Firstly, it seems impossible to do connection pooling on an SNI backend (roughly the kind of server line sketched right after this list). I wish there were a way to turn this safety check off, as we only ever connect to one host name on that IP and nothing can go wrong at that level.

  • I then tried to drop TLS and just do plain HTTP, to see whether it would at least be more reliable than nginx, but the config below fails to do keep-alive to the backend.
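
For reference, the TLS attempt from the first bullet used a server line roughly like this (hostname, CA bundle path and SNI value are placeholders):

backend primary_tls
    server node1 internal-xxxx.elb.amazonaws.com:443 ssl verify required ca-file /etc/ssl/certs/ca-certificates.crt sni str(example.com) check

And the plain-HTTP config from the second bullet: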

frontend replica_front
    bind 127.0.0.1:8080
    mode http
    default_backend primary

backend primary
    balance roundrobin
    http-request set-header Host example.com
    http-request set-header Connection keep-alive
    option httpchk HEAD /login HTTP/1.1\r\nHost:example.com
    option http-keep-alive
    option srvtcpka
    http-reuse always
    server node1 internal-*.elb.amazonaws.com:80 check resolvers dns resolve-prefer ipv4

resolvers dns
    nameserver a 8.8.8.8:53
    nameserver b 8.8.4.4:53
    nameserver c 127.0.0.1:53

After doing a few curl requests locally against HAProxy, it does not seem to have shared or reused any backend sessions between client sessions.
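
For anyone wanting to reproduce, the kind of check I mean is along these lines (the stats socket is an assumption, the config above does not declare one):

# a few separate client connections
for i in 1 2 3; do curl -s -o /dev/null http://127.0.0.1:8080/login; done

# compare cumulative session counts on the frontend vs the backend server
# (assumes "stats socket /var/run/haproxy.sock" in the global section)
echo "show stat" | socat stdio /var/run/haproxy.sock | cut -d, -f1,2,8

If connections were being reused, the backend server's cumulative session count (stot) would grow more slowly than the frontend's.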

Any help here would be appreciated.

It appears HAProxy closes the backend connections as soon as frontend clients disconnect, which kind of kills my use case entirely. Never mind then.

Correct, HAProxy does not support connection pooling yet; therefore it cannot currently satisfy your use case.

Just for the record, I now bypass the AWS ELB and do load balancing with an nginx upstream that hits the EC2 boxes in our primary region directly, and that seems to have resolved the random failures. It appears ELB doesn't cope well with long-running keep-alive connections.
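
Roughly this, with placeholder instance addresses:

upstream primary_direct {
    # hit the instances directly instead of going through the ELB;
    # idle connections to each box stay open between requests
    server 10.0.1.10:443;
    server 10.0.1.11:443;
    keepalive 8;
}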

ELB is a layer-4 balancer and has no idea about your layer-7 keep-alive connections. You can align its TCP idle connection timeout with the longest idle period you expect so it doesn't close/reset the connections on you (it supports timeouts of up to 3600 seconds). Or enable TCP (note: TCP here, NOT HTTP) keep-alive on your client.
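
For example, for a Classic ELB (the load balancer name is a placeholder), something along these lines:

# raise the ELB's idle connection timeout
aws elb modify-load-balancer-attributes \
    --load-balancer-name primary-elb \
    --load-balancer-attributes '{"ConnectionSettings":{"IdleTimeout":3600}}'

# and/or have the client send TCP keep-alive probes well before that idle timeout;
# haproxy's "option srvtcpka" uses the kernel's keep-alive timers, tunable via sysctl
sysctl -w net.ipv4.tcp_keepalive_time=300    # seconds of idle before the first probe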