Running HAProxy v 1.8.17 I’ve come across the following scenario:
I have a single listen block with one server listed, with the endpoints dns address being used and resolved against google DNS servers. The endpoint rotates between IP’s every 30 seconds or so however they only return one A record in the DNS response at any one time (i.e. either 10.0.0.1 or 10.0.0.2 is returned per DNS request).
The behavior I’ve observed is that when restarting/reloading haproxy it will perform a DNS query, retrieve the IP address and send health checks to that address i.e.10.0.0.1. Once the server is up/passes health checks in haproxy traffic will flow as normal and respect any subsequent DNS updates (i.e traffic to 10.0.0.1 for 30s, dns update, traffic to 10.0.0.2 for 30s, repeat).
However, if the endpoint takes one of the IP’s offline the backend server will fail health checks and remain in a downed state until a manual reload/restart is performed.
After reviewing the logs and multiple packet captures I believe that the behaviour is that haproxy is selecting the IP from the original DNS query done at start up, health checking that endpoint and then never updating the IP throughout the lifetime of the process.
For example, the original DNS query returns 10.0.0.1, haproxy health checks this and sees it is up, so updates the backend server to be up. HAProxy is continuing to healthcheck 10.0.0.1 even though subsequent DNS updates return 10.0.0.2. Once the endpoint 10.0.0.1 is taken offline, all healthchecks fail and the server is marked down.
The bizarre thing is that this only affects health checks, any “normal” traffic passing through the frontend is respecting DNS updates and being routed to either 10.0.0.1 or 10.0.0.2 depending on the latest DNS query.
I’ve tried cutting down the DNS valid hold time to 1s etc however the health check never updates the IP within the process.
Does anyone have a workaround for this or has observed this behavior before? I’ve come across an old thread that would give me a workaround if the endpoint was returning all A records, however this does not work for this scenario, instead haproxy marks the second backend server as MAINT as it cannot resolve a second IP
Old thread:
Config:
global
chroot /var/lib/haproxy
daemon
group haproxy
log 127.0.0.1 local2
maxconn 4000
pidfile /var/run/haproxy.pid
ssl-default-bind-ciphers ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-SHA384:ECDHE-RSA-AES256-SHA384:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA256
ssl-default-bind-options no-sslv3 no-tlsv10 no-tlsv11
user haproxy
defaults
log global
log-format "[%tr] %ft %s %bi %si:%sp %TR/%Tw/%Tc/%Tr/%Ta %ST %B %CC %CS %tsc %ac/%fc/%bc/%sc/%rc %sq/%bq %hr %hs %{+Q}r"
option log-separate-errors
maxconn 3000
mode http
option forwardfor
option redispatch
option dontlognull
retries 3
timeout http-request 10s
timeout queue 1m
timeout connect 10s
timeout client 1m
timeout server 1m
timeout http-keep-alive 10s
timeout check 10s
option log-health-checks
resolvers dnsgoogle
nameserver google8 8.8.8.8:53
nameserver google4 8.8.4.4:53
resolve_retries 3
timeout retry 1s
hold valid 10s
listen my_example
bind 0.0.0.0:10001 ssl crt /etc/ssl/certs/example.pem
mode http
option forceclose
server exaple_endpoint_01 myexampleendpoint.com:443 check verify none crt /etc/ssl/certs/haproxy/myexampleclientcert.pem resolvers dnsgoogle inter 15s ssl rise 2 fall 2