DNS Resolver: trying to understand interaction between `timeout` and `hold`


#1

I’m trying to use the DNS SRV resolver feature with a local Consul agent and haproxy 1.8.14. I have a basic configuration working, but I’d like to get a specific behaviour when Consul is down, and I’m not sure what the right timeout and hold settings are.

The behaviour I want is:

When the local Consul agent is working (DNS SRV queries return VALID answers), re-do the SRV query and update my server-template configuration every 2 seconds.

When the local Consul agent is unavailable, leading to REFUSED or TIMEOUT (or maybe OTHER) errors:

  • continue retrying the SRV request every 2 seconds, forever (until the queries start succeeding again)
  • keep using the last valid response until queries start succeeding again, with no timeout

Here’s what I have so far, using very large retries and timeouts to simulate “forever”. I’ve done some testing and it seems to work (WEB-APP stayed available while the local Consul agent was down for several minutes), but I’d love to get some feedback from someone who understands these settings better.

resolvers consul
    nameserver consul 127.0.0.1:53
    accepted_payload_size 8192
    timeout resolve 2s
    timeout retry   2s
    resolve_retries 100000
    hold other    50000s
    hold refused  50000s
    hold timeout  50000s
    hold nx       5s
    hold valid    5s
    hold obsolete 5s

listen WEB-APP
    bind 127.0.0.1:50002
    mode http
    ...
    server-template web-app 50 _web-app._tcp.service.consul resolvers consul resolve-prefer ipv4 check inter 2s

#2

CCing @Baptiste


#3

Hi Irving,

Well, timeouts are for the resolution processing itself, while hold applies to the result of the resolution.

  • timeout resolve 2s => this will trigger a resolution every 2s

  • hold timeout 50s means that the latest valid answer is kept for 50s when resolutions time out (and likewise for each of the other hold statuses)

Timeout retry is how often haproxy re-sends a DNS query once timeout resolve has been reached, before it considers the resolution failed.

Note that the timeout resolve + hold timeout above should do the trick in your case.
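
As a rough sketch of how that maps onto the configuration from the first post (untested; the values are simply copied from that post, not recommendations, and the comments only restate the explanation above):

```
resolvers consul
    nameserver consul 127.0.0.1:53
    accepted_payload_size 8192

    # Resolution processing: how often queries are sent
    timeout resolve 2s      # trigger a new resolution every 2s
    timeout retry   2s      # re-send interval before the resolution is considered failed

    # Resolution result: how long the last valid answer is kept per status
    hold timeout  50000s    # keep the last valid answer while queries time out
    hold refused  50000s    # same while the agent answers REFUSED
    hold other    50000s    # same for any other error
    hold valid    5s
```

The very large hold values are just the original poster's way of approximating "forever"; whether a smaller value is enough depends on how long the Consul agent is expected to stay down.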