Kubernetes, DNS resolver, accepted_payload_size, and large number of backends


(using HAProxy 1.8.3)

I’m attempting to use the DNS resolver to load balance traffic to Kubernetes pods. It works well with around 100 backends, but when we bump that to ~150, HAProxy starts thrashing, flooding the logs with backend UP/DOWN messages.

Is this related to DNS limitations? What happens when the SRV records don’t fit in 8192 bytes?

Per https://cbonte.github.io/haproxy-dconv/1.8/configuration.html#5.3.2-accepted_payload_size

Defines the maximum payload size accepted by HAProxy and announced to all the
name servers configured in this resolvers section.
<nb> is in bytes. If not set, HAProxy announces 512. (minimal value defined
     by RFC 6891)

Note: to get bigger responses but still be sure that responses won't be
  dropped on the wire, one can choose a value between 1280 and 1410.

Note: the maximum allowed value is 8192.
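
For reference, the shape of the configuration in question is roughly this (a sketch; the nameserver address, SRV record name, and slot count below are illustrative placeholders, not my exact setup):

    resolvers kube-dns
        nameserver dns1 10.96.0.10:53    # cluster DNS; placeholder address
        accepted_payload_size 8192       # maximum allowed value
        resolve_retries 3
        timeout retry 1s

    backend pods
        # fill up to 150 server slots from the SRV record via the resolver
        server-template pod 150 _http._tcp.my-service.default.svc.cluster.local resolvers kube-dns check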



When the response is bigger than the maximum accepted payload, you get exactly the behavior you just described: the response is truncated, so on each resolution some SRV records are missing, the servers behind them are considered gone, and they flap UP/DOWN as different subsets of records make it into successive responses.
Could you take a quick tcpdump capture of a few DNS packets?
I’d like to check something: I found a bug with Consul where the announced payload size gets downgraded to 1280 bytes for unknown reasons, and I want to see whether the same thing is happening here.

Please send the capture to bedis9@gmail.com
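
If it helps, a capture along these lines should be enough (the interface and DNS port are assumptions; adjust them for your environment):

    # grab full-size DNS packets between HAProxy and its resolver
    tcpdump -i any -n -s 0 -w haproxy-dns.pcap port 53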

You can work around this behavior by adding a “hold obsolete 1m” in your resolvers section.
This tells HAProxy to consider a DNS record obsolete only once it has not been seen in a response from the server for 1 minute. The corresponding server is marked DOWN after that period, and a newly received record can then be reassigned to it.
This will limit the UP/DOWN flapping you’re seeing.
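
In a resolvers section like the one sketched above, that’s a single extra line (the 1m period is the value suggested here):

    resolvers kube-dns
        nameserver dns1 10.96.0.10:53    # placeholder address
        accepted_payload_size 8192
        # keep a record (and its server) considered valid for 1 minute
        # after it stops appearing in responses, instead of dropping it
        # immediately
        hold obsolete 1m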

The long-term fix will be to make DNS requests over TCP (hopefully in HAProxy 1.9).