Kubernetes, DNS resolver, accepted_payload_size, and large number of backends


#1

(using HAProxy 1.8.3)

I’m attempting to use the DNS resolver to load-balance traffic to Kubernetes pods. It works well up to around 100 backends, but when we bump the backend count to ~150, HAProxy starts flapping, logging a constant stream of backend UP/DOWN messages.
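
For reference, my setup is roughly the following sketch (the nameserver address, service name, and slot count are placeholders for our actual values):

    resolvers kube-dns
        # kube-dns/CoreDNS ClusterIP (placeholder address)
        nameserver dns1 10.96.0.10:53
        accepted_payload_size 8192
        resolve_retries 3
        timeout retry 1s

    backend pods
        balance roundrobin
        # one slot per potential pod; SRV records from the headless service
        # fill the slots in as pods come and go
        server-template pod 150 _http._tcp.my-service.default.svc.cluster.local resolvers kube-dns check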

Is this related to DNS limitations? What happens when the SRV records don’t fit in 8192 bytes?

Per https://cbonte.github.io/haproxy-dconv/1.8/configuration.html#5.3.2-accepted_payload_size

    Defines the maximum payload size accepted by HAProxy and announced to all
    the name servers configured in this resolvers section.
    <nb> is in bytes. If not set, HAProxy announces 512. (minimal value
    defined by RFC 6891)

    Note: to get bigger responses but still be sure that responses won't be
    dropped on the wire, one can choose a value between 1280 and 1410.

    Note: the maximum allowed value is 8192.

#2

Hi,

When the response is bigger than the max accepted payload size, you get exactly what you just saw: the response is truncated, each reply carries only a subset of the SRV records, and the servers whose records are missing from a given reply get flipped DOWN and then back UP.
Could you take a quick tcpdump capture of a few DNS packets?
I would like to check something (I found a bug with Consul where the announced size is downgraded to 1280 bytes for unknown reasons).

Please send me the capture to bedis9@gmail.com
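
Something like this on the HAProxy host should do it (the interface and filter are just a suggestion; adjust if your resolver listens elsewhere):

    # capture DNS traffic between HAProxy and the resolver into a pcap file
    tcpdump -n -i any -w haproxy-dns.pcap udp port 53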

You can work around this behavior by adding “hold obsolete 1m” to your resolvers section.
This means HAProxy will consider a DNS record obsolete only if it has not appeared in any response from the server for 1 minute. Only after that period is the server marked DOWN, and its slot freed up for a new record.
This will limit the UP/DOWN flapping you’re seeing.
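
In your resolvers section that would look something like this (the other settings shown are just the ones from your sketch above):

    resolvers kube-dns
        nameserver dns1 10.96.0.10:53
        accepted_payload_size 8192
        # keep a record valid for 1 minute after it stops appearing in responses
        hold obsolete 1m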

The long-term resolution will be to make DNS requests over TCP (hopefully in HAProxy 1.9).