HAProxy, K8S and server-template, unstable/fluctuating server list?

Hi,

I am trying to implement cookie-based session stickiness with HAProxy inside a K8S cluster. I am using the 2.0.2-alpine image.

backend dummy-api
  mode http
  option log-health-checks
  option httpchk GET /isalive
  dynamic-cookie-key XXXXX
  cookie SESSION_COOKIE rewrite nocache dynamic
  balance roundrobin
  option httpclose
  server-template srv-ns 8 _http-api-port._tcp.dummy-api-service.default.svc.cluster.local resolvers k8s check inter 10s downinter 20s fastinter 5s resolve-opts allow-dup-ip

I am observing really odd behavior from HAProxy - it constantly re-evaluates the state of this backend and brings the servers up and down every couple of seconds. No server stays up for more than 1-2 minutes. On top of that, I have 8 pods (all alive and well) and HAProxy sees only 5 or 6 of them. I see the following pattern in the debug logs:

srv-ns3 changed its FQDN from (null) to api-4.dummy-api-service.default.svc.cluster.local by 'SRV record'
srv-ns4 changed its FQDN from (null) to api-4.dummy-api-service.default.svc.cluster.local by 'SRV record'
srv-ns3 is going DOWN for maintenance (No IP for server ). 5 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
....
srv-ns3 changed its IP from 10.1.1.108 to 10.1.1.109 by DNS cache.
Server dummy-api-service-nosession/srv-ns3 ('api-4.dummy-api-service.default.svc.cluster.local') is UP/READY (resolves again).
Server dummy-api-service-nosession/srv-ns3 administratively READY thanks to valid DNS answer.
dummy-api-service-nosession/srv-ns3 changed its IP from 10.1.1.108 to 10.1.1.109 by DNS cache.
...

I can assure you that the pods themselves are alive and well. In fact, I have tried an alternative configuration using just 8 plain “server” lines - all 8 stay green 100% of the time and never go down.

There is something odd about it. I have noticed that the order of the SRV records constantly changes in K8S, but that is expected - record order is not guaranteed in DNS anyway.

P.S. I attempted the same configuration with version 1.9 - same result. When using server-template, the list of available servers constantly changes: they go up and down, and some of them never reach the UP state.

Some additional details. I have done some experiments changing the size of the upstream pool, i.e. the number of running API pods in K8S. It appears that with 6 pods or fewer, HAProxy behaves correctly - the list of upstreams is stable and nothing goes down. Once I start the 7th pod, things go nuts and the servers start going up and down in HAProxy at random.

Here is my guess: it must have something to do with the size of the DNS response. Maybe once the response grows beyond a certain size (and no longer fits into a single UDP datagram), HAProxy fails to read it all, and, given that DNS reshuffles the record order on every answer, HAProxy gets a different partial result every time it performs the lookup. This is just a guess.
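If that is the case, it should be reproducible with dig from inside a pod by announcing only the classic 512-byte UDP payload size and telling dig not to retry over TCP - something along these lines, using the service name from my config:

  # announce a 512-byte EDNS buffer and keep the truncated UDP answer instead of retrying over TCP
  dig +ignore +bufsize=512 SRV _http-api-port._tcp.dummy-api-service.default.svc.cluster.local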

Another update. I think I was partially right with my previous guess. I did the same test going from 6 to 7 pods while running tcpdump on the DNS traffic. Here is the DNS query made by HAProxy and the answer it receives:

15:02:48.653711 IP 10.1.1.165.35723 > 10.96.0.10.53: 22188+ [1au] SRV? _http-api-port._tcp.demo-api-service.default.svc.cluster.local. (91)
15:02:48.657678 IP 10.96.0.10.53 > 10.1.1.165.35723: 22188*| 6/0/0 SRV api-2.demo-api-service.default.svc.cluster.local.:8080 10 14, SRV api-3.demo-api-service.default.svc.cluster.local.:8080 10 14, SRV api-4.demo-api-service.default.svc.cluster.local.:8080 10 14, SRV api-5.demo-api-service.default.svc.cluster.local.:8080 10 14, SRV api-6.demo-api-service.default.svc.cluster.local.:8080 10 14, SRV api-0.demo-api-service.default.svc.cluster.local.:8080 10 14 (236)

I have 7 records, but only 6 are returned. However, if I do the query manually from the same pod using “dig”, I get the full list:

15:02:50.554577 IP 10.1.1.165.55134 > 10.96.0.10.53: 12043+ [1au] SRV? _http-api-port._tcp.demo-api-service.default.svc.cluster.local. (103)
15:02:50.555858 IP 10.96.0.10.53 > 10.1.1.165.55134: 12043* 7/0/7 SRV api-6.demo-api-service.default.svc.cluster.local.:8080 10 14, SRV api-0.demo-api-service.default.svc.cluster.local.:8080 10 14, SRV api-1.demo-api-service.default.svc.cluster.local.:8080 10 14, SRV api-2.demo-api-service.default.svc.cluster.local.:8080 10 14, SRV api-3.demo-api-service.default.svc.cluster.local.:8080 10 14, SRV api-4.demo-api-service.default.svc.cluster.local.:8080 10 14, SRV api-5.demo-api-service.default.svc.cluster.local.:8080 10 14 (374)

So, now the question is: why does HAProxy get a truncated DNS response?

Increasing the “accepted_payload_size” value in the resolvers section to 8192 does help.
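For anyone running into the same thing, here is roughly what the resolvers section looks like with that change. The nameserver address is my cluster's DNS service IP (the 10.96.0.10 seen in the tcpdump above), and the retry/hold values are just examples, not recommendations:

resolvers k8s
  # kube-dns / CoreDNS service IP of the cluster
  nameserver kube-dns 10.96.0.10:53
  # announce a larger EDNS payload size so large SRV answers are not truncated
  accepted_payload_size 8192
  resolve_retries       3
  timeout resolve       1s
  timeout retry         1s
  hold valid            10s
  hold obsolete         30s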

I had the same problem. The documentation is not clear, and I assumed that this value was 8192 by default.

The documentation states the default:

If not set, HAProxy announces 512.

Seems pretty clear to me. If you think the wording can be improved, please make a concrete suggestion.