I am observing really odd behavior from HAProxy - it constantly re-evaluates the state of this backend and brings the servers up and down every couple of seconds. No server stays up for more than 1-2 minutes. On top of that, I have 8 pods (all alive and well), yet HAProxy only ever sees 5 or 6 of them. I see the following pattern in the debug logs:
srv-ns3 changed its FQDN from (null) to api-4.dummy-api-service.default.svc.cluster.local by 'SRV record'
srv-ns4 changed its FQDN from (null) to api-4.dummy-api-service.default.svc.cluster.local by 'SRV record'
srv-ns3 is going DOWN for maintenance (No IP for server ). 5 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
....
srv-ns3 changed its IP from 10.1.1.108 to 10.1.1.109 by DNS cache.
Server dummy-api-service-nosession/srv-ns3 ('api-4.dummy-api-service.default.svc.cluster.local') is UP/READY (resolves again).
Server dummy-api-service-nosession/srv-ns3 administratively READY thanks to valid DNS answer.
dummy-api-service-nosession/srv-ns3 changed its IP from 10.1.1.108 to 10.1.1.109 by DNS cache.
...
I can assure you that the pods themselves are alive and well. In fact, I have attempted an alternative configuration using just 8 "server" lines - all 8 stay green 100% of the time and never go down.
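For reference, the working static variant is nothing fancy - roughly the following, with the pod IPs and port being placeholders here rather than my real values:

backend dummy-api-service-nosession
    balance roundrobin
    # one static line per pod, no DNS resolution involved
    server api-0 10.1.1.101:8080 check
    server api-1 10.1.1.102:8080 check
    # ... and so on for all 8 pods
    server api-7 10.1.1.108:8080 check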
There is definitely something odd about how the SRV records are handled. I have noticed that the order of the SRV records returned by K8S constantly changes - but that is expected; record order is not guaranteed in DNS anyway.
P.S. I attempted the same configuration with version 1.9 - same result. When using server-template, the list of available servers constantly changes: they go up and down, and some of them never reach the UP state.
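For clarity, the dynamic configuration that misbehaves looks roughly like this (the resolvers name, nameserver address, SRV port name, and timeouts are illustrative, not my exact values):

resolvers kube-dns
    nameserver dns1 10.96.0.10:53
    resolve_retries 3
    timeout resolve 1s
    timeout retry   1s
    hold valid      10s

backend dummy-api-service-nosession
    balance roundrobin
    # 8 slots (srv-ns1..srv-ns8) filled from the SRV answer for the headless service;
    # addresses and ports come from the SRV/A records, hence no port on this line
    server-template srv-ns 8 _http._tcp.dummy-api-service.default.svc.cluster.local resolvers kube-dns check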
Some additional details. I have done some experiments changing the size of the upstream pool, i.e. the number of running API pods in K8S. It appears that with 6 or fewer pods, HAProxy behaves correctly - the list of upstreams is stable and nothing goes down. Once I start the 7th pod, things go nuts and the servers start going up and down in HAProxy at random.
Here is my guess. It must have something to do with the size of the DNS response. Maybe once the response grows beyond a certain size (and no longer fits into a single UDP datagram, so it gets truncated), HAProxy fails to read it all, and since DNS constantly reshuffles the list of records, HAProxy gets a different partial result every time it performs the lookup? This is just a guess.
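If that theory holds, the relevant knob is probably accepted_payload_size in the resolvers section: as far as I understand, HAProxy announces a 512-byte EDNS0 payload size by default, so anything bigger would get truncated. Something along these lines (again, the names and address are placeholders):

resolvers kube-dns
    nameserver dns1 10.96.0.10:53
    # announce a larger EDNS0 buffer (allowed range 512-8192, default 512)
    accepted_payload_size 8192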
Another update. I think I was partially right with my previous guess. I repeated the same test going from 6 to 7 pods while running tcpdump on the DNS traffic. Here is the DNS request made by HAProxy: