We run Kubernetes clusters in Google Cloud that use HAProxy as a reverse proxy, load balancing across headless services. These services publish DNS SRV records that HAProxy uses for service discovery. The internet-facing service has externalTrafficPolicy set to Local, so the original source IP is passed through to HAProxy. Here is the balancing config for one of our backends, followed by my resolver config:
backend tao-backend
    balance roundrobin
    stick-table type ip size 1000k expire 14400m
    stick on src
    server-template tao 1 srv-tao.default.svc.cluster.local:6543 resolvers dns check

resolvers dns
    nameserver dns1 10.47.240.10:53
Yesterday I found documentation indicating that I need to use the SRV records that begin with an underscore, but I also found a release note saying that HAProxy now works when multiple A records are returned. From inside the HAProxy pod, I can see that both the SRV and the A lookups return an entry for each of the relevant pods, which suggests that Kubernetes considers both pods healthy. However, HAProxy in prod is sending traffic to only a single IP.
Here are the DNS records:
root@haproxy-867ddf67c5-7s7l7:~# dig +noall +answer SRV srv-tao.default.svc.cluster.local
srv-tao.default.svc.cluster.local. 30 IN SRV 10 50 0 3239353830366139.srv-tao.default.svc.cluster.local.
srv-tao.default.svc.cluster.local. 30 IN SRV 10 50 0 3336303363316365.srv-tao.default.svc.cluster.local.
root@haproxy-867ddf67c5-7s7l7:~# dig +noall +answer A srv-tao.default.svc.cluster.local
srv-tao.default.svc.cluster.local. 8 IN A 10.44.1.2
srv-tao.default.svc.cluster.local. 8 IN A 10.44.4.86
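In case it helps anyone reproduce the comparison, here is a rough Python sketch that counts the distinct SRV targets and A addresses in `dig +noall +answer` output. The function name `parse_dig_answers` is just something I made up for illustration, and the sample text is the output pasted above.

```python
def parse_dig_answers(text):
    """Collect distinct SRV targets and A addresses from dig answer lines."""
    srv_targets, a_addrs = set(), set()
    for line in text.splitlines():
        fields = line.split()
        # Answer lines look like: name ttl class type rdata...
        # Anything else (shell prompts, blank lines) is skipped.
        if len(fields) < 5 or fields[2] != "IN":
            continue
        rtype = fields[3]
        if rtype == "SRV" and len(fields) >= 8:
            # SRV rdata is: priority weight port target
            srv_targets.add(fields[7])
        elif rtype == "A":
            a_addrs.add(fields[4])
    return srv_targets, a_addrs


SAMPLE = """\
root@haproxy-867ddf67c5-7s7l7:~# dig +noall +answer SRV srv-tao.default.svc.cluster.local
srv-tao.default.svc.cluster.local. 30 IN SRV 10 50 0 3239353830366139.srv-tao.default.svc.cluster.local.
srv-tao.default.svc.cluster.local. 30 IN SRV 10 50 0 3336303363316365.srv-tao.default.svc.cluster.local.
root@haproxy-867ddf67c5-7s7l7:~# dig +noall +answer A srv-tao.default.svc.cluster.local
srv-tao.default.svc.cluster.local. 8 IN A 10.44.1.2
srv-tao.default.svc.cluster.local. 8 IN A 10.44.4.86
"""

if __name__ == "__main__":
    srv, a = parse_dig_answers(SAMPLE)
    print(len(srv), "SRV targets;", len(a), "A addresses")
```

Both lookups give two distinct entries, so the resolver side looks consistent to me.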
I know I could also use the SRV records of the form _service._proto.name., but it’s working fine in my dev environment without using that.
I’ve also enabled the stats module to try to work out why a single pod keeps getting all the traffic. With the following configuration, though, it isn’t showing me anything useful.
backend stats
    stats enable
    stats auth admin:********
    stats admin if TRUE
    stats realm HAProxy\ Statistics
    stats refresh 5s
    stats show-desc
    stats show-legends
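One thing I may try next is reading the CSV export of the stats page (HAProxy serves it when `;csv` is appended to the stats URI) instead of the HTML view. Below is a minimal Python sketch that summarises the per-server rows for one backend. It assumes the CSV text has already been fetched; the helper name `summarize_backend`, the backend name `example-backend`, and the sample rows are all made up for illustration, with the header trimmed to the four columns the code reads.

```python
import csv
import io


def summarize_backend(csv_text, backend):
    """Return {server_name: (status, total_sessions)} for one backend."""
    cleaned = csv_text.lstrip()
    # The header line of the export starts with "# "; drop that prefix
    # so DictReader sees clean column names.
    if cleaned.startswith("# "):
        cleaned = cleaned[2:]
    summary = {}
    for row in csv.DictReader(io.StringIO(cleaned)):
        if row["pxname"] != backend:
            continue
        # Skip the aggregate rows; only individual servers are of interest.
        if row["svname"] in ("FRONTEND", "BACKEND"):
            continue
        summary[row["svname"]] = (row["status"], int(row["stot"] or 0))
    return summary


# Made-up sample data, not output from my cluster.
SAMPLE = """# pxname,svname,stot,status
example-backend,srv-a,1200,UP
example-backend,srv-b,0,MAINT
example-backend,BACKEND,1200,UP
stats,BACKEND,3,UP
"""

if __name__ == "__main__":
    for name, (status, total) in summarize_backend(SAMPLE, "example-backend").items():
        print(name, status, total)
```

The idea is to see how many server entries HAProxy actually holds for the backend, what state each is in, and how the session totals are split between them.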
From the HAProxy logs, I can see that every request reaching HAProxy carries the original client source IP rather than the IP of the internet-facing service, so I’ve ruled out stickiness caused by all requests sharing the same src IP.
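To double-check the stickiness side, the stick table itself could be dumped with the runtime API command `show table tao-backend`. That needs a `stats socket` line in the global section, which is not part of the config shown here. The Python sketch below counts how many client keys are pinned to each server_id in such a dump. The helper name `clients_per_server` is hypothetical, the sample dump is made up, and the line layout is what I understand from the docs, so it may differ between HAProxy versions.

```python
import re
from collections import Counter

# Matches the client key and the server it is stuck to on one entry line.
ENTRY = re.compile(r"key=(\S+).*?server_id=(\d+)")


def clients_per_server(dump):
    """Count stick-table entries per server_id in 'show table' output."""
    counts = Counter()
    for line in dump.splitlines():
        match = ENTRY.search(line)
        if match:
            counts[match.group(2)] += 1
    return dict(counts)


# Made-up sample dump, not output from my cluster.
SAMPLE = """\
# table: example-backend, type: ip, size:1024000, used:3
0x55d0c8a1b2c0: key=203.0.113.10 use=0 exp=863990000 server_id=1
0x55d0c8a1b3d0: key=203.0.113.11 use=0 exp=863991000 server_id=1
0x55d0c8a1b4e0: key=198.51.100.7 use=0 exp=863992000 server_id=2
"""

if __name__ == "__main__":
    print(clients_per_server(SAMPLE))
```

If every key maps to the same server_id even though the keys differ, that would point away from the source IPs and towards how the servers are being populated.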
Has anyone seen an issue like this before, or can anyone point me towards a way to determine exactly why HAProxy would use only one IP when DNS is returning two?
Thanks!