A bit of context to start with. I have an HAProxy set up as a public facing end point for our AWS services. The HAProxy forwards requests to an internal AWS ELB (Elastic Load Balancer). As for why have HAProxy in front of an ELB, long story and off topic (ELBs don’t support percentage canaries).
The way that AWS ELBs work at a high level is they supply multiple IP addresses through DNS for a given host name. Clients would normally round robin through these. Clients DO need to resolve the hostname to an IP address fairly regularly, as the ELB will scale and move to different IP addresses as part of normal operation. We previously had issues with this and HAProxy 1.5 as the IP address would be locked in after HAProxy started. We switched to 1.6.2 and leveraged the new DNS resolvers section and this problem has gone away. This is how I know my configuration is at least talking to the DNS correctly and updating when the ELB nodes move.
Now, my problem. Through monitoring in AWS it’s evident that my HAProxy instance is still being somewhat sticky about the IP address. It seems like it routes all traffic to 1 of the 2 IP addresses that come back from the DNS. If that IP address becomes unavailable, it will gladly update and use a new valid IP address, but in the meanwhile it seems to stick to just 1 of the available IP addresses. The result is that it’s not spreading out load to the ELB nodes like we need it to.
I tried adding the “hold valid 10s” entry to my resolvers section, but this didn’t seem to fix the problem. Is there something wrong with my configuration or am I misunderstanding how resolution occurs? (current configuration below along with dig results for the hostname)
Thanks for any help or pointers!
This is just parts of the config that seem relevant. Let me know if you need more.
nameserver dns1 172.16.0.2:53
hold valid 10s
option httpchk GET /healthcheck
default-server fall 3 rise 5 inter 5s fastinter 1s
server main-lb elb.hostname:443 check resolvers vpcdns port 8080 weight 100 server canary-lb canary-elb.hostname:443 check resolvers vpcdns port 8080 weight 0
Dig results for the hostname
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 22646
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 0
;; QUESTION SECTION:
;elb.hostname. IN A
;; ANSWER SECTION:
elb.hostname. 60 IN A 172.16.1.21
elb.hostname. 60 IN A 172.16.2.70
;; Query time: 9 msec
;; SERVER: 172.16.0.2#53(172.16.0.2)
;; WHEN: Fri Feb 26 19:27:05 2016
;; MSG SIZE rcvd: 103