Hoping you can help. I’m seeing an issue with DNS service discovery in HAProxy 1.8.14 and also 1.8.16, where HAProxy stops picking up changes from DNS. I have a server-template with a single slot pointed at a DNS SRV record. Initially things work, but at random points, as server-state-file reloads happen and DNS is updated with new ports, the backend gets stuck on the previous host/port combination, which no longer exists. We have multiple servers configured the same way, and they randomly get stuck like this.
For example, the DNS entry for _testapp_http._tcp.marathon.mesos would point to localhost:24379 at one point in time. Then that service would go away, be re-created on localhost:13903, and DNS would be updated. Most of the time HAProxy picks up the change, but occasionally it sticks on the old localhost:24379 forever.
A tcpdump of DNS shows the correct new entry being returned:
19:36:37.401865 IP localhost.39124 > localhost.domain: 33907+ [1au] SRV? _testapp_http._tcp.marathon.mesos. (63)
19:36:37.402016 IP localhost.domain > localhost.39124: 33907* 1/0/2 SRV localslave.marathon.mesos.:13903 20 25344 (124)
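A minimal sketch for pulling the target and port out of a tcpdump SRV reply line like the one above. It assumes tcpdump prints SRV answers as "SRV <target>:<port> <n> <n>", which is what the capture shows; the function name is hypothetical.

```python
import re

# Assumed tcpdump layout: "SRV <target>[.]:<port> <n> <n>".
# The trailing space after "SRV" keeps the query line ("SRV? ...") from matching.
SRV_ANSWER = re.compile(r"\bSRV (?P<target>\S+?)\.?:(?P<port>\d+) \d+ \d+")


def parse_srv_answer(line):
    """Return (target, port) from a tcpdump SRV reply line, or None if absent."""
    m = SRV_ANSWER.search(line)
    if m is None:
        return None
    # The trailing dot of the fully qualified target name is stripped by the regex.
    return m.group("target"), int(m.group("port"))
```

Run against the reply line, this yields the new port (13903) that HAProxy should have picked up.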
The CLI ‘show stat’ output shows the old port:
health_testapp,testapp_health1,0,0,0,0,64,0,0,0,0,0,0,0,0,DOWN,100,1,0,0,1,138,138,1,8,1,0,2,0,0,L4CON,0,0,0,0,0,0,0,0,0,-1,Connection refused,0,0,0,0,Layer4 connection problem,3,2,0,127.0.0.1:24379,http,
health_testapp,BACKEND,0,0,0,1,200,70,4760,14840,0,0,70,0,0,0,DOWN,0,0,0,1,138,138,1,8,0,0,1,1,1,0,0,0,0,70,0,70,0,0,0,0,0,0,-1,0,0,0,0,http,roundrobin,
The server-state-file shows the old port:
8 health_testapp 1 testapp_health1 127.0.0.1 0 0 100 1 201 8 2 0 6 0 0 0 localslave.marathon.mesos 24379 _testapp_http._tcp.marathon.mesos
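To spot stuck slots, a state-file line can be compared against the port DNS currently returns. This is a rough sketch. The field list below is an assumption about the 1.8 version-1 state-file layout (it matches the 20 columns in the line above), so it should be checked against the "#" header line in your own state file. The helper names are hypothetical.

```python
# Assumed column order for a version-1 server-state-file line in HAProxy 1.8;
# verify against the "#" header line HAProxy writes at the top of the file.
STATE_FIELDS = [
    "be_id", "be_name", "srv_id", "srv_name", "srv_addr",
    "srv_op_state", "srv_admin_state", "srv_uweight", "srv_iweight",
    "srv_time_since_last_change", "srv_check_status", "srv_check_result",
    "srv_check_health", "srv_check_state", "srv_agent_state",
    "bk_f_forced_id", "srv_f_forced_id", "srv_fqdn", "srv_port", "srvrecord",
]


def parse_state_line(line):
    """Map one whitespace-separated state-file line onto the assumed field names."""
    values = line.split()
    if len(values) != len(STATE_FIELDS):
        raise ValueError(f"expected {len(STATE_FIELDS)} fields, got {len(values)}")
    return dict(zip(STATE_FIELDS, values))


def is_stale(state, dns_port):
    """True if the saved port differs from the port DNS currently returns."""
    return int(state["srv_port"]) != dns_port
```

With the line above and the 13903 from the SRV reply, is_stale() reports the slot as stale.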
The server-template config is:
server-template testapp_health 1 _testapp_http._tcp.marathon.mesos resolvers localdns resolve-prefer ipv4 maxconn 64 rise 3 fall 2 check inter 10000
I tried adding ‘resolve-opts allow-dup-ip’, but it didn’t help. It seems like that slot gets permanently stuck for some reason. Could it be a race between the server-state-file reload and DNS updates?
Any workaround or fix would be appreciated.
Thanks,
Steve