DNS does not resolve changed IPs

When a server’s IP changes during runtime, HAProxy does not resolve the hostname again when using external Python health check scripts. It will hold on to the old IP forever.

Here’s a timeline of events that triggers an issue for us:

  1. HAProxy starts up and resolves the name of server.com to the IP 123.123.123.123. Backend is marked as UP.
  2. The backend runs a Python external health check script for the server every 10 seconds. Python resolves server.com by itself, runs its tests against the server and keeps it UP.
  3. The server.com IP changes to 234.234.234.234.
  4. Python health checks resolve to the new IP, run their tests and everything works fine. Server kept UP.
  5. A request we want to route to server.com comes from a client to HAProxy. HAProxy still has the old 123.123.123.123 IP configured. The request is routed there and we get a 404 as our expected service is no longer there (but it’s still a valid IP with a response). The 404 is returned to the client.
  6. Python health check runs again, resolves to the new IP again and passes the checks. Server kept UP.
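
The mismatch in steps 4 to 6 can be made visible from inside the health check itself. The following is a minimal sketch, not the script used in this thread: it compares the address HAProxy is routing to with what the hostname currently resolves to. HAProxy's external-check passes the server address to the script as its third argument, but whether that value follows later DNS changes is not something this sketch assumes. The names `resolve_ipv4` and `address_drifted` and the injectable `resolve` parameter are hypothetical.

```python
import socket
from typing import Callable, Iterable, Set


def resolve_ipv4(hostname: str) -> Set[str]:
    """Return the IPv4 addresses the system resolver currently gives for hostname."""
    infos = socket.getaddrinfo(
        hostname, None, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return {info[4][0] for info in infos}


def address_drifted(
    haproxy_addr: str,
    hostname: str,
    resolve: Callable[[str], Iterable[str]] = resolve_ipv4,
) -> bool:
    """True when the address HAProxy uses is no longer among the resolved ones."""
    return haproxy_addr not in set(resolve(hostname))
```

A check that exits non-zero when `address_drifted(...)` is true would mark the server DOWN in step 6 instead of keeping a stale address UP. It only detects the problem; it does not make HAProxy re-resolve the name.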

We haven’t found a way to force HAProxy to resolve names again at set intervals. Instead, it holds on to a valid IP until it is either restarted or reloaded.

Do you know of any approach we could use to make HAProxy re-resolve a hostname and start using the new IP even if the old IP is still functional?

Running our own custom health check scripts is a hard requirement.

Here are some of our related HAProxy configuration snippets. We are using HAProxy 2.9.

resolvers default
  parse-resolv-conf
  hold other           15s
  hold refused         15s
  hold nx              15s
  hold timeout         15s
  hold valid           10s
  hold obsolete        15s

backend server.com
  option external-check
  external-check command /health-check.py
  server server.com server.com:443 init-addr libc,none

Looking forward to any recommendations!

If you use haproxy resolvers, don’t allow libc resolution, because this will then hide resolver issues, like in this case:

init-addr last,none

I don’t see you referring to the resolver in the backend, so it will just do libc resolution and not use the resolver at all.

server server.com server.com:443 init-addr last,none resolvers default

You can also put those options into a default-server directive, to avoid specifying it for each server.

Thanks for the quick reply @lukastribus. I was under the assumption that a resolvers section named “default” would be applied to the servers by default. It seems that is not the case.

I have now added resolvers default to the server line, together with init-addr none, and I see the following in the logs:

[WARNING]  (52) : server.com/server.com changed its IP from (none) to 103.18.17.221 by default/127.0.0.11.
[WARNING]  (52) : Server server.com/server.com ('server.com') is UP/READY (resolves again).
[WARNING]  (52) : Server server.com/server.com administratively READY thanks to valid DNS answer.
[WARNING]  (52) : server.com/server.com changed its IP from (none) to 103.18.17.221 by DNS cache.
[WARNING]  (52) : Server server.com/server.com ('server.com') is UP/READY (resolves again).

However, after HAProxy is running, to test things out, I’m manually overriding this in the /etc/hosts file:

127.0.0.1 server.com

After saving the file, HAProxy never updates its IP for server.com to be 127.0.0.1. It remains the original 103.18.17.221 no matter how long I wait. If I keep sending requests to HAProxy, they all land on 103.18.17.221 as well.

If I restart HAProxy I get the following line:

server.com/server.com changed its IP from (none) to 103.18.17.221 by DNS cache.

I would like the cache to expire, or some other mechanism to trigger a re-resolve. Any suggestions?

This resolver configuration means that the DNS resolvers listed as nameserver in /etc/resolv.conf will be continuously queried by haproxy.

Your local /etc/hosts is ignored (it would work with libc resolution, but with the limitations that apply: resolution at startup only).

So you cannot use /etc/hosts to test this.

Thanks @lukastribus . We figured out the /etc/hosts was not being taken into account and ran our own fake DNS server. With that we were able to change the IP resolution and noticed that HAProxy does pick up the changes.
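
For anyone wanting to reproduce that kind of test, a fake DNS server can be very small. The sketch below is not the server used in this thread; it is a minimal illustration that answers every A query with one configurable IPv4 address and returns an empty answer for any other query type. The names `build_response` and `serve`, port 5353, and the 5 second TTL are all assumptions made for the example, and the resolvers section would have to point at it (for instance through a nameserver line) for HAProxy to query it.

```python
import socket
import struct

TYPE_A = 1
CLASS_IN = 1


def build_response(query: bytes, ip: str, ttl: int = 5) -> bytes:
    """Build a DNS reply for a single-question query.

    A/IN queries get one A record pointing at `ip`; any other query
    type gets an empty NOERROR answer.
    """
    txid, flags = struct.unpack("!HH", query[:4])

    # Walk the QNAME labels (length byte + label) up to the terminating zero byte.
    pos = 12
    while query[pos] != 0:
        pos += query[pos] + 1
    question_end = pos + 1 + 4  # zero byte, then QTYPE (2 bytes) and QCLASS (2 bytes)
    question = query[12:question_end]
    qtype, qclass = struct.unpack("!HH", query[pos + 1:question_end])

    answer = b""
    if qtype == TYPE_A and qclass == CLASS_IN:
        # 0xC00C is a compression pointer back to the name at offset 12.
        answer = b"\xc0\x0c" + struct.pack("!HHIH", TYPE_A, CLASS_IN, ttl, 4)
        answer += socket.inet_aton(ip)

    # QR=1 (response), AA=1 (authoritative), RD bit copied from the query.
    reply_flags = 0x8400 | (flags & 0x0100)
    header = struct.pack(
        "!HHHHHH", txid, reply_flags, 1, 1 if answer else 0, 0, 0
    )
    return header + question + answer


def serve(ip: str, host: str = "127.0.0.1", port: int = 5353) -> None:
    """Answer UDP queries forever; restart with another `ip` to simulate a change."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind((host, port))
        while True:
            query, client = sock.recvfrom(4096)
            sock.sendto(build_response(query, ip), client)
```

Running `serve("123.123.123.123")`, stopping it, and then running `serve("234.234.234.234")` imitates the IP change from the original timeline. The sketch does no error handling and ignores everything in the query after the first question, so it is only suitable for a local test.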

The keys were removing the libc resolution (we initially thought it applied only at startup, after which HAProxy would switch to the default resolvers) and adding resolvers default to the default-server line so that it is actually used (we thought a resolvers section named default would be used by default).

This is correct, libc is for startup resolution.

The issue with mixing libc resolution with resolvers is that you don’t notice when something is broken with the resolver configuration during startup. You only notice because changes are not picked up.

By removing libc you make sure that either resolvers are correctly configured, or nothing would work at all.