Recover from overlapping DNS and backend unavailability

I’m using HAProxy on Kubernetes to reverse-proxy to multiple backend services. This works well under normal circumstances, but I noticed an edge case where HAProxy marks a backend as down and never recovers it:

  • The Kubernetes DNS service (in this case kube-dns, but this detail probably isn’t important) is briefly unavailable.
  • One of the backend services is briefly unavailable at the same time.
  • HAProxy reports the backend as unhealthy (expected), e.g., “Server grafana/instance is DOWN. 0 active and 0 backup servers left.”
  • The Kubernetes DNS service and the backend service become healthy again.
  • HAProxy never reports the backend as healthy again (unexpected!).

So far, I haven’t been able to find any resolver configuration that allows HAProxy to recover in this scenario, but here’s my latest attempt:

resolvers kubernetes
  nameserver kube-dns kube-dns.kube-system.svc.cluster.local:53
  resolve_retries 100
  timeout resolve 2s
  timeout retry 2s
  hold valid 30s
...
backend grafana
  option httpchk GET /api/health
  server instance grafana.svc.cluster.local:3000 check resolvers kubernetes
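
For comparison, here is a variant of the same configuration that sets explicit hold periods for failed lookups and adds init-addr to the server line. This is an untested sketch, not a known fix: the hold and retry values are illustrative assumptions, and the init-addr setting is an addition that is not part of my current setup.

```
resolvers kubernetes
  nameserver kube-dns kube-dns.kube-system.svc.cluster.local:53
  # Fewer, faster retries (illustrative values, not tuned)
  resolve_retries 3
  timeout resolve 1s
  timeout retry 1s
  # How long the last result is kept, per resolution status (assumed values)
  hold valid 10s
  hold nx 30s
  hold timeout 30s
  hold other 30s
  hold obsolete 30s

backend grafana
  option httpchk GET /api/health
  # init-addr last,libc,none lets HAProxy start even if the name
  # cannot be resolved at startup (assumption: not in the original setup)
  server instance grafana.svc.cluster.local:3000 check resolvers kubernetes init-addr last,libc,none
```

The idea is to separate two possible causes: the resolver giving up or discarding the address during the DNS outage, versus the health check never being retried against a usable address afterwards.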

I’m using haproxy:2.4.20, but I’ve observed similar results with other versions. Any idea what I’m doing wrong?