I’m using HAProxy on Kubernetes to reverse-proxy to multiple backend services. This works well under normal circumstances, but I’ve noticed an edge case where HAProxy loses a backend and never recovers:
- The Kubernetes DNS service (kube-dns in my case, though this detail probably isn’t important) becomes briefly unavailable.
- One of the backend services becomes briefly unavailable at the same time.
- HAProxy marks the backend as down (expected), e.g. “Server grafana/instance is DOWN. 0 active and 0 backup servers left.”
- The Kubernetes DNS service and the backend service become healthy again.
- HAProxy never marks the backend as up again (unexpected!).
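To make the behavior I expected concrete, here’s a toy simulation of how I understand the resolver’s hold logic: a failed lookup should only take the server down after `hold valid` expires, and a later successful lookup should bring it back up. (This is a sketch of my mental model, not HAProxy’s actual implementation; the state names and timings are my own.)

```python
class ToyResolver:
    """Toy model of resolver-driven server state (NOT HAProxy's real code)."""

    def __init__(self, hold_valid=30.0):
        self.hold_valid = hold_valid
        self.last_good = None   # timestamp of the last successful lookup
        self.address = None     # last resolved address

    def tick(self, now, lookup_result):
        """lookup_result is an IP string on success, or None on DNS failure."""
        if lookup_result is not None:
            self.address = lookup_result
            self.last_good = now
        elif self.last_good is not None and now - self.last_good > self.hold_valid:
            # "hold valid" expired: drop the stale address, server goes DOWN
            self.address = None
        return "UP" if self.address else "DOWN"

resolver = ToyResolver(hold_valid=30.0)
assert resolver.tick(0, "10.0.0.1") == "UP"    # normal operation
assert resolver.tick(10, None) == "UP"         # DNS down, still within hold valid
assert resolver.tick(45, None) == "DOWN"       # hold valid expired -> DOWN
assert resolver.tick(60, "10.0.0.2") == "UP"   # DNS back -> I expect recovery
```

In my cluster, though, that last step never happens: the server stays DOWN even after lookups start succeeding again.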
So far, I haven’t been able to find any resolver configuration that lets HAProxy recover from this scenario; here’s my latest attempt:
resolvers kubernetes
  nameserver kube-dns kube-dns.kube-system.svc.cluster.local:53
  resolve_retries 100
  timeout resolve 2s
  timeout retry 2s
  hold valid 30s
...
backend grafana
  option httpchk GET /api/health
  server instance grafana.svc.cluster.local:3000 check resolvers kubernetes
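For what it’s worth, once kube-dns recovers, resolution from inside the pod does come back: a retry loop like the one below (the attempt count and delay are just illustrative) returns addresses again. So the name is resolvable; HAProxy just never picks it back up.

```python
import socket
import time

def wait_for_dns(host, attempts=5, delay=2.0):
    """Retry getaddrinfo until it succeeds; return resolved IPs or None."""
    for _ in range(attempts):
        try:
            infos = socket.getaddrinfo(host, None)
            return sorted({info[4][0] for info in infos})
        except socket.gaierror:
            time.sleep(delay)
    return None

# Example (hostname from my config; run inside the HAProxy pod):
# wait_for_dns("grafana.svc.cluster.local")
```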
I’m using haproxy:2.4.20, but I’ve observed similar behavior with other versions. Any idea what I’m doing wrong?