I’m using HAProxy on Kubernetes to reverse-proxy to multiple backend services. This works well under normal circumstances, but I’ve noticed an edge case where HAProxy loses a backend and never recovers:
- The Kubernetes DNS service (kube-dns in my case, though this detail probably isn’t important) becomes briefly unavailable.
- One of the backend services becomes briefly unavailable at the same time.
- HAProxy marks the backend as down (expected), e.g. “Server grafana/instance is DOWN. 0 active and 0 backup servers left.”
- The Kubernetes DNS service and the backend service become healthy again.
- HAProxy never marks the backend as up again (unexpected!).
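To make the behavior I expected concrete, here’s a toy simulation of how I understand the resolver’s hold logic: a failed lookup should only take the server down after `hold valid` expires, and a later successful lookup should bring it back up. (This is a sketch of my mental model, not HAProxy’s actual implementation; the state names and timings are my own.)

```python
class ToyResolver:
    """Toy model of resolver-driven server state (NOT HAProxy's real code)."""

    def __init__(self, hold_valid=30.0):
        self.hold_valid = hold_valid
        self.last_good = None   # timestamp of the last successful lookup
        self.address = None     # last resolved address

    def tick(self, now, lookup_result):
        """lookup_result is an IP string on success, or None on DNS failure."""
        if lookup_result is not None:
            self.address = lookup_result
            self.last_good = now
        elif self.last_good is not None and now - self.last_good > self.hold_valid:
            # "hold valid" expired: drop the stale address, server goes DOWN
            self.address = None
        return "UP" if self.address else "DOWN"

resolver = ToyResolver(hold_valid=30.0)
assert resolver.tick(0, "10.0.0.1") == "UP"    # normal operation
assert resolver.tick(10, None) == "UP"         # DNS down, still within hold valid
assert resolver.tick(45, None) == "DOWN"       # hold valid expired -> DOWN
assert resolver.tick(60, "10.0.0.2") == "UP"   # DNS back -> I expect recovery
```

In my cluster, though, that last step never happens: the server stays DOWN even after lookups start succeeding again.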
So far, I haven’t been able to find any resolver configuration that lets HAProxy recover from this scenario; here’s my latest attempt:
resolvers kubernetes
  nameserver kube-dns kube-dns.kube-system.svc.cluster.local:53
  resolve_retries 100
  timeout resolve 2s
  timeout retry 2s
  hold valid 30s
...
backend grafana
  option httpchk GET /api/health
  server instance grafana.svc.cluster.local:3000 check resolvers kubernetes
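For what it’s worth, once kube-dns recovers, resolution from inside the pod does come back: a retry loop like the one below (the attempt count and delay are just illustrative) returns addresses again. So the name is resolvable; HAProxy just never picks it back up.

```python
import socket
import time

def wait_for_dns(host, attempts=5, delay=2.0):
    """Retry getaddrinfo until it succeeds; return resolved IPs or None."""
    for _ in range(attempts):
        try:
            infos = socket.getaddrinfo(host, None)
            return sorted({info[4][0] for info in infos})
        except socket.gaierror:
            time.sleep(delay)
    return None

# Example (hostname from my config; run inside the HAProxy pod):
# wait_for_dns("grafana.svc.cluster.local")
```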
I’m using haproxy:2.4.20, but I’ve observed similar behavior with other versions. Any idea what I’m doing wrong?