Getting too many health checks towards backend

Hi,

I noticed in the HAProxy logs that health checks (or connections) towards the configured backend server occur far more often than configured. Any ideas why, or how I could debug this further?

I’m using haproxytech/haproxy-alpine:2.3.9 as a Docker container in K8s (AWS EKS), with an AWS load balancer in front of the service.

Here’s the HAProxy config:

  global
    log stdout format raw local0
    maxconn 1024

  defaults
    log global
    timeout client 60s
    timeout connect 60s
    timeout server 60s

  frontend k8s-api
    bind :443
    mode tcp
    option tcplog
    default_backend k8s-api

  backend k8s-api
    mode tcp
    option tcplog
    option tcp-check
    balance roundrobin
    default-server inter 10s downinter 5s rise 2 fall 2 slowstart 60s maxconn 250 maxqueue 256 weight 100
    server k8snode <server_ip>:6443 check

Here are the HAProxy logs:

10.100.7.32:25796 [10/May/2021:21:57:36.135] k8s-api k8s-api/k8snode 1/-1/0 0 -- 1/1/0/0/0 0/0
10.100.7.32:15949 [10/May/2021:21:57:36.698] k8s-api k8s-api/k8snode 1/-1/0 0 CC 1/1/0/0/0 0/0
10.100.7.32:44005 [10/May/2021:21:57:36.712] k8s-api k8s-api/k8snode 1/-1/0 0 -- 1/1/0/0/0 0/0
10.100.7.32:62517 [10/May/2021:21:57:36.761] k8s-api k8s-api/k8snode 1/-1/0 0 CC 1/1/0/0/0 0/0
10.100.7.32:47633 [10/May/2021:21:57:37.396] k8s-api k8s-api/k8snode 1/-1/0 0 -- 1/1/0/0/0 0/0
10.100.7.32:31109 [10/May/2021:21:57:37.463] k8s-api k8s-api/k8snode 1/-1/0 0 -- 1/1/0/0/0 0/0
10.100.7.32:20072 [10/May/2021:21:57:37.722] k8s-api k8s-api/k8snode 1/-1/0 0 -- 1/1/0/0/0 0/0
10.100.7.32:18167 [10/May/2021:21:57:38.376] k8s-api k8s-api/k8snode 1/-1/0 0 -- 1/1/0/0/0 0/0
10.100.7.32:31544 [10/May/2021:21:57:38.413] k8s-api k8s-api/k8snode 1/-1/0 0 CC 1/1/0/0/0 0/0
10.100.7.32:36069 [10/May/2021:21:57:39.967] k8s-api k8s-api/k8snode 1/-1/0 0 -- 1/1/0/0/0 0/0
10.100.7.32:49191 [10/May/2021:21:57:41.556] k8s-api k8s-api/k8snode 1/-1/0 0 -- 1/1/0/0/0 0/0
10.100.7.32:61907 [10/May/2021:21:57:42.063] k8s-api k8s-api/k8snode 1/-1/0 0 -- 1/1/0/0/0 0/0
10.100.7.32:27815 [10/May/2021:21:57:42.819] k8s-api k8s-api/k8snode 1/-1/0 0 CC 1/1/0/0/0 0/0
10.100.7.32:19050 [10/May/2021:21:57:44.206] k8s-api k8s-api/k8snode 1/-1/0 0 -- 1/1/0/0/0 0/0
10.100.7.32:21491 [10/May/2021:21:57:45.336] k8s-api k8s-api/k8snode 1/-1/0 0 CC 1/1/0/0/0 0/0
10.100.7.32:19926 [10/May/2021:21:57:45.888] k8s-api k8s-api/k8snode 1/-1/0 0 -- 1/1/0/0/0 0/0
10.100.7.32:35651 [10/May/2021:21:57:46.118] k8s-api k8s-api/k8snode 1/-1/0 0 CC 1/1/0/0/0 0/0
10.100.7.32:39451 [10/May/2021:21:57:46.951] k8s-api k8s-api/k8snode 1/-1/0 0 -- 1/1/0/0/0 0/0
10.100.7.32:34007 [10/May/2021:21:57:46.989] k8s-api k8s-api/k8snode 1/-1/0 0 -- 1/1/0/0/0 0/0
10.100.7.32:29739 [10/May/2021:21:57:47.038] k8s-api k8s-api/k8snode 1/-1/0 0 -- 1/1/0/0/0 0/0
10.100.7.32:33970 [10/May/2021:21:57:47.626] k8s-api k8s-api/k8snode 1/-1/0 0 CC 1/1/0/0/0 0/0
10.100.7.32:11754 [10/May/2021:21:57:47.756] k8s-api k8s-api/k8snode 1/-1/0 0 -- 1/1/0/0/0 0/0
10.100.7.32:1275 [10/May/2021:21:57:47.949] k8s-api k8s-api/k8snode 1/-1/0 0 -- 1/1/0/0/0 0/0
10.100.7.32:26032 [10/May/2021:21:57:48.513] k8s-api k8s-api/k8snode 1/-1/0 0 -- 1/1/0/0/0 0/0
10.100.7.32:53698 [10/May/2021:21:57:48.646] k8s-api k8s-api/k8snode 1/-1/0 0 -- 1/1/0/0/0 0/0
10.100.7.32:51007 [10/May/2021:21:57:50.148] k8s-api k8s-api/k8snode 1/-1/0 0 CC 1/1/0/0/0 0/0
10.100.7.32:37349 [10/May/2021:21:57:51.787] k8s-api k8s-api/k8snode 1/-1/0 0 CC 1/1/0/0/0 0/0

And here’s how it looks in a TCP capture on the backend server.

You can see that, on average, more than three connections per second are being attempted towards the backend server. Why, I don’t know; I assume they are health checks, since they get reset (RST) right after the SYN,ACK response.
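
If you want to reproduce the capture, something along these lines should show the SYN / SYN,ACK / RST pattern (the interface name is an assumption, adjust for your host):

  # capture connection attempts to the kube-apiserver port on the backend server
  # eth0 is a placeholder interface name
  tcpdump -nn -i eth0 'port 6443 and tcp[tcpflags] & (tcp-syn|tcp-rst) != 0'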

What I’m trying to do is create an HAProxy deployment in EKS, based on the HAProxy Helm chart, which will act as a frontend for the kube-apiserver running on my on-premises Kubernetes cluster.

Thanks!

To answer my own question:

This behavior seems to be caused by the fact that HAProxy sits behind an AWS NLB (Network Load Balancer), which sends far more health checks than one would expect (3 or more per second). Because the frontend is in TCP mode, each of those health-check connections is simply forwarded to the backend server. Related issue here: https://forums.aws.amazon.com/message.jspa?messageID=954992

After removing the HAProxy instances from the NLB, I can see that the health checks are sent at the configured intervals. So it’s not an HAProxy issue, it’s an AWS NLB issue.

One way to resolve this is to set up a separate monitor-uri frontend for NLB health checks only, so they are answered by HAProxy itself instead of being forwarded to the backend:

  frontend health_check
    bind :80
    mode http
    monitor-uri /check
    acl k8sapi_toobusy avg_queue(k8s-api) gt 20
    monitor fail if k8sapi_toobusy

  frontend k8s-api
    bind :443
    mode tcp
    option tcplog
    option logasap
    default_backend k8s-api

  backend k8s-api
    mode tcp
    option tcplog
    option tcp-check
    balance first
    default-server inter 10s downinter 5s rise 2 fall 2 slowstart 60s maxconn 250 maxqueue 256 weight 100
    server A 1.2.3.4:6443 check
    server B 1.2.3.5:6443 check
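
The NLB then has to be pointed at the new endpoint instead of the TCP port. If the Helm chart exposes HAProxy through a Service of type LoadBalancer, the AWS health-check annotations can do this; the snippet below is only a sketch (the Service name, port names and chart wiring are assumptions for illustration):

  # hypothetical Service excerpt; names and ports are placeholders
  apiVersion: v1
  kind: Service
  metadata:
    name: haproxy
    annotations:
      service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
      service.beta.kubernetes.io/aws-load-balancer-healthcheck-protocol: "HTTP"
      service.beta.kubernetes.io/aws-load-balancer-healthcheck-port: "80"
      service.beta.kubernetes.io/aws-load-balancer-healthcheck-path: "/check"
  spec:
    type: LoadBalancer
    ports:
      - name: k8s-api
        port: 443
        targetPort: 443
      - name: health-check
        port: 80
        targetPort: 80

With that in place, the NLB probes GET /check on port 80, HAProxy answers it locally via monitor-uri, and only real client traffic on :443 reaches the kube-apiserver.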