Check duration bursts to 1000 ms

We use HAProxy in front of Redis:

    listen 22001
        bind :22001
        balance roundrobin
        maxconn 16384
        server 1 10.18.8.14:1041 check

This enables health checks on the Redis server (the check keyword on the server line).

Most of the time the check works fine, but occasionally its duration bursts to 1000 ms (or a few ms more).
Worse, clients connecting through HAProxy see the same 1000 ms hang.

If clients connect directly to Redis, they see no hang at all.

We have gone through the HAProxy manual, suspecting that some timeout defaults to 1 s,
but found no satisfactory answer.

Any idea why the check duration is 1000 ms?

Your random latency has nothing to do with the checks subsystem. You probably have a CPU or network issue, which you will have to troubleshoot.

For example, this VM could be preempted by the virtualization layer, or some other chokepoint could be randomly introducing the latency.

Thanks!

We are not running our instances in VMs; they are bare metal.
It would be surprising if there were network or CPU chokepoints, since we deploy Redis and HAProxy mixed on the same hosts, and none of the Redis instances is (as far as we know) suffering from this latency.

We will look into the CPU and network anyway. Thanks.

You can tcpdump or strace -tt it if you want certainty, but this will require you to go through a lot of data.
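Going through strace -tt output by hand is tedious, so a small filter can narrow it down first. The sketch below is only an illustration: it assumes output written with strace -tt (optionally with -f, which adds a pid prefix), and it flags consecutive timestamped lines that are at least threshold seconds apart. The function names and the 0.9 s default are assumptions, not part of any tool. In an event-driven process like HAProxy a long gap next to epoll_wait may just be idle time, so treat the hits as places to look, not as a diagnosis.

```python
import re

# strace -tt prefixes each line with wall-clock time HH:MM:SS.uuuuuu.
# With -f, a pid prefix ("[pid 1234] " on stderr, "1234 " with -o) may come first.
TS_RE = re.compile(r"^(?:\[pid\s+\d+\]\s+|\d+\s+)?(\d{2}):(\d{2}):(\d{2})\.(\d{6})\s")


def parse_ts(line):
    """Return the line's timestamp in seconds since midnight, or None."""
    m = TS_RE.match(line)
    if m is None:
        return None
    h, mi, s, us = (int(g) for g in m.groups())
    return h * 3600 + mi * 60 + s + us / 1_000_000


def find_gaps(lines, threshold=0.9):
    """Yield (gap_seconds, previous_line, line) for consecutive timestamped
    lines that are at least `threshold` seconds apart.

    Lines without a timestamp are skipped; a wrap past midnight produces a
    negative gap and is therefore never reported."""
    prev_ts, prev_line = None, None
    for line in lines:
        ts = parse_ts(line)
        if ts is None:
            continue
        if prev_ts is not None and ts - prev_ts >= threshold:
            yield ts - prev_ts, prev_line, line
        prev_ts, prev_line = ts, line
```

For example, after capturing with strace -f -tt -o haproxy.strace -p <pid> (the file name is hypothetical), iterate over find_gaps(open("haproxy.strace")) and inspect the lines around each reported gap.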

No root cause found yet, but we have observed that:

  1. CentOS 7.5 is immune to this issue, and
  2. the issue is not caused by the kernel version.

We are still confirming this.

We have tried the official Dockerized HAProxy; the issue remains.

As with the TCP health checks, frontend clients also suffer the same 1 s latency,
and the latency occurs only at the connection-establishment stage.
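To confirm that the extra second is spent in the TCP handshake rather than in request handling, one option is to time connect() and a first Redis PING separately on fresh connections. This is a minimal sketch, assuming the Redis instance requires no AUTH (otherwise the reply is an error and the probe raises); probe and main are hypothetical helper names. It can be pointed at HAProxy (port 22001 in the config above) and at Redis directly (10.18.8.14:1041) to compare the two paths.

```python
import socket
import time


def probe(host, port, timeout=5.0):
    """Return (connect_seconds, ping_seconds) for one fresh TCP connection."""
    t0 = time.monotonic()
    # create_connection returns once the TCP handshake has completed.
    sock = socket.create_connection((host, port), timeout=timeout)
    t1 = time.monotonic()
    try:
        # Inline PING command; Redis answers "+PONG\r\n".
        sock.sendall(b"PING\r\n")
        reply = b""
        while not reply.endswith(b"\r\n"):
            chunk = sock.recv(64)
            if not chunk:
                raise ConnectionError("connection closed before reply")
            reply += chunk
        t2 = time.monotonic()
    finally:
        sock.close()
    if reply != b"+PONG\r\n":
        raise ValueError(f"unexpected reply: {reply!r}")
    return t1 - t0, t2 - t1


def main(host, port, rounds=1000, slow=0.9):
    """Open `rounds` connections and print the ones slower than `slow` seconds."""
    for i in range(rounds):
        connect_s, ping_s = probe(host, port)
        if connect_s >= slow or ping_s >= slow:
            print(f"round {i}: connect={connect_s * 1000:.1f} ms "
                  f"ping={ping_s * 1000:.1f} ms")
```

If the issue is confined to connection establishment, slow rounds should show a connect time near 1000 ms with a normal PING time, e.g. when running main("127.0.0.1", 22001) on the HAProxy host.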

You have already confirmed above that CentOS 7.5 does not have this issue, so this is unrelated to HAProxy.

If you want troubleshooting advice, I suggest you start by explaining your environment. First of all: CentOS 7.5 is immune to this problem as opposed to what? CentOS 6?

Sorry, please forget my previous posts.

After running tcpdump, we found that the 1-second delay is introduced by the first SYN retry.
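Why the delay is almost exactly one second can be sketched as simple arithmetic. This assumes the client's initial SYN retransmission timeout (RTO) is 1 second, which is the initial value RFC 6298 recommends and the default on recent Linux kernels, and that each further retransmission doubles the timeout. connect_latency and its parameters are illustrative names, not a kernel API.

```python
# Assumption: initial SYN retransmission timeout of 1 s (RFC 6298),
# doubled on every further retransmission.
INITIAL_RTO = 1.0


def connect_latency(rtt, failed_syns=0, initial_rto=INITIAL_RTO):
    """Approximate seconds until connect() completes when the first
    `failed_syns` SYNs get no usable SYN-ACK (for example, answered by an
    ACK that the client resets) and the next SYN succeeds.

    The retransmission timer runs from when each SYN was sent, so the
    client waits the full backed-off RTO before resending, however
    quickly the unusable ACK arrived."""
    waited = sum(initial_rto * 2 ** i for i in range(failed_syns))
    return waited + rtt
```

With a sub-millisecond LAN round trip, connect_latency(0.0003, failed_syns=1) comes out at about 1.0003 s, which is consistent with a check duration of "1000 ms (or a few ms more)".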

To wit:
The backend server (Redis) closes the connection and enters TIME-WAIT;
(a minute later…)
HAProxy sends a SYN to connect to the same Redis, reusing the same local (random) port;
Redis, still in TIME-WAIT, re-sends the ACK;
HAProxy sees an ACK with the wrong sequence number and replies with a RST;
(a second later…)
HAProxy retries the SYN, and everything goes well from then on.
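To scan a capture for the sequence above, the following sketch flags new client SYNs that reuse the 4-tuple of a connection the server closed within the TIME-WAIT window. It assumes the capture has already been reduced to a list of events. The Event record, its kind values, and find_tuple_reuse are hypothetical names, not a tcpdump output format. The 60 s default is the Linux TIME-WAIT length. Because "a minute later" sits right at that boundary and timer expiry is not exact, passing a slightly larger window may catch borderline cases.

```python
from collections import namedtuple

# Hypothetical record: one row per relevant packet extracted from a capture.
# kind is "server_close" when the server (Redis) sends the first FIN, so the
# server side holds TIME-WAIT, or "client_syn" when the client (HAProxy)
# opens a new connection. client and server are (ip, port) tuples.
Event = namedtuple("Event", "time kind client server")

TIME_WAIT_SECONDS = 60.0  # Linux TCP_TIMEWAIT_LEN


def find_tuple_reuse(events, window=TIME_WAIT_SECONDS):
    """Return (syn_event, seconds_since_close) for every client SYN that
    reuses the 4-tuple of a connection the server closed less than
    `window` seconds earlier."""
    last_close = {}
    hits = []
    for ev in sorted(events, key=lambda e: e.time):
        key = (ev.client, ev.server)
        if ev.kind == "server_close":
            last_close[key] = ev.time
        elif ev.kind == "client_syn" and key in last_close:
            age = ev.time - last_close[key]
            if age < window:
                hits.append((ev, age))
    return hits
```

Both the first SYN and its retransmission one second later fall inside the window, so a single occurrence of the problem typically appears as two adjacent hits on the same 4-tuple.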