Check duration burst to 1000 (ms)

liuzhenasdfa · June 4, 2019, 8:13am

we use haproxy before redis,

listen 22001
bind :22001
balance roundrobin
maxconn 16384
server 1 10.18.8.14:1041 check

which enables check.

mostly the check works fine, but sometimes it burst to 1000ms (or a few ms plus);
to worsen it, clients through haproxy see the same 1000ms hang;

if the clients connect directly to redis, they see no hang at all.

we have looked up the haproxy manual, suspecting if there is a timeout default to 1s;
but no satisfactory answers.

Any idea why there is 1000 check duration?

lukastribus · June 5, 2019, 8:32am

Your random latency has nothing to do to with the checks subsystem. You probably have some CPU or network issue, which you will have to troubleshoot.

For example, this VM could be preempted by the virtualization, or you could have some other chokepoint that randomly introduce the latency.

liuzhenasdfa · June 5, 2019, 8:50am

Thanks!

We are not running our instances in VM, we are just baremetal.
It would be surprising if there are network / cpu chokepoints, since we deploy our redis and haproxy in a mixed fashion, and no (claimed) redis is suffering the latency.

We will get into the CPU / network though. Thanks.

lukastribus · June 5, 2019, 10:12am

You can tcpdump or strace -tt it, if you want certainty. But this will require you to go through a lot of data.

liuzhenasdfa · July 22, 2019, 7:12am

No root cause found yet, but we are seeing that:

centos7.5 is immune to this issue and
not caused by kernel version

we are still confirming.

liuzhenasdfa · October 17, 2019, 9:54am

We have tried official dockerized haproxy, issue remains.

Similar to tcp health checking, frontend clients also suffer the same 1s latency;
And only at the connection establishing stage is there 1s latency.

lukastribus · October 17, 2019, 3:54pm

You have already confirmed above that CentOS 7.5 does not have this issue. So this is unrelated to haproxy.

If you want troubleshooting advise, I suggest you start explaining your environment - and first of all - CentOs 7.5 is immune to this problem as opposed to what? CentOS 6?

liuzhenasdfa · October 18, 2019, 1:15pm

Sorry, please forget my previous posts.

After tcpdumping, we found that the 1 second delay is introduced by first SYN retry.

To wit,
The backend server (redis) close the connection, and enters into TIME-WAIT;
(a minute later…)
Haproxy sends SYN, connecting to the same redis using the same local random port;
Redis, while in TIME-WAIT, re-send the ACK;
Haproxy sees the ACK with wrong seq, reply with a REST;
(a second later…)
Haproxy retry SYN and everything goes well since.

Topic		Replies	Views
Strange connection issues with tcp (redis) under load Help!	2	3573	February 8, 2019
Strange timeout behavior during load testing Help!	7	1678	October 5, 2017
Tcp-check leaves lots of TIME_WAITs Help!	31	5466	November 26, 2019
Resurrecting backend servers with health checks Help!	1	255	January 12, 2023
Server goes UP without tcp-check if it resolves again Help!	1	1648	February 28, 2019

Check duration burst to 1000 (ms)

Related topics