I have a very strange connection issue, trying to proxy redis servers.
First, the relevant configuration extract:
defaults mode http log global option httplog option dontlognull option dontlog-normal option http-ignore-probes timeout connect 5000 timeout client 50000 timeout server 50000 maxconn 25000 frontend redis bind *:6362 mode tcp use_backend redis timeout client 3h option srvtcpka backend redis mode tcp timeout server 3h option clitcpka server cache01 cache01:6381 check
Under heavy load, connections are randomly dropped, but limits are not hit when I check haproxy stats (~50 open connections).
(To add extra details, it’s a celery ‘client’ instance, using redis as a result backend, failing to submit new tasks).
HaProxy logs show this:
Jan 28 11:40:30 localhost haproxy: 127.0.0.1:37852 [28/Jan/2019:11:08:59.155] redis redis/cache01 1/0/1891613 2421222 SD 133/98/97/97/0 0/0
Jan 28 11:40:41 localhost haproxy: 127.0.0.1:43558 [28/Jan/2019:11:40:30.805] redis redis/cache01 1/0/10482 2605056 CD 133/98/97/97/0 0/0
Extra remark: it seems the double disconnect always together.
Extra details: the client application receive a
There is no issues with a direct connection: this only happens with HaProxy in between (but we need it ^^).
The issue happens on different system / linux kernls: We can reproduce it locally on a single “basic” archlonux machine or in debian stable containers on a “powerfull” server.
Having no idea who was responsible (Celery <-> Ha or Ha <-> Redis), we started to dump networks packets:
For the first connection (37852, closed at 11:40:30), we have the following timeline:
11.08.59 celery -> ha Connection opened. ha -> redis opened a few milliseconds later
11:09:00 ha -> celery: At tcp level, the window seem to be full (On celery side)
11:09:00 celery -> ha: Celery is starting to send tcp packets with a window of zero
11:09:10 After a few keep-alive from ha to celery, celery start again to send data, but window is still set to 0. No data traffic from ha to celery anymore in the full dump.
11:11:05 redis -> ha: At tcp level, window seems to be full (On haproxy side)
11:11:05 ha -> redis: HaProxy is starting to send tcp packets with a window of zero
11:11:11 After a few keep-alive, haproxy start again to send data, but window is still set to 0. No data traffic form redis to celery will be seen again.
11:21:17 redis -> ha: At tcp level, window seems to be full (On redis side)
11:21:17 redis start to send tcp packets with a window of zero as well
11:22:35 redis close the connection
11:36:45: ha -> redis: At tcp level, window seems to be full (On haproxy side). At thing point no more traffic is exchanged
11:38:50 Last tcp ack
11:40:30: Celery closed the connection
Remark 1: There is data “lost” in HaProxy buffer: the traffic redis -> ha -> celery is not send anymore at 11:09:00, and redis -> ha is stopped at 11:11:05 (so there is ~2 minutes of traffic in haproxy buffers).
Remark 2: Why redis didn’t closed the front connection when the back connection was lost ? I’m 99% sure that I see the two connections in tcp dump (correct timing, same data when everything was working)
For the second connection, the issue is quite similar:
11:40:30: Connection opened
11:40:33: Last message from HA to celery (window full), but no pause in data from celery to ha
11.40:41: Celery close the connection
11:40:41: Ha close the connection to redis
This second case make more sense (no pause in data flow, front is closed to back is closed immediately).
Some screenshot of tcpdumps in case I missed something:
Backend Haproxy full
Redis full and clsoe
Frontend Haproxy full and clsoe
I don’t understand why this is happening only with haproxy in between. There is load on the system, but it’s seems unlikely that the celery client is unable to find time to consume his buffer for ~30 minutes, especially when he find time to send data.
This cloud be a client (redis-py is used by celery) issue, but I was wondering if it could be a haproxy configuration issue and/or someone have any hits to try to fix the issue.
Thanks for your help!