Hi,
We have run into intermittent 504 responses and identified a scenario that triggers them.
Each time, a slow client (mobile) made a >100 kB request. The backend responds rather quickly (~200 ms), but the total time is ~30 s.
Example:
Nov 20 16:51:32 cm-prod-haproxy-1-dc2 haproxy[14226]: [the ip]:56118 [20/Nov/2019:16:51:01.864] https-in~ www-xxx/cz-prod-web-3-dc2.xxx 0/0/1/200/30327 200 398636 - - ---- 464/464/40/13/0 0/0 "GET /api/1/android/es/xxx HTTP/1.1"
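To make the timers in that line easier to read, here is a minimal sketch that splits the timer field. It assumes the default "option httplog" field order of HAProxy 1.8 and later (TR/Tw/Tc/Tr/Ta) and the documented relation Td = Ta - (TR + Tw + Tc + Tr). The function name parse_timers is only illustrative.

```python
# Minimal sketch: split the timer field of an HAProxy HTTP log line.
# Assumption: default httplog format of HAProxy 1.8+, i.e. TR/Tw/Tc/Tr/Ta,
# all values in milliseconds, -1 meaning the phase was never completed.
TIMER_NAMES = ("TR", "Tw", "Tc", "Tr", "Ta")


def parse_timers(field):
    """Return a dict of the five timers plus the derived transfer time Td."""
    values = [int(v) for v in field.split("/")]
    if len(values) != len(TIMER_NAMES):
        raise ValueError("expected 5 timers, got %d" % len(values))
    timers = dict(zip(TIMER_NAMES, values))
    if min(values) < 0:
        # At least one phase was aborted, so Td cannot be derived.
        timers["Td"] = None
    else:
        # Td = time spent sending the response payload to the client.
        timers["Td"] = timers["Ta"] - (
            timers["TR"] + timers["Tw"] + timers["Tc"] + timers["Tr"]
        )
    return timers


if __name__ == "__main__":
    print(parse_timers("0/0/1/200/30327"))
```

Under those assumptions, the line above gives Tr = 200 ms and Td = 30126 ms, so almost all of the ~30 s is spent transferring the response to the client.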
Then the next request, when made within a few seconds, returns a 504 with this weird sR-- state:
Example:
Nov 20 16:51:37 cm-prod-haproxy-1-dc2 haproxy[14226]: [the ip]:56118 [20/Nov/2019:16:51:37.957] https-in~ www-xxx/cz-prod-web-2-dc2.xxx 0/0/0/-1/0 504 214 - - sR-- 366/366/43/12/0 0/0 "GET /api/1/android/es/yyy HTTP/1.1"
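For reference, here is a small sketch that decodes the first two characters of the termination state. The meanings are paraphrased from the "session state at disconnection" section of the HAProxy documentation, the tables are deliberately partial, and the function name decode_termination_state is just an illustration.

```python
# Minimal sketch: decode the first two characters of an HAProxy termination
# state such as "sR--". Meanings are paraphrased from the HAProxy docs;
# the tables are partial and only meant for reading log lines like the one above.
FIRST_CHAR = {
    "C": "TCP session unexpectedly aborted by the client",
    "S": "TCP session aborted or connection refused by the server",
    "P": "session prematurely aborted by the proxy",
    "c": "client-side timeout expired",
    "s": "server-side timeout expired",
    "-": "normal session completion",
}

SECOND_CHAR = {
    "R": "waiting for a complete, valid request from the client",
    "Q": "waiting in the queue for a connection slot",
    "C": "waiting for the connection to the server to establish",
    "H": "waiting for complete, valid response headers from the server",
    "D": "data transfer phase",
    "L": "sending the last data to the client after the server finished",
    "-": "normal session completion",
}


def decode_termination_state(state):
    """Return (what ended the session, what the session was doing)."""
    if len(state) < 2:
        raise ValueError("termination state needs at least 2 characters")
    cause = FIRST_CHAR.get(state[0], "unknown")
    phase = SECOND_CHAR.get(state[1], "unknown")
    return cause, phase


if __name__ == "__main__":
    cause, phase = decode_termination_state("sR--")
    print(cause + " while " + phase)
```

For sR-- this reads as a server-side timeout that expired while HAProxy was still waiting for a complete request from the client, which is why the state looks odd for a plain GET.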
On our backend side (nginx), the connection is cut by HAProxy (499 code):
[the ip] - - [20/Nov/2019:16:51:37 +0100] "GET /api/1/android/es/yyy HTTP/1.1" 499 0 "-" "Xxx/5.24.0 (Android; 28 9)" "[the ip]"
Still on our backend side (Rails server, unicorn), the request is processed to completion.
But if the next request comes more than 40s after the first one, everything is fine.
This is really weird, as if something in the first request affected the second one a moment later.
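To put a number on "within a few seconds", here is a minimal sketch that computes the idle gap between the two example log lines. It assumes the bracketed timestamp is the time the request started (as in the default httplog format of HAProxy 1.8 and later) and that the last timer is the total time in milliseconds. The helper names are hypothetical.

```python
# Minimal sketch: idle time between the end of one logged request and the
# start of the next one on the same connection.
# Assumptions: the bracketed log timestamp is the request start time, and
# the last timer (Ta) is the total request time in milliseconds.
from datetime import datetime, timedelta

# %b expects an English month abbreviation ("Nov"), so this relies on the
# default C/English locale.
LOG_TIME_FORMAT = "%d/%b/%Y:%H:%M:%S.%f"


def parse_log_time(stamp):
    """Parse a timestamp such as 20/Nov/2019:16:51:01.864."""
    return datetime.strptime(stamp, LOG_TIME_FORMAT)


def idle_gap(first_start, first_total_ms, second_start):
    """Return the timedelta between the first request's end and the second's start."""
    first_end = parse_log_time(first_start) + timedelta(milliseconds=first_total_ms)
    return parse_log_time(second_start) - first_end


if __name__ == "__main__":
    gap = idle_gap("20/Nov/2019:16:51:01.864", 30327, "20/Nov/2019:16:51:37.957")
    print(gap.total_seconds())
```

With the two lines above, the first request ends at about 16:51:32.2 and the second starts roughly 5.8 s later on the same client port.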
I saw this thread, but the issue is supposed to have been fixed (1.7.x, in March): Intermittent 504 errors and sR-- after upgrade to 1.7.10
Our conf is:
defaults
    log global
    maxconn 8000
    mode http
    retries 3
    timeout client 10s
    timeout connect 5s
    timeout server 30s
    option httplog
    option redispatch
    option http-buffer-request
    balance roundrobin
    no option http-use-htx
And for nginx:
keepalive_timeout 650;
keepalive_requests 10000;