Hi,
we experience intermittent 504 errors with session termination state sRNN.
The documentation for these states explains that the server side timeout expired while waiting for the client request.
What doesn’t fit in this explanation is the request duration which is quite short.
So the server side timeout was never reached.
TR ‘/’ Tw ‘/’ Tc ‘/’ Tr ‘/’ Ta : 0/0/0/-1/0
Haproxy access logs for these requests look like these:
Field Format Logentry values
1 process_name '[' pid ']:' haproxy[152]:
2 client_ip ':' client_port 10.101.0.92:41328
3 '[' request_date ']' [[03/Feb/2021:08:32:53.214]
4 frontend_name public
5 backend_name '/' server_name be_http:pre-prod:geo-service/pod:geo-service-2-6fgjp:geo-service:8080-tcp:172.23.1.144:8080
6 TR '/' Tw '/' Tc '/' Tr '/' Ta* 0/0/0/-1/0
7 status_code 504
8 bytes_read* 214
9 captured_request_cookie -
10 captured_response_cookie -
11 termination_state sRNN
12 actconn '/' feconn '/' beconn '/' srv_conn '/' retries* 10/2/0/0/0
13 srv_queue '/' backend_queue 0/0
14 '{' captured_request_headers* '}' -
15 '{' captured_response_headers* '}' -
16 '"' http_request '"' "GET /nm/geo-service/ping HTTP/1.1"
Haproxy version 2.0.16.
Here is part of our configuration, including timeouts:
global
maxconn 20000
nbthread 4
daemon
defaults
timeout connect 5s
timeout client 30s
timeout client-fin 1s
timeout server 30s
timeout server-fin 1s
timeout http-request 10s
timeout http-keep-alive 300s
timeout tunnel 1h
no option http-use-htx
backend be_http:ci:geo-service
mode http
option redispatch
option forwardfor
balance leastconn
timeout server 300s
timeout check 5000ms
http-request add-header X-Forwarded-Host %[req.hdr(host)]
http-request add-header X-Forwarded-Port %[dst_port]
http-request add-header X-Forwarded-Proto http if !{ ssl_fc }
http-request add-header X-Forwarded-Proto https if { ssl_fc }
http-request add-header X-Forwarded-Proto-Version h2 if { ssl_fc_alpn -i h2 }
http-request add-header Forwarded for=%[src];host=%[req.hdr(host)];proto=%[req.hdr(X-Forwarded-Proto)]
cookie 00d166be97fc798a8ddabe90d1f9204f insert indirect nocache httponly
server pod:geo-service-4-zk6gs:geo-service:8080-tcp:172.20.2.127:8080 172.20.2.127:8080 cookie 6d435d8d90753b3b8464e4fb7672787b weight 256 check inter 5000ms
server pod:geo-service-4-6s9cr:geo-service:8080-tcp:172.20.3.153:8080 172.20.3.153:8080 cookie 3eb67e6ccfedef0166bed6214b85ffcf weight 256 check inter 5000ms
server pod:geo-service-4-v2ppb:geo-service:8080-tcp:172.23.1.98:8080 172.23.1.98:8080 cookie b9b1a3cc05726f213dd57d4ca0f57985 weight 256 check inter 5000ms
I’ve already searched the forum and found one topic with a similiar pattern: https://discourse.haproxy.org/t/intermittent-504-errors-and-sr-after-upgrade-to-1-7-10/
My understanding of that topic is that the bug causing that error has beend fixed, and that the fix should be included in all newer versions.
I’m trying to capture the error in a tcpdump, but besides from that help is needed!
Thanks!