Intermittent 504 errors and sRNN

nm-devops · February 17, 2021, 5:10pm

Hi,

We are experiencing intermittent 504 errors with the session termination state sRNN.
According to the documentation, this state means that the server-side timeout expired while HAProxy was waiting for the client request.
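Each of the four characters of the termination state has its own meaning. The following Python sketch decodes a four-character HTTP termination state. The lookup tables list only a subset of the codes, paraphrased from the "Session state at disconnection" section of the HAProxy configuration manual, and `decode_termination_state` is a hypothetical helper, not something HAProxy provides.

```python
from typing import List

# Hypothetical helper tables: only a subset of HAProxy's codes is listed here.
CAUSE = {  # 1st char: what ended the session
    "-": "normal session completion",
    "C": "session aborted by the client",
    "S": "session aborted or refused by the server",
    "c": "client-side timeout expired",
    "s": "server-side timeout expired",
}
STATE = {  # 2nd char: what the proxy was doing at that moment
    "-": "normal completion after the data transfer",
    "R": "waiting for a complete, valid request from the client",
    "Q": "waiting in the queue for a connection slot",
    "C": "waiting for the connection to the server",
    "H": "waiting for complete response headers from the server",
    "D": "data phase",
}
REQ_COOKIE = {  # 3rd char: persistence cookie sent by the client
    "-": "persistence cookie not applicable",
    "N": "client provided no cookie",
    "V": "client provided a valid cookie",
}
RESP_COOKIE = {  # 4th char: persistence cookie on the response
    "-": "persistence cookie not applicable",
    "N": "server provided no cookie and none was inserted",
    "I": "no cookie from the server; the proxy inserted one",
}

def decode_termination_state(code: str) -> List[str]:
    """Return a description for each character of a 4-char HTTP termination state."""
    if len(code) != 4:
        raise ValueError("HTTP termination states have 4 characters")
    tables = (CAUSE, STATE, REQ_COOKIE, RESP_COOKIE)
    return [table.get(ch, f"unlisted code {ch!r}") for table, ch in zip(tables, code)]

for description in decode_termination_state("sRNN"):
    print(description)
```

Decoded this way, "sR" combines a server-side timeout with a state in which nothing had been sent to a server yet, which is why the combination looks contradictory.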

What doesn’t fit this explanation is the request duration, which is very short, so the server-side timeout was never reached:

TR ‘/’ Tw ‘/’ Tc ‘/’ Tr ‘/’ Ta : 0/0/0/-1/0
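A small Python sketch can make the timer argument explicit. It splits the TR/Tw/Tc/Tr/Ta field into milliseconds; the `Timers` type, `parse_timers` helper and `SERVER_TIMEOUT_MS` constant are hypothetical. Per the HAProxy manual, a Tr of -1 means the complete response headers were never seen, and Ta is the total active time of the request, so Ta = 0 ms is nowhere near a server timeout of 30 s or 300 s.

```python
from typing import NamedTuple

class Timers(NamedTuple):
    """HTTP log timers in milliseconds; -1 means the phase never completed."""
    TR: int  # time to receive the full client request
    Tw: int  # time spent waiting in queues
    Tc: int  # time to establish the TCP connection to the server
    Tr: int  # server response time (until complete response headers)
    Ta: int  # total active time of the request

def parse_timers(field: str) -> Timers:
    # A leading '+' can appear on Ta with "option logasap"; strip it before int().
    return Timers(*(int(part.lstrip("+")) for part in field.split("/")))

SERVER_TIMEOUT_MS = 300_000  # assumption: "timeout server 300s" from the backend

t = parse_timers("0/0/0/-1/0")
print(t)
print("response headers received:", t.Tr != -1)
print("server timeout could have expired:", t.Ta >= SERVER_TIMEOUT_MS)
```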

The HAProxy access log entries for these requests look like this:

Field  Format                                                     Log entry value
    1  process_name '[' pid ']:'                                  haproxy[152]:
    2  client_ip ':' client_port                                  10.101.0.92:41328
    3  '[' request_date ']'                                       [03/Feb/2021:08:32:53.214]
    4  frontend_name                                              public
    5  backend_name '/' server_name                               be_http:pre-prod:geo-service/pod:geo-service-2-6fgjp:geo-service:8080-tcp:172.23.1.144:8080
    6  TR '/' Tw '/' Tc '/' Tr '/' Ta*                            0/0/0/-1/0
    7  status_code                                                504
    8  bytes_read*                                                214
    9  captured_request_cookie                                    -
   10  captured_response_cookie                                   -
   11  termination_state                                          sRNN
   12  actconn '/' feconn '/' beconn '/' srv_conn '/' retries*    10/2/0/0/0
   13  srv_queue '/' backend_queue                                0/0
   14  '{' captured_request_headers* '}'                          -
   15  '{' captured_response_headers* '}'                         -
   16  '"' http_request '"'                                       "GET /nm/geo-service/ping HTTP/1.1"
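To scan many entries for this state, the fields in the table can be extracted with a regular expression. This is a Python sketch that assumes the default `option httplog` layout, backend names without '/', and no special syslog handling; `HTTPLOG_RE` and `parse_line` are hypothetical names, and the sample line is reassembled from the table values rather than copied from a raw log.

```python
import re
from typing import Dict, Optional

# Sketch of a parser for HAProxy's default HTTP log layout ("option httplog").
# re.search is used so that any syslog prefix before "haproxy[pid]:" is skipped.
HTTPLOG_RE = re.compile(
    r"(?P<process>[^\s\[]+)\[(?P<pid>\d+)\]: "
    r"(?P<client_ip>\S+):(?P<client_port>\d+) "
    r"\[(?P<date>[^\]]+)\] "
    r"(?P<frontend>\S+) "
    r"(?P<backend>[^/\s]+)/(?P<server>\S+) "
    r"(?P<timers>\S+) "
    r"(?P<status>-?\d+) "
    r"(?P<bytes_read>\+?\d+) "
    r"(?P<req_cookie>\S+) (?P<res_cookie>\S+) "
    r"(?P<termination_state>\S{4}) "
    r"(?P<conns>\S+) "
    r"(?P<queues>\S+) "
    r"(?:\{[^}]*\} )*"  # optional captured request/response headers
    r'"(?P<request>[^"]*)"'
)

def parse_line(line: str) -> Optional[Dict[str, str]]:
    """Return the named fields of an HTTP log line, or None if it does not match."""
    match = HTTPLOG_RE.search(line)
    return match.groupdict() if match else None

# Reassembled from the table above, not copied from a raw log.
SAMPLE = (
    "haproxy[152]: 10.101.0.92:41328 [03/Feb/2021:08:32:53.214] public "
    "be_http:pre-prod:geo-service/pod:geo-service-2-6fgjp:geo-service:8080-tcp:172.23.1.144:8080 "
    '0/0/0/-1/0 504 214 - - sRNN 10/2/0/0/0 0/0 "GET /nm/geo-service/ping HTTP/1.1"'
)

fields = parse_line(SAMPLE)
print(fields["server"], fields["timers"], fields["termination_state"])
```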

We are running HAProxy 2.0.16.
Here is part of our configuration, including the timeouts:

global
  maxconn 20000
  nbthread 4
  daemon
defaults
  timeout connect 5s
  timeout client 30s
  timeout client-fin 1s
  timeout server 30s
  timeout server-fin 1s
  timeout http-request 10s
  timeout http-keep-alive 300s
  timeout tunnel 1h

  no option http-use-htx

backend be_http:ci:geo-service
  mode http
  option redispatch
  option forwardfor
  balance leastconn
  timeout server  300s

  timeout check 5000ms
  http-request add-header X-Forwarded-Host %[req.hdr(host)]
  http-request add-header X-Forwarded-Port %[dst_port]
  http-request add-header X-Forwarded-Proto http if !{ ssl_fc }
  http-request add-header X-Forwarded-Proto https if { ssl_fc }
  http-request add-header X-Forwarded-Proto-Version h2 if { ssl_fc_alpn -i h2 }
  http-request add-header Forwarded for=%[src];host=%[req.hdr(host)];proto=%[req.hdr(X-Forwarded-Proto)]
  cookie 00d166be97fc798a8ddabe90d1f9204f insert indirect nocache httponly
  server pod:geo-service-4-zk6gs:geo-service:8080-tcp:172.20.2.127:8080 172.20.2.127:8080 cookie 6d435d8d90753b3b8464e4fb7672787b weight 256 check inter 5000ms
  server pod:geo-service-4-6s9cr:geo-service:8080-tcp:172.20.3.153:8080 172.20.3.153:8080 cookie 3eb67e6ccfedef0166bed6214b85ffcf weight 256 check inter 5000ms
  server pod:geo-service-4-v2ppb:geo-service:8080-tcp:172.23.1.98:8080 172.23.1.98:8080 cookie b9b1a3cc05726f213dd57d4ca0f57985 weight 256 check inter 5000ms
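In this configuration, the backend's `timeout server 300s` overrides the 30 s value inherited from `defaults`. The Python sketch below (the `to_ms` helper and the dictionaries are hypothetical) converts HAProxy time values to milliseconds, assuming the documented units us, ms, s, m, h and d with milliseconds as the default, and compares the effective limit with the logged Ta of 0 ms.

```python
import re

# Multipliers from HAProxy time units to milliseconds; a bare number means ms.
UNITS_MS = {"us": 0.001, "ms": 1, "s": 1_000, "m": 60_000, "h": 3_600_000, "d": 86_400_000}

def to_ms(value: str) -> float:
    """Convert an HAProxy time value such as "300s" or "5000ms" to milliseconds."""
    match = re.fullmatch(r"(\d+)(us|ms|s|m|h|d)?", value.strip())
    if match is None:
        raise ValueError(f"not an HAProxy time value: {value!r}")
    number, unit = match.groups()
    return int(number) * UNITS_MS[unit or "ms"]

# Timeouts copied from the configuration above; a setting in the backend
# section overrides the same setting inherited from "defaults".
defaults_section = {"connect": "5s", "client": "30s", "server": "30s"}
backend_section = {"server": "300s", "check": "5000ms"}
effective = {**defaults_section, **backend_section}

server_timeout_ms = to_ms(effective["server"])
logged_ta_ms = 0  # Ta from the sRNN log entry
print(f"effective timeout server: {server_timeout_ms} ms, Ta: {logged_ta_ms} ms")
print("timeout could have expired:", logged_ta_ms >= server_timeout_ms)
```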

I’ve already searched the forum and found one topic with a similar pattern: https://discourse.haproxy.org/t/intermittent-504-errors-and-sr-after-upgrade-to-1-7-10/

My understanding of that topic is that the bug causing the error has been fixed and that the fix should be included in all newer versions.

I’m trying to capture the error with tcpdump, but apart from that, I could use some help!

Thanks!

nm-devops · March 3, 2021, 2:28pm

We haven’t been able to find the root cause of the problem, but I’ve managed to capture a dump of the TCP session, which looks perfectly normal to me. We’re kind of lost now.

The error seems to occur more often when there is less traffic, but that’s only an observation.

Maybe someone here has an idea?

sean · July 5, 2021, 5:51am

My HAProxy version is the same as yours, and I have encountered the same problem: 504 with termination state sR. Have you solved it? For now I have had to increase the server timeout to 5 minutes to avoid the problem, following this link: timeout - Intermittent 504 errors with HAProxy - Server Fault

nm-devops · July 5, 2021, 7:59am

Unfortunately not. In our case, the problem has dropped to a very low level now that the HAProxy instances handle more connections.

sean · July 5, 2021, 8:28am

We found a pattern like the one in the topic “504 response and sR--”: the request logged before each 504 has a bytes_read value greater than 100000. I can’t figure out why the previous request would affect the next one.
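One way to check this pattern across a whole log is to pair each 504 whose termination state starts with "sR" with the entry logged just before it. The Python sketch below assumes the entries are already parsed into dicts with `server`, `status`, `termination_state` and `bytes_read` keys, treats "previous" as the previous entry for the same server (an assumption; it could also mean the same client connection), and uses the 100000-byte threshold from the post. `find_suspects` and the sample entries are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

Entry = Dict[str, str]

def find_suspects(entries: Iterable[Entry], threshold: int = 100_000) -> List[Tuple[Entry, Entry]]:
    """Return (previous, failed) pairs where a 504/sR entry follows a large response on the same server."""
    last_by_server: Dict[str, Entry] = {}
    suspects: List[Tuple[Entry, Entry]] = []
    for entry in entries:
        prev = last_by_server.get(entry["server"])
        is_sr_504 = entry["status"] == "504" and entry["termination_state"].startswith("sR")
        # bytes_read may carry a leading '+' with "option logasap"; strip it.
        if is_sr_504 and prev is not None and int(prev["bytes_read"].lstrip("+")) > threshold:
            suspects.append((prev, entry))
        last_by_server[entry["server"]] = entry
    return suspects

# Made-up entries for illustration only.
sample_log = [
    {"server": "srv-a", "status": "200", "termination_state": "----", "bytes_read": "150000"},
    {"server": "srv-a", "status": "504", "termination_state": "sRNN", "bytes_read": "214"},
    {"server": "srv-b", "status": "504", "termination_state": "sRNN", "bytes_read": "214"},
]
for prev, failed in find_suspects(sample_log):
    print(prev["bytes_read"], "->", failed["status"], failed["termination_state"])
```

Keying on the server is only one reading of "previous request"; keying on the client address and port would test the same-connection interpretation instead.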

Zecr · March 12, 2022, 7:55am

I’m seeing the same problem. Have you solved it?
