Intermittent 504 errors and sRNN

Hi,

we experience intermittent 504 errors with session termination state sRNN.
The documentation for these states explains that the server side timeout expired while waiting for the client request.

What doesn’t fit this explanation is the request duration, which is very short.
The server-side timeout can never have been reached.

TR '/' Tw '/' Tc '/' Tr '/' Ta: 0/0/0/-1/0

HAProxy access log entries for these requests look like this:

Field  Format                                                   Log entry value
    1  process_name '[' pid ']:'                                haproxy[152]:
    2  client_ip ':' client_port                                10.101.0.92:41328
    3  '[' request_date ']'                                     [03/Feb/2021:08:32:53.214]
    4  frontend_name                                            public
    5  backend_name '/' server_name                             be_http:pre-prod:geo-service/pod:geo-service-2-6fgjp:geo-service:8080-tcp:172.23.1.144:8080
    6  TR '/' Tw '/' Tc '/' Tr '/' Ta*                          0/0/0/-1/0
    7  status_code                                              504
    8  bytes_read*                                              214
    9  captured_request_cookie                                  -
   10  captured_response_cookie                                 -
   11  termination_state                                        sRNN
   12  actconn '/' feconn '/' beconn '/' srv_conn '/' retries*  10/2/0/0/0
   13  srv_queue '/' backend_queue                              0/0
   14  '{' captured_request_headers* '}'                        -
   15  '{' captured_response_headers* '}'                       -
   16  '"' http_request '"'                                     "GET /nm/geo-service/ping HTTP/1.1"
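
In case it helps anyone compare their own logs, here is a minimal Python sketch that splits such a log line into the fields from the table above and flags 504s that were logged long before the server timeout could have expired. The sample line is reassembled from the table values, assuming no captured headers and no syslog prefix. The names `LOG_RE`, `parse_line` and `looks_premature` are made up for this example, and the 300 s value is an assumption taken from the `timeout server 300s` setting in the backend shown below.

```python
import re

# One named group per field of the HAProxy HTTP log format (see table above).
LOG_RE = re.compile(
    r'(?P<process>\S+\[\d+\]:) '
    r'(?P<client_ip>[\d.]+):(?P<client_port>\d+) '
    r'\[(?P<request_date>[^\]]+)\] '
    r'(?P<frontend>\S+) '
    r'(?P<backend>[^/\s]+)/(?P<server>\S+) '
    r'(?P<timers>[+-]?\d+(?:/[+-]?\d+){4}) '
    r'(?P<status>\d{3}) '
    r'(?P<bytes_read>\+?\d+) '
    r'(?P<req_cookie>\S+) (?P<res_cookie>\S+) '
    r'(?P<term_state>\S{4}) '
    r'(?P<conns>\d+/\d+/\d+/\d+/\+?\d+) '
    r'(?P<queues>\d+/\d+)'
    r'(?: \{[^}]*\}){0,2} '          # optional captured request/response headers
    r'"(?P<request>[^"]*)"'
)

# Sample line rebuilt from the table values (assumption: nothing else was logged).
SAMPLE = (
    'haproxy[152]: 10.101.0.92:41328 [03/Feb/2021:08:32:53.214] public '
    'be_http:pre-prod:geo-service/'
    'pod:geo-service-2-6fgjp:geo-service:8080-tcp:172.23.1.144:8080 '
    '0/0/0/-1/0 504 214 - - sRNN 10/2/0/0/0 0/0 '
    '"GET /nm/geo-service/ping HTTP/1.1"'
)


def parse_line(line):
    """Return a dict of log fields, or None if the line does not match."""
    m = LOG_RE.search(line)
    if m is None:
        return None
    rec = m.groupdict()
    rec["status"] = int(rec["status"])
    rec["bytes_read"] = int(rec["bytes_read"])
    # Timers are in milliseconds; -1 means the phase was never reached.
    names = ("TR", "Tw", "Tc", "Tr", "Ta")
    rec["timers"] = dict(zip(names, (int(t) for t in rec["timers"].split("/"))))
    return rec


def looks_premature(rec, timeout_server_ms):
    """True for a 504 with an 's...' termination state whose total time (Ta)
    is below the configured server timeout."""
    return (
        rec["status"] == 504
        and rec["term_state"].startswith("s")
        and rec["timers"]["Ta"] < timeout_server_ms
    )


if __name__ == "__main__":
    record = parse_line(SAMPLE)
    print(record["term_state"], record["timers"])
    print(looks_premature(record, timeout_server_ms=300_000))
```

This only checks the logged timers against one timeout value; it does not explain why HAProxy reported the timeout.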

Haproxy version 2.0.16.
Here is part of our configuration, including timeouts:

global
  maxconn 20000
  nbthread 4
  daemon
defaults
  timeout connect 5s
  timeout client 30s
  timeout client-fin 1s
  timeout server 30s
  timeout server-fin 1s
  timeout http-request 10s
  timeout http-keep-alive 300s
  timeout tunnel 1h

  no option http-use-htx

backend be_http:ci:geo-service
  mode http
  option redispatch
  option forwardfor
  balance leastconn
  timeout server  300s

  timeout check 5000ms
  http-request add-header X-Forwarded-Host %[req.hdr(host)]
  http-request add-header X-Forwarded-Port %[dst_port]
  http-request add-header X-Forwarded-Proto http if !{ ssl_fc }
  http-request add-header X-Forwarded-Proto https if { ssl_fc }
  http-request add-header X-Forwarded-Proto-Version h2 if { ssl_fc_alpn -i h2 }
  http-request add-header Forwarded for=%[src];host=%[req.hdr(host)];proto=%[req.hdr(X-Forwarded-Proto)]
  cookie 00d166be97fc798a8ddabe90d1f9204f insert indirect nocache httponly
  server pod:geo-service-4-zk6gs:geo-service:8080-tcp:172.20.2.127:8080 172.20.2.127:8080 cookie 6d435d8d90753b3b8464e4fb7672787b weight 256 check inter 5000ms
  server pod:geo-service-4-6s9cr:geo-service:8080-tcp:172.20.3.153:8080 172.20.3.153:8080 cookie 3eb67e6ccfedef0166bed6214b85ffcf weight 256 check inter 5000ms
  server pod:geo-service-4-v2ppb:geo-service:8080-tcp:172.23.1.98:8080 172.23.1.98:8080 cookie b9b1a3cc05726f213dd57d4ca0f57985 weight 256 check inter 5000ms

I’ve already searched the forum and found one topic with a similar pattern: https://discourse.haproxy.org/t/intermittent-504-errors-and-sr-after-upgrade-to-1-7-10/

My understanding of that topic is that the bug causing that error has been fixed, and that the fix should be included in all newer versions.

I’m trying to capture the error in a tcpdump, but apart from that any help would be appreciated!

Thanks!

We haven’t been able to get to the root cause of the problem, but I’ve managed to get a dump of the TCP session, which looks perfectly normal to me. We’re kind of lost now.

The error seems to occur more often when there is less traffic, but that’s only an observation.

Maybe someone here has an idea?

My HAProxy version is the same as yours, and I have run into the same problem: 504 with termination state sR. Have you solved it? For now I have had to increase the server timeout to 5 minutes to avoid the problem, following the Server Fault question “timeout - Intermittent 504 errors with HAProxy”.

Unfortunately not. In our case the occurrence of the problem has dropped to a very low level now that the HAProxy instances handle more connections.

We found a pattern like the one in the topic “504 response and sR--”: before the request that gets the 504, the previous request’s bytes_read is larger than 100000. I can’t work out why the previous request would affect the next one.
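
To check whether that pattern holds in other logs, something like the following Python sketch could be used. It assumes the log lines have already been parsed into dicts with the keys `server`, `status`, `bytes_read` and `term_state`, and it interprets “previous request” as the previous logged request to the same backend server, which is only a guess at what was meant. The function name `flag_504_after_large_response` is made up for this example.

```python
def flag_504_after_large_response(records, threshold=100_000):
    """Yield (previous, current) pairs where a 504 with an 'sR..' termination
    state directly follows, on the same server, a response whose bytes_read
    exceeds the threshold. Records must be in log order."""
    last_by_server = {}
    for rec in records:
        prev = last_by_server.get(rec["server"])
        if (
            rec["status"] == 504
            and rec["term_state"].startswith("sR")
            and prev is not None
            and prev["bytes_read"] > threshold
        ):
            yield prev, rec
        # Remember this request as the latest one seen for its server.
        last_by_server[rec["server"]] = rec
```

Counting how many of the 504/sR entries are flagged this way, compared with how many are not, would show whether the correlation is real or a coincidence.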

I’m in much the same situation. Have you solved this problem?