502 Bad Gateway errors after migrating from HAProxy v1.8 to v3.0.3

The HAProxy configuration is exactly the same; we copied it from the v1.8 box.
Since the migration, we are seeing a lot of 502 Bad Gateway errors in our analytics tool (Datadog). Strangely, we are not seeing any 502s in the HAProxy logs themselves.

The v1.8 box was running CentOS 7.9 and the v3.0.3 box is running Rocky Linux 8.10. Everything else (CPU, memory, configuration, etc.) is the same.

haproxy -vv output:

HAProxy version 3.0.3-95a607c 2024/07/11 - https://haproxy.org/
Status: long-term supported branch - will stop receiving fixes around Q2 2029.
Known bugs: http://www.haproxy.org/bugs/bugs-3.0.3.html
Running on: Linux 4.18.0-553.44.1.el8_10.x86_64 #1 SMP Mon Mar 10 11:32:40 UTC 2025 x86_64
Build options :
  TARGET  = linux-glibc
  CC      = cc
  CFLAGS  = -O2 -g -fwrapv
  OPTIONS = USE_OPENSSL=1 USE_LUA=1 USE_SYSTEMD=1 USE_PCRE=1
  DEBUG   =

Feature list : -51DEGREES +ACCEPT4 +BACKTRACE -CLOSEFROM +CPU_AFFINITY +CRYPT_H -DEVICEATLAS +DL -ENGINE +EPOLL -EVPORTS +GETADDRINFO -KQUEUE -LIBATOMIC +LIBCRYPT +LINUX_CAP +LINUX_SPLICE +LINUX_TPROXY +LUA +MATH -MEMORY_PROFILING +NETFILTER +NS -OBSOLETE_LINKER +OPENSSL -OPENSSL_AWSLC -OPENSSL_WOLFSSL -OT +PCRE -PCRE2 -PCRE2_JIT -PCRE_JIT +POLL +PRCTL -PROCCTL -PROMEX -PTHREAD_EMULATION -QUIC -QUIC_OPENSSL_COMPAT +RT +SHM_OPEN +SLZ +SSL -STATIC_PCRE -STATIC_PCRE2 +SYSTEMD +TFO +THREAD +THREAD_DUMP +TPROXY -WURFL -ZLIB

Default settings :
  bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Built with multi-threading support (MAX_TGROUPS=16, MAX_THREADS=256, default=8).
Built with OpenSSL version : OpenSSL 1.1.1k  FIPS 25 Mar 2021
Running on OpenSSL version : OpenSSL 1.1.1k  FIPS 25 Mar 2021
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports : TLSv1.0 TLSv1.1 TLSv1.2 TLSv1.3
Built with Lua version : Lua 5.3.5
Built with network namespace support.
Built with libslz for stateless compression.
Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND
Built with PCRE version : 8.42 2018-03-20
Running on PCRE version : 8.42 2018-03-20
PCRE library supports JIT : no (USE_PCRE_JIT not set)
Encrypted password support via crypt(3): yes
Built with gcc compiler version 8.5.0 20210514 (Red Hat 8.5.0-20)

Available polling systems :
      epoll : pref=300,  test result OK
       poll : pref=200,  test result OK
     select : pref=150,  test result OK
Total: 3 (3 usable), will use epoll.

Available multiplexer protocols :
(protocols marked as <default> cannot be specified using 'proto' keyword)
         h2 : mode=HTTP  side=FE|BE  mux=H2    flags=HTX|HOL_RISK|NO_UPG
  <default> : mode=HTTP  side=FE|BE  mux=H1    flags=HTX
         h1 : mode=HTTP  side=FE|BE  mux=H1    flags=HTX|NO_UPG
       fcgi : mode=HTTP  side=BE     mux=FCGI  flags=HTX|HOL_RISK|NO_UPG
  <default> : mode=TCP   side=FE|BE  mux=PASS  flags=
       none : mode=TCP   side=FE|BE  mux=PASS  flags=NO_UPG

Available services : none

Available filters :
        [BWLIM] bwlim-in
        [BWLIM] bwlim-out
        [CACHE] cache
        [COMP] compression
        [FCGI] fcgi-app
        [SPOE] spoe
        [TRACE] trace

We are using HAProxy only for load balancing, not for SSL termination.
The only anomaly I am seeing in the logs is the one below; however, I am not sure whether it has anything to do with the issue at hand:

Jun 13 04:12:15 localhost haproxy[529908]: xx.xx.xx.xx:62134 [13/Jun/2025:04:12:15.839] FE01 xxxx/app02_8022 0/6/-1/-1/8 503 217 - - SCNN 55/55/0/0/+3 0/0 {xxxx.com} "POST /apps/PortalController HTTP/1.1"
Jun 13 04:12:16 localhost haproxy[529908]: xx.xx.xx.xx:53429 [13/Jun/2025:04:12:16.383] FE01 xxxx/app02_8024 0/4/-1/-1/5 503 217 - - SCNN 54/54/0/0/+3 0/0 {xxxx.com} "POST /apps/PortalController HTTP/1.1"
Jun 13 04:31:37 localhost haproxy[720655]: xx.xx.xx.xx:60610 [13/Jun/2025:04:31:37.129] FE01 xxxx/app02_8022 0/4/-1/-1/6 503 217 - - SCNN 47/47/0/0/+3 0/0 {xxxx.com} "POST /apps/PortalController HTTP/1.1"
Jun 13 04:31:37 localhost haproxy[720655]: xx.xx.xx.xx:31864 [13/Jun/2025:04:31:37.267] FE01 xxxx/app02_8024 0/4/-1/-1/6 503 217 - - SCNN 49/49/0/0/+3 0/0 {xxxx.com} "POST /apps/PortalController HTTP/1.1"
Jun 13 04:36:24 localhost haproxy[530912]: xx.xx.xx.xx:30526 [13/Jun/2025:04:36:24.319] FE01 xxxx/app02_8022 0/5/-1/-1/6 503 217 - - SCNN 49/49/0/0/+3 0/0 {xxxx.com} "POST /apps/PortalController HTTP/1.1"
Jun 13 04:36:24 localhost haproxy[530912]: xx.xx.xx.xx:58905 [13/Jun/2025:04:36:24.555] FE01 xxxx/app02_8024 0/4/-1/-1/5 503 217 - - SCNN 50/50/0/0/+3 0/0 {xxxx.com} "POST /apps/PortalController HTTP/1.1"
Jun 13 04:48:00 localhost haproxy[534050]: xx.xx.xx.xx:38169 [13/Jun/2025:04:48:00.790] FE01 xxxx/app02_8022 0/3/-1/-1/5 503 217 - - SCNN 54/54/0/0/+3 0/0 {xxxx.com} "POST /apps/PortalController HTTP/1.1"
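
In case it helps, this is roughly how I am scanning the log for error responses and suspicious termination states. The log path and the awk field positions are assumptions based on the syslog-prefixed HTTP log format shown above:

# Sketch: count status-code / termination-state pairs for anything that looks like an error.
# In the format above, field 11 is the HTTP status and field 15 the termination state;
# adjust the path and field numbers for your setup.
awk '$11 ~ /^5[0-9][0-9]$/ || $15 ~ /^S/ {print $11, $15}' /var/log/haproxy.log | sort | uniq -c | sort -rn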

For some reason, we are also seeing a latency increase of about 10% on the URLs hosted on the new box. Please help, and let me know if any more information is needed.

I did a little more digging and found that the 502 errors appear on our frontend whenever the lines below are logged by HAProxy:

Jun 10 11:43:19 localhost haproxy[2130]: xx.xx.xx.192:44206 [10/Jun/2025:11:42:59.695] FE01 msg/app02 0/0/0/6/20073 302 700 - - SD-- 13/13/1/0/0 0/0 {xxx.fm} "HEAD /cd5da0cff8004db381bed4e84d4e3fa1 HTTP/1.1"
Jun 10 11:43:41 localhost haproxy[2130]: xx.xx.xx.192:20376 [10/Jun/2025:11:43:19.657] FE01 msg/app03 0/0/0/2/21491 302 676 - - SD-- 17/17/2/0/0 0/0 {xxx.fm} "HEAD /313b8c7c5e5445a489cf6ffef9ac4b53 HTTP/1.1"
Jun 10 11:43:49 localhost haproxy[2130]: xx.xx.xx.192:25606 [10/Jun/2025:11:43:27.863] FE01 msg/app05 0/0/0/3/21552 302 2398 - - SD-- 16/16/1/0/0 0/0 {xxx.fm} "HEAD /e80fe95c32ac4d62a3b4517de74f0790 HTTP/1.1"
Jun 10 11:43:59 localhost haproxy[2130]: xx.xx.xx.192:3429 [10/Jun/2025:11:43:38.923] FE01 msg/app07 0/0/0/3/20986 302 410 - - SD-- 14/14/1/0/0 0/0 {xxx.fm} "HEAD /c72b613b945942b2948b30850e1ac6f5 HTTP/1.1"
Jun 10 11:44:17 localhost haproxy[2130]: xx.xx.xx.192:45948 [10/Jun/2025:11:43:56.879] FE01 msg/app02 0/0/0/6/20205 302 2460 - - SD-- 15/15/1/1/0 0/0 {xxx.fm} "HEAD /909154f60aac4440ae0692051cb39b15 HTTP/1.1"
Jun 10 11:44:40 localhost haproxy[2130]: xx.xx.xx.192:49568 [10/Jun/2025:11:44:19.826] FE01 msg/app05 0/0/0/3/20341 302 1153 - - SD-- 13/13/1/0/0 0/0 {xxx.fm} "HEAD /72300ba74bec4a8391557cec880ac77f HTTP/1.1"
Jun 10 11:44:58 localhost haproxy[2130]: xx.xx.xx.192:50247 [10/Jun/2025:11:44:36.860] FE01 msg/app07 0/0/0/4/21142 302 2230 - - SD-- 14/14/1/0/0 0/0 {xxx.fm} "HEAD /37882b3b9b9d45239b45c4fc106679af HTTP/1.1"
Jun 10 11:45:12 localhost haproxy[2130]: xx.xx.xx.192:53958 [10/Jun/2025:11:44:51.417] FE01 msg/app06 0/0/0/6/20815 302 3034 - - SD-- 18/18/0/0/0 0/0 {xxx.fm} "HEAD /375e5f75c40e482c93bdcffe9e44d526 HTTP/1.1"
Jun 10 11:45:39 localhost haproxy[2130]: xx.xx.xx.192:29418 [10/Jun/2025:11:45:18.646] FE01 msg/app03 0/0/0/2/20387 302 2340 - - SD-- 15/15/0/0/0 0/0 {xxx.fm} "HEAD /20465b8782e14791953055c4cfdc4f3b HTTP/1.1"
Jun 10 11:46:16 localhost haproxy[2130]: xx.xx.xx.192:39574 [10/Jun/2025:11:45:56.113] FE01 msg/app07 0/0/0/4/20487 302 2616 - - SD-- 14/14/0/0/0 0/0 {xxx.fm} "HEAD /ba17119d31724f04a6b911cf32232b38 HTTP/1.1"
Jun 10 11:46:56 localhost haproxy[2130]: xx.xx.xx.192:54772 [10/Jun/2025:11:46:36.376] FE01 msg/app03 0/0/0/2/20239 302 2254 - - SD-- 16/16/3/1/0 0/0 {xxx.fm} "HEAD /553b97ffcdfe45fabae20624e314995c HTTP/1.1"
Jun 10 11:47:00 localhost haproxy[2130]: xx.xx.xx.192:18868 [10/Jun/2025:11:46:39.345] FE01 msg/app03 0/0/0/2/20800 302 410 - - SD-- 15/15/2/0/0 0/0 {xxx.fm} "HEAD /ff6a08b9ddf742b689787da2f33879cc HTTP/1.1"

There are nearly 300 bugs in v3.0.3 that are already fixed in subsequent bugfix releases.
Please use the latest bugfix release, which at this time is 3.0.11.

The termination state indicates that the connection to the backend server failed while it was returning a 302 redirect, as documented:

 SD   The connection to the server died with an error during the data
      transfer. This usually means that HAProxy has received an RST from
      the server or an ICMP message from an intermediate equipment while
      exchanging data with the server. This can be caused by a server crash
      or by a network issue on an intermediate equipment.

Whether this is actually the case, or whether an HAProxy bug is causing a misinterpretation, is difficult to say. I strongly suggest upgrading to the latest bugfix release.
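
If you want to confirm on the wire whether the server is really sending an RST, a targeted capture on the HAProxy box can help. A minimal sketch, where the interface name, backend address and port are placeholders for your environment:

# Capture only TCP RST segments coming from one backend server.
# eth0, 10.0.0.21 and 8022 are placeholders.
tcpdump -ni eth0 -w backend-rst.pcap 'src host 10.0.0.21 and src port 8022 and tcp[tcpflags] & tcp-rst != 0'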

Thanks for your reply, Lukas.

I will try to get HAProxy upgraded to 3.0.11 as soon as possible.

Meanwhile, packet captures on both sides (the app servers and the HAProxy server) could not help us pinpoint the issue.

Also, I am seeing a high count of “connection resets during transfers” per server on the stats page. This number more or less aligns with the number of 502 Bad Gateway errors on the frontend.
The total duration of all the SD errors is exactly, or very nearly, 20 seconds (see the quick check after the timeout settings below).
Could it have anything to do with the timeouts configured?

    retries                  3
    timeout http-request     10s
    timeout queue            1m
    timeout connect          10s
    timeout client           2m
    timeout server           10m
    timeout tunnel           10m
    timeout client-fin       10s
    timeout http-keep-alive  10s
    timeout check            10s
    maxconn                  20000
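
For reference, this is roughly how I pulled the total session time (the last of the five timer fields, in milliseconds) out of the SD lines to confirm the ~20 second clustering; the log path is an assumption:

# Print the total time (Tt, last timer field) of every SD-terminated request.
# Assumes the same syslog-prefixed log format as in the lines posted above.
awk '$15 ~ /^SD/ {n = split($10, t, "/"); print t[n]}' /var/log/haproxy.log | sort -n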

In your traces you should specifically look for those HEAD requests where the backend would generate a 302 response, as indicated in the logs you posted earlier.

Understanding whether those specific HTTP transactions are somehow fishy would be a good next step.
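
If you already have the captures as pcap files, something along these lines should narrow them down; the file name is a placeholder, and it assumes the backend side is plain HTTP/1.1 so tshark can dissect it:

# Show only HEAD requests and 302 responses from an existing capture.
tshark -r backend.pcap -Y 'http.request.method == "HEAD" || http.response.code == 302'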

But yeah, I think you should prioritize the upgrade to 3.0.11.

Thanks a bunch Lukas, will do that.

Update:
We were finally able to upgrade to v3.0.11, and we are no longer getting either the SD errors or the 502 Bad Gateway responses.
It is safe to say the SD errors were indeed a misrepresentation caused by bugs in v3.0.3.
Thank you once again for your assistance, Lukas :+1:
