Error 503 showing up even with backend servers up

Hello everyone, I'm having a hard time figuring out what is causing this behavior; so far I've got nothing. I'll describe the scenario below:

All services we control have HAProxy acting as a reverse proxy. The problem: one service is trying to communicate with another and gets a 503 response from HAProxy, even though the machines can ping, traceroute, etc. The error happens when my Lua script sends a request to the other service and gets a 503 even with all servers up.

Server1 → server with the Lua script, the one sending the request (origin)

Server2 → server receiving the request (destination)

These servers are EC2 instances; we control the ACL rules, and both can be reached by ICMP or HTTP, so they can communicate. As I explained, they also have HAProxy in front of containers. On Server1 I have a Lua script checking backend health; if a backend goes down, the script sends a GET request to an endpoint on Server2. The problem happens in this request between Server1 and Server2: when the Lua script sends it, I get a 503, and that is the part I can't figure out.
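For context, the alerting flow looks roughly like the sketch below. This is a minimal reconstruction, assuming the script uses core.httpclient() (which the <HTTPCLIENT> entry in the log suggests); the backend/server names, the URL and the trigger condition are placeholders, not the real script:

-- Minimal sketch of the alert flow, assuming core.httpclient() is used
-- (the <HTTPCLIENT> frontend in the log suggests it). Backend/server names,
-- the URL and the trigger condition are placeholders, not the real script.
core.register_task(function()
    while true do
        local srv = core.backends["main"].servers["main-service"]  -- hypothetical names
        local stats = srv:get_stats()
        if stats["status"] ~= "UP" then
            local httpclient = core.httpclient()
            local res = httpclient:get{ url = "https://url/api/v3/haproxy/" }
            if res then
                core.log(core.info, "alert sent, got status " .. tostring(res.status))
            end
        end
        core.msleep(10000)  -- re-check every 10 seconds
    end
end)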

Here is the error I get on Server1:
-:- [07/Oct/2025:18:17:17.487] <HTTPCLIENT> -/- 3/0/-1/-1/1 503 217 - - SC-- 1/0/0/0/3 0/0 {64:ff9b::36cf:a2c7} "GET https://url/api/v3/haproxy/ HTTP/1.1"

Looking at the HAProxy logs on Server2, the request never arrives, so I don't have any log on Server2. The backends of Server2 are all up, and I was able to make a curl request from Server1 to Server2 both outside and inside the container:

Server1 outside of container/inside of container:

curl -I --location -H "Content-Type: text/html" 'https://url/api/v3/haproxy/'

Server2 logs:
ip:5950 [04/Dec/2025:13:09:36.286] http-https-in~ devops/srv-devops 0/0/1/4/5 401 269 - - ---- 14/14/1/1/0 0/0 "HEAD /api/v3/haproxy/ HTTP/1.1"

And this error only occurs sometimes; in general it works fine except in these cases. Server1 is running HAProxy 2.8.15, Server2 HAProxy 2.6.

I'm aware of what the documentation says about this kind of error:

SC The server or an equipment between it and haproxy explicitly refused
the TCP connection (the proxy received a TCP RST or an ICMP message
in return). Under some circumstances, it can also be the network
stack telling the proxy that the server is unreachable (e.g. no route,
or no ARP response on local network). When this happens in HTTP mode,
the status code is likely a 502 or 503 here.

But the problem is that our EC2 instances don't have any kind of firewall between or inside the machines (even ufw is disabled); we only use ACLs. Here are the config files of both servers:

Server1:

global
    log stdout format raw local0 info
    maxconn 200000
    user root
    group root
    lua-load /usr/local/etc/haproxy/script-lua.lua

http-errors custom_errors
    errorfile 503 /usr/local/etc/haproxy/errors/503.http

defaults
    mode http
    log global
    log-format '{"host":"%H","time":"%Tl","totalTime":"%Tt","serverTime":"%Tr","client_ip":"%ci","backend":"%b","frontend":"%ft","server":"%s","upload":"%U","download":"%B","statusCode":"%ST","method":"%HM","uri":"%[capture.req.uri,json(utf8s)]","body":"%[capture.req.hdr(0),json(utf8s)]"}'

    timeout tunnel 12h
    option  dontlognull
    retries 999
    option redispatch
    timeout connect  100000
    timeout client  200000
    timeout server  200000


resolvers docker_resolver
    nameserver dns 127.0.0.11:53
    parse-resolv-conf
    hold valid    15s
    hold other    30s
    hold refused  30s
    hold nx       30s
    hold timeout  30s
    hold obsolete 30s
    resolve_retries 3
    timeout retry 1s
    timeout resolve 1s


listen stats
    bind *:1900
    http-request use-service prometheus-exporter if { path /metrics }
    stats enable
    stats refresh 10s
    stats show-node
    stats uri  /stats

frontend main_ssl
    bind *:443   ssl crt /ssl/certkey.pem alpn h2,http/1.1  #https portal
    bind *:9994  ssl crt /ssl/certkey.pem alpn h2,http/1.1  #https main
    errorfiles custom_errors
    http-response return status 503 default-errorfiles if { status 503 }
    option http-buffer-request
    declare capture request len 10000
    http-request capture req.body id 0
    acl isPortal dst_port 443
    acl isMain dst_port 9994
    use_backend main if isMain
    use_backend portal if isPortal
    default_backend portal

frontend main-front
    bind *:80  #http portal
    bind *:9090 #http main
    errorfiles custom_errors
    http-response return status 503 default-errorfiles if { status 503 }
    option http-buffer-request
    declare capture request len 10000
    http-request capture req.body id 0
    acl isPortal dst_port 80
    acl isMain dst_port 9090
    use_backend main if isMain
    use_backend portal if isPortal
    default_backend portal


backend portal
    mode http
    compression algo gzip
    compression type text/html text/plain text/css application/json
    server portal portal:80 check resolvers docker_resolver init-addr last,127.0.0.1

backend main
    mode http
    compression algo gzip
    compression type text/html text/plain text/css application/json
    server main-service main-service:80 check resolvers docker_resolver init-addr last,127.0.0.1

Server2:

global
    log 127.0.0.1 local0 debug
    user root
    group root

# Default settings
defaults
    log stdout format raw local0 debug
    mode    http 
    option  httplog 
    timeout tunnel 12h
    option  dontlognull
    retries 3
    option forwardfor
    option redispatch
    timeout connect  100000
    timeout client  300000
    timeout server  300000
    maxconn 15000
    option forwardfor

resolvers docker_resolver
    nameserver dns 127.0.0.11:53
    parse-resolv-conf
    hold valid    15s
    hold other    30s
    hold refused  30s
    hold nx       30s
    hold timeout  30s
    hold obsolete 30s
    resolve_retries 3
    timeout retry 1s
    timeout resolve 1s

# Front-end setup
frontend http-https-in
    bind *:443 ssl crt /certs/cert.pem
    bind *:80

    use_backend devops if { hdr_dom(host) -i url }
    use_backend nodered if { hdr_dom(host) -i url }
    default_backend nodered

backend devops
    server srv-devops nr-devops:1880 check resolvers docker_resolver init-addr last,127.0.0.1

backend nodered
    server srv-nodered node-red:1880 check resolvers docker_resolver init-addr last,127.0.0.1

Note: if you find any error in the config files, it's because I had to adapt some configs so as not to expose the real ones, but essentially they are the same.

Does anyone have any clue why this happens?

If you don’t believe that the SC error haproxy is reporting is accurate, then you will have to capture the traffic between haproxy and the backend application and verify the specific request.

Most likely you will find something that you did not think of yet. For example, perhaps this backend application is updated by somebody else, and during the restart of the application it cannot answer requests?

Hello, I captured the traffic with tcpdump and the result was nothing. On Server1, the requests made by the Lua script didn't appear at all; they only appeared when I tested with curl. And the backend wasn't being updated, because my team is the only one allowed to update it; the second server is updated rarely, and they need to be running 24/7.

You need to capture on the host, in the container where the HAProxy process that generates the SC log message lives.

I know that; I tried, but had no luck. I captured both inside and outside of the container, and not a single request went to Server2. I even took down one backend to make the Lua script send the alert, but even with this test there were no requests. The only clue is this: no request is going out of the Lua script :confused: but I don't know why yet.

Note: on another server where the Lua script is sending the request normally, I was able to capture the packets, and everything looks normal.
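One thing that might help narrow it down is logging whatever the httpclient call returns, to see whether it fails before a connection is even attempted. A rough sketch, assuming the response table documented for httpclient:get (adapt the variable names to the real script):

-- Debug sketch: log what httpclient:get actually returns. If it comes back
-- nil or without a status, the 503 is being generated locally before any
-- packet leaves Server1, which would match the empty tcpdump capture.
local httpclient = core.httpclient()
local res = httpclient:get{ url = "https://url/api/v3/haproxy/" }
if res == nil then
    core.log(core.err, "httpclient:get returned nil")
else
    core.log(core.info, string.format("status=%s body_len=%d",
        tostring(res.status), res.body and #res.body or 0))
end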

OK, follow-up. I'm going to investigate a hypothesis: maybe it's connection exhaustion. Capturing the request on a server that was working, I was able to watch the connections, but I didn't find flags indicating a new connection (SYN) or a connection being closed (FIN) anywhere. That may indicate the script is keeping the connection alive? I don't know yet.
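Whether the Lua httpclient reuses idle connections internally I can't say, but if the real script keeps a single client object around, one quick way to rule out client-side reuse is to create a fresh core.httpclient() for every alert. A sketch of that pattern (hypothetical, since the real script isn't shown):

-- Hypothetical test: build a new httpclient per alert instead of reusing a
-- shared instance, so every alert has to set up its own connection.
local function send_alert(url)
    local httpclient = core.httpclient()  -- fresh client each call
    return httpclient:get{ url = url }
end

local res = send_alert("https://url/api/v3/haproxy/")
core.log(core.info, "alert status: " .. tostring(res and res.status))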

Most likely it's your backend. Investigate your backend against the timeout.

So, I recreated the container on Server2; here is the stats page of Server2. After the recreation the problem still persists, and Server1 still can't send requests to Server2, so I think I can exclude the connection exhaustion hypothesis.

As you can see in the image, there aren't many connections alive and it is far from the limit. Could this be version related?

Note: Server1 was making the request to the "nodered" backend, which is why this screenshot shows it.