Haproxy returns 403 NOSRV error intermittently

Hi good morning everyone, I hope everyone can help with this.

So apparently I have a WAF (cloudflare) and load balancer using haproxy stacks and my backends with request flow like client -> cloudflare -> haproxy -> backend server.

Currently I’m facing a condition where I experience intermittent error 403 NOSRV.

The configuration of Haproxy configuration is as follow:

global
    log /dev/log	local0
    log /dev/log	local1 notice
    chroot /var/lib/haproxy
    stats socket /run/haproxy/admin.sock mode 660 level admin expose-fd listeners
    stats timeout 30s
    tune.bufsize    262144
    tune.maxrewrite 32768
    tune.maxaccept  -1
    user haproxy
    group haproxy
    daemon

defaults
        log global
        option httplog
        no option checkcache
        mode http
        retries 2
        option redispatch
        option forwardfor except 127.0.0.1
        maxconn 40000
        timeout client 620s
        timeout server 5m
        timeout queue 60s
        timeout connect 30s
        timeout check 60s
        timeout tunnel 1h
        timeout tarpit 60s
        timeout http-request 5m
        option abortonclose
        default-server inter 30s
        default-server fastinter 5s
        default-server downinter 5s
        default-server maxconn 1500
        default-server maxqueue 1500
        default-server on-error fail-check
        default-server slowstart 120s
        default-server weight 1
        fullconn 200000


frontend ssl
        bind 0.0.0.0:443 ssl crt /etc/haproxy/ssl/nyem_digicert.pem ciphers ECDHE-RSA-AES256-GCM-SHA384:XXXXXXXXX
        bind 0.0.0.0:80
        mode http
        tcp-request inspect-delay 5s
        option httplog
        option dontlognull
        option accept-invalid-http-request
        option forwardfor except 127.0.0.1
        monitor-uri /healthcheck-uri
        http-request add-header X-Forwarded-Proto https if { ssl_fc }

        acl host_dev hdr_beg(host) -i dev.nyem.com

        acl network_private src 10.104.0.0/24

        acl path_order_trx path -i -m beg /api-nyem/order/transactions
        acl path_order_nyem_trx path -i -m beg /api-nyem/order/nyem/transactions
        acl path_order_callback path -i -m beg /api-nyem/order/callback/nyem
        acl path_order path -i -m beg /api-ms/order/

        http-request allow if path_order_trx
        http-request allow if path_order_digiflazz_trx
        http-request allow if path_order_callback
        http-request deny if path_order !network_private

        http-response set-header Cache-control no-cache,\ no-store,\ must-revalidate if { capture.req.uri -m beg /sw.js /sw.js.map }

        # backends
        use_backend order if path_order

        use_backend dev_ws if host_dev
        default_backend dev_ws

        acl is_http hdr(X-Forwarded-Proto) eq http
        redirect scheme https code 301 if is_http

backend dev_ws
        mode http
        server dev_ws 10.104.0.6:443 ssl verify none maxconn 100

backend order
        option httpchk
        http-check send meth GET uri /application/health
        http-check expect status 200
        http-request replace-path /api-nyem(.*) \1
        mode http
        server dev_order 10.104.0.24:8300 maxconn 100

So let’s say I have 2 API in the backend server:

So I tried to simulate this CURL:

curl --location --request POST 'https://dev.nyem.com/api-nyem/order/nyem/transactions' \
--data '{
    "nyem": "123131"
}'

And it always return perfectly fine with 200. However when I tried to add new endpoint which is POST 'https://dev.nyem.com/api-nyem/order/callback/nyem' and try to hit it with:

curl --location --request POST 'https://dev.nyem.com/api-nyem/order/callback/nyem' \
--data '{
    "id": "12563858",
}'

It intermittently returning error like this:

200:

{"success":false,"code":"FAILED","message":"error","data":"error","serverTime":1694831245}

403:

<html><body><h1>403 Forbidden</h1>
Request forbidden by administrative rules.
</body></html>

I suspect this request comes from the haproxy because the request was logged in haproxy log with the following details:

200

Sep 16 02:00:56 dev-haproxy haproxy[1388152]: 162.158.162.93:52930 [16/Sep/2023:02:00:56.437] ssl~ order/dev_order 0/0/0/2/2 200 155 - - ---- 62/62/0/0/0 0/0 "POST /api-nyem/order/callback/nyem HTTP/1.1"

403:

Sep 16 02:00:57 dev-haproxy haproxy[1366652]: 162.158.162.53:60206 [16/Sep/2023:02:00:57.252] ssl~ ssl/<NOSRV> 0/-1/-1/-1/0 403 192 - - PR-- 115/115/0/0/0 0/0 {dev.nyem.com|PostmanRuntime/7.32.3|SG|178.128.58.144|80731d91cb4f4912-SIN|178.128.58.144} "POST /api-ms/order/callback/nyem HTTP/1.1"

Do you guys what is the root cause of the intermittent error while the first API works just fine? I’m already on my dead end on this one. Thanks

Oh yah this is my haproxy version:

HAProxy version 2.5.5-1ppa1~focal 2022/03/14 - https://haproxy.org/
Status: stable branch - will stop receiving fixes around Q1 2023.
Known bugs: http://www.haproxy.org/bugs/bugs-2.5.5.html
Running on: Linux 5.4.0-113-generic #127-Ubuntu SMP Wed May 18 14:30:56 UTC 2022 x86_64

You are making a request against:

/api-nyem/order/callback/nyem

In your haproxy configuration you indeed allowing requests against:

/api-nyem/order/callback/nyem

However the request Cloudflare actually sends is different (see your log):

/api-ms/order/callback/nyem

Notice the beginning of the path, it’s api-ms instead of api-nyem.

So perhaps you have an unexpected rewrite rule in your cloudflare configuration?

What matches is this configuration:

        acl path_order path -i -m beg /api-ms/order/
        http-request deny if path_order !network_private

The path begins with /api-ms/order/ but doesn’t come from your private network, so as configured, the request is denied.

Hi Lukas sorry for make it unclear, I tried to obfuscate the API link it should have been api-nyem/order instead of api-ms, but yes the real api began with api-ms

So the log in haproxy should be:

Sep 16 02:00:57 dev-haproxy haproxy[1366652]: 162.158.162.53:60206 [16/Sep/2023:02:00:57.252] ssl~ ssl/<NOSRV> 0/-1/-1/-1/0 403 192 - - PR-- 115/115/0/0/0 0/0 {dev.nyem.com|PostmanRuntime/7.32.3|SG|178.128.58.144|80731d91cb4f4912-SIN|178.128.58.144} "POST /api-nyem/order/callback/nyem HTTP/1.1"

A PR log code with a 403 response means that an haproxy is denying the request due to an ACL.

Triple check everything.

Triple check that only one haproxy instance is running, with the configuration you intend it to run. Sometimes old process lurk in the background running with an older configuration. Stopping haproxy and killing all the remaining haproxy processes is needed to clear them out.

Triple check that the ACL are configured as expected.

Triple check that the URI is really what you expect it to be.

Triple check that the source IP matches your (ACL) expectations (does it work when come from 10.104.0.0/24 as opposed to Cloudflare?)

Hi @lukastribus , I’m working the same app with @nabillarahmani . We finally got the root cause, but not sure why it can be like this.

So, with commnad ps aux | grep haproxy I got this on the server:

After doing research, we found this:

  • pid 1366652 is the parent/primary process, command haproxy -f /etc/haproxy/haproxy.cfg,
  • pid 1407222 is the master process, command usr/sbin/haproxy -sf 1406712 -x /run/haproxy/admin.sock -Ws -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -S /run/haproxy-master.sock

As you can see, both processes have different start times. Then I’m trying to kill the parent/primary process (pid 1366652). And the problem resolved, I don’t see any intermittent 403 anymore.

So it’s like the primary process is using the outdated config file, how can this happen (to ensure next time we are aware of this)?
So far we don’t see any problem after killing the primary process, but is that okay to kill the parent/primary process? Should we kill all processes and re-run the primary process?

For a clean situation, stop all haproxy processes and kill all the rest, and then start it. Check the process table to see a “normal” situation. Reload and restart haproxy a few teams then check the process table again to see a “normal” situation after reload/restarts.

systemctl stop haproxy
killall haproxy
ps auxwf

The reason multiple processes binding to the same port are possible is due to the use of SO_REUSEPORT on the socket. You can disable it with the noreuseport directive, however it’s possible reload/restart performance is impaired.

You can test reproduce this by trying to start haprocesses manually (/usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg a few times). It should not be possible when noreuseport is configured.

You may also want to consider hard-stop-after to set a timeout for an old process that still serves connections after a reload situation (although this process would not accept() new connections, so it’s not really related to this case).

TIL… Didn’t know of that one before. This SO_REUSEPORT is so interesting. Thank you lukas for the explanation and the mitigation! Definitely will try that! Thanks!

1 Like