Postgres failovers - reconnection delay

Hi HAProxy community,

We have a fairly widely-used architecture to support HA postgres: haproxy sitting in front of two postgres servers set up with patroni and streaming replication. haproxy detects the cluster leader via the bgw_replstatus postgres plugin, which advertises the role on a known port. When I force a patroni failover (essentially an automated swap of leader and replica, for those not using patroni), haproxy detects this with no problems and new connections are routed as required.

My question is about how long this takes. I have a simple test script which does a table insert once every second. This fails during a failover as the connection drops, but I wait and retry after an interval. If I wait 5-10 seconds, all is well; anything less than that and I risk picking up a connection to the replica instance and having failed inserts.
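For reference, the gist of the test loop is roughly the following (a sketch in Go using lib/pq; the DSN, table name and retry interval are placeholders, not the actual script):

    package main

    import (
        "database/sql"
        "log"
        "time"

        _ "github.com/lib/pq" // registers the "postgres" driver
    )

    func main() {
        // connect through the haproxy frontend on 5432 (placeholder DSN)
        db, err := sql.Open("postgres", "postgres://app:secret@haproxy-host:5432/appdb?sslmode=disable")
        if err != nil {
            log.Fatal(err)
        }
        for {
            _, err := db.Exec("INSERT INTO failover_test (ts) VALUES (now())")
            if err != nil {
                log.Printf("insert failed: %v - waiting before retry", err)
                time.Sleep(5 * time.Second) // the 5-10 second wait discussed above
                continue
            }
            time.Sleep(1 * time.Second) // one insert per second
        }
    }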

Is it typical to have to wait so long or am I missing something in terms of configuring haproxy or my client db libraries etc?

Thanks for any insight
m

After living with this until I could upgrade, I have encountered exactly the same issue with PG17 and haproxy 2.7.2. I wonder if anyone has any insights:

  • postgres shuts down and its replica takes over (under the control of patroni) - these nodes advertise their primary/replica status on port 8008
  • haproxy continues to direct connections to the down backend for several seconds after this, resulting in failed transactions

Here is the haproxy.cfg:

...
frontend pgmaster
        bind 0.0.0.0:5432
        default_backend pgmaster

backend pgmaster
        mode tcp
        option httpchk HEAD /primary
        http-check expect status 200
        default-server inter 1s on-marked-down shutdown-sessions
        server infodb00 192.168.0.220:5432 check port 8008 inter 1s
        server infodb01 192.168.0.221:5432 check port 8008 inter 1s

frontend pgstandby
        bind 0.0.0.0:5433
        default_backend pgstandby

backend pgstandby
        mode tcp
        option httpchk HEAD /replica
        http-check expect status 200
        default-server inter 1s on-marked-down shutdown-sessions
        server infodb00 192.168.0.220:5432 check port 8008 inter 1s
        server infodb01 192.168.0.221:5432 check port 8008 inter 1s
...

Are these application reconnections simply happening too quickly?

Are you sure that port 8008 immediately stops responding when the service on port 5432 goes away?

Or is there a delay between the service dying on port 5432 and a negative response on port 8008?

If your dying backend server no longer responds on layer 4 on port 5432 (i.e. no longer listens on that tcp port), then you can probably accelerate this with something like:

observe layer4 error-limit 3 on-error mark-down

You can decrease the error-limit further, but you do risk haproxy switching to a different server after just a couple of failed connection attempts.
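Applied to the server lines of the config you posted, that would look something like this (untested, limits are just an example):

backend pgmaster
        mode tcp
        option httpchk HEAD /primary
        http-check expect status 200
        default-server inter 1s on-marked-down shutdown-sessions
        server infodb00 192.168.0.220:5432 check port 8008 inter 1s observe layer4 error-limit 3 on-error mark-down
        server infodb01 192.168.0.221:5432 check port 8008 inter 1s observe layer4 error-limit 3 on-error mark-down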

If you want clarity, you need to enable logging and show your issues in the context of particular log lines.

Thanks for the response - I guess I cannot be sure of the timing between 5432 and 8008 - but this is standard postgres and, I assume, a very widespread solution.

In any case, after 3s has passed (inter 1s, default fall 3) and haproxy decides the node is down, should it not kill all in-flight sessions and disallow any new ones until it has a known-good backend? I don’t understand how sessions could become established to an RO backend… I am seeing sessions stick around for 20 minutes on the replica (OK, how the clients are inspecting their connections etc. is a separate discussion - but I don’t see why these sessions have not been terminated by haproxy?)

Enable logging in haproxy and capture the 8008 traffic with tcpdump into a file.
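For the capture, something along these lines on the load balancer should do (interface and output path are placeholders):

    tcpdump -i any -nn -w /tmp/patroni-8008.pcap 'tcp port 8008'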

A basic logging configuration can look like this:

global
 log <syslogserver> local7 debug

defaults
 option tcplog
 log global

Thanks again - I set up a little go client (it's here, thanks Claude) to open 5 postgres connections via haproxy and run updates to a small table, and then ran a patroni failover. If my delay between transactions is above 20 ms, the updates complete on the new primary node; less than that, and some sessions appear to be connected to the RO replica and produce a flood of "update failed: pq: cannot execute UPDATE in a read-only transaction".
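For the record, a minimal sketch of what that client does (DSN, table and column names are placeholders, not the actual code):

    package main

    import (
        "database/sql"
        "log"
        "time"

        _ "github.com/lib/pq"
    )

    func worker(id int, dsn string, delay time.Duration) {
        db, err := sql.Open("postgres", dsn)
        if err != nil {
            log.Fatal(err)
        }
        db.SetMaxOpenConns(1) // pin each worker to a single backend connection
        for {
            if _, err := db.Exec("UPDATE failover_test SET n = n + 1 WHERE id = $1", id); err != nil {
                log.Printf("worker %d: update failed: %v", id, err)
            }
            time.Sleep(delay)
        }
    }

    func main() {
        dsn := "postgres://app:secret@haproxy-host:5432/appdb?sslmode=disable"
        for i := 0; i < 5; i++ {
            go worker(i, dsn, 20*time.Millisecond) // ~20 ms between transactions, as described above
        }
        select {} // run until interrupted
    }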

I am now puzzling over what the logs can tell me - very grateful for any insights.

Here, ibg-bank-db-00 is the primary at the start of the test and there are 5 sessions:

  • at failover time, there is a flood of messages showing changes in the active session counts
  • then 4 SD statuses (server dies in data phase? does one of the 5 sessions somehow survive?)
  • message that backend pg-master is unavailable (shouldn’t all sessions then be closed?)
  • message that backend pg-master is available on alt-bank-db-00 (was replica, now primary)
  • message that backend pg-standby is unavailable on old replica as expected
  • message that backend pg-standby is now available on ibg-bank-db-00 as expected
  • further session log lines, still showing the old master as the target?
  • 2 CD statuses (client quit during data phase?)

I am still no clearer on why haproxy is allowing sessions to continue on the failed pg-master backend.

Nov 23 13:30:47 ibg-bank-dblb-00 haproxy[9943]: 192.168.0.29:59405 [23/Nov/2025:13:30:47.589] pgmaster pgmaster/ibg-bank-db-00 1/0/118 108 -- 5/4/3/3/0 0/0
Nov 23 13:30:47 ibg-bank-dblb-00 haproxy[9943]: 192.168.0.29:59405 [23/Nov/2025:13:30:47.589] pgmaster pgmaster/ibg-bank-db-00 1/0/118 108 -- 5/4/3/3/0 0/0
Nov 23 13:30:47 ibg-bank-dblb-00 haproxy[9943]: 192.168.0.29:59413 [23/Nov/2025:13:30:47.712] pgmaster pgmaster/ibg-bank-db-00 1/0/6 0 SD 6/5/4/4/0 0/0
Nov 23 13:30:47 ibg-bank-dblb-00 haproxy[9943]: 192.168.0.29:59413 [23/Nov/2025:13:30:47.712] pgmaster pgmaster/ibg-bank-db-00 1/0/6 0 SD 6/5/4/4/0 0/0
Nov 23 13:30:47 ibg-bank-dblb-00 haproxy[9943]: 192.168.0.29:59411 [23/Nov/2025:13:30:47.712] pgmaster pgmaster/ibg-bank-db-00 1/0/6 0 SD 5/4/3/3/0 0/0
Nov 23 13:30:47 ibg-bank-dblb-00 haproxy[9943]: 192.168.0.29:59411 [23/Nov/2025:13:30:47.712] pgmaster pgmaster/ibg-bank-db-00 1/0/6 0 SD 5/4/3/3/0 0/0
Nov 23 13:30:47 ibg-bank-dblb-00 haproxy[9943]: 192.168.0.29:59409 [23/Nov/2025:13:30:47.589] pgmaster pgmaster/ibg-bank-db-00 1/0/134 108 -- 5/4/3/3/0 0/0
Nov 23 13:30:47 ibg-bank-dblb-00 haproxy[9943]: 192.168.0.29:59409 [23/Nov/2025:13:30:47.589] pgmaster pgmaster/ibg-bank-db-00 1/0/134 108 -- 5/4/3/3/0 0/0
Nov 23 13:30:47 ibg-bank-dblb-00 haproxy[9943]: 192.168.0.29:59407 [23/Nov/2025:13:30:47.589] pgmaster pgmaster/ibg-bank-db-00 1/0/134 108 -- 4/3/2/2/0 0/0
Nov 23 13:30:47 ibg-bank-dblb-00 haproxy[9943]: 192.168.0.29:59407 [23/Nov/2025:13:30:47.589] pgmaster pgmaster/ibg-bank-db-00 1/0/134 108 -- 4/3/2/2/0 0/0
Broadcast message from systemd-journald@ibg-bank-dblb-00 (Sun 2025-11-23 13:30:49 GMT):
haproxy[9943]: backend pgmaster has no server available!
Message from syslogd@ibg-bank-dblb-00 at Nov 23 13:30:49 ...
 haproxy[9943]:backend pgmaster has no server available!
Nov 23 13:30:49 ibg-bank-dblb-00 haproxy[9943]: [WARNING]  (9943) : Server pgmaster/ibg-bank-db-00 is DOWN, reason: Layer7 wrong status, code: 503, info: "Service Unavailable", check duration: 1ms. 0 active and 0 backup servers left. 1 sessions active, 0 requeued, 0 remaining in queue.
Nov 23 13:30:49 ibg-bank-dblb-00 haproxy[9943]: Server pgmaster/ibg-bank-db-00 is DOWN, reason: Layer7 wrong status, code: 503, info: "Service Unavailable", check duration: 1ms. 0 active and 0 backup servers left. 1 sessions active, 0 requeued, 0 remaining in queue.
Nov 23 13:30:49 ibg-bank-dblb-00 haproxy[9943]: Server pgmaster/ibg-bank-db-00 is DOWN, reason: Layer7 wrong status, code: 503, info: "Service Unavailable", check duration: 1ms. 0 active and 0 backup servers left. 1 sessions active, 0 requeued, 0 remaining in queue.
Nov 23 13:30:49 ibg-bank-dblb-00 haproxy[9943]: [NOTICE]   (9943) : haproxy version is 2.4.22-f8e3218
Nov 23 13:30:49 ibg-bank-dblb-00 haproxy[9943]: [NOTICE]   (9943) : path to executable is /usr/sbin/haproxy
Nov 23 13:30:49 ibg-bank-dblb-00 haproxy[9943]: [ALERT]    (9943) : backend 'pgmaster' has no server available!
Nov 23 13:30:49 ibg-bank-dblb-00 haproxy[9943]: backend pgmaster has no server available!
Nov 23 13:30:49 ibg-bank-dblb-00 haproxy[9943]: backend pgmaster has no server available!
Nov 23 13:30:49 ibg-bank-dblb-00 haproxy[9943]: [WARNING]  (9943) : Server pgmaster/alt-bank-db-00 is UP, reason: Layer7 check passed, code: 200, check duration: 1ms. 1 active and 0 backup servers online. 0 sessions requeued, 0 total in queue.
Nov 23 13:30:49 ibg-bank-dblb-00 haproxy[9943]: Server pgmaster/alt-bank-db-00 is UP, reason: Layer7 check passed, code: 200, check duration: 1ms. 1 active and 0 backup servers online. 0 sessions requeued, 0 total in queue.
Nov 23 13:30:49 ibg-bank-dblb-00 haproxy[9943]: Server pgmaster/alt-bank-db-00 is UP, reason: Layer7 check passed, code: 200, check duration: 1ms. 1 active and 0 backup servers online. 0 sessions requeued, 0 total in queue.
Broadcast message from systemd-journald@ibg-bank-dblb-00 (Sun 2025-11-23 13:30:50 GMT):
haproxy[9943]: backend pgstandby has no server available!
Message from syslogd@ibg-bank-dblb-00 at Nov 23 13:30:50 ...
haproxy[9943]:backend pgstandby has no server available!
Nov 23 13:30:50 ibg-bank-dblb-00 haproxy[9943]: [WARNING]  (9943) : Server pgstandby/alt-bank-db-00 is DOWN, reason: Layer7 wrong status, code: 503, info: "Service Unavailable", check duration: 2ms. 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
Nov 23 13:30:50 ibg-bank-dblb-00 haproxy[9943]: Server pgstandby/alt-bank-db-00 is DOWN, reason: Layer7 wrong status, code: 503, info: "Service Unavailable", check duration: 2ms. 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
Nov 23 13:30:50 ibg-bank-dblb-00 haproxy[9943]: Server pgstandby/alt-bank-db-00 is DOWN, reason: Layer7 wrong status, code: 503, info: "Service Unavailable", check duration: 2ms. 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
Nov 23 13:30:50 ibg-bank-dblb-00 haproxy[9943]: [ALERT]    (9943) : backend 'pgstandby' has no server available!
Nov 23 13:30:50 ibg-bank-dblb-00 haproxy[9943]: backend pgstandby has no server available!
Nov 23 13:30:50 ibg-bank-dblb-00 haproxy[9943]: backend pgstandby has no server available!
Nov 23 13:30:52 ibg-bank-dblb-00 haproxy[9943]: [WARNING]  (9943) : Server pgstandby/ibg-bank-db-00 is UP, reason: Layer7 check passed, code: 200, check duration: 2ms. 1 active and 0 backup servers online. 0 sessions requeued, 0 total in queue.
Nov 23 13:30:52 ibg-bank-dblb-00 haproxy[9943]: Server pgstandby/ibg-bank-db-00 is UP, reason: Layer7 check passed, code: 200, check duration: 2ms. 1 active and 0 backup servers online. 0 sessions requeued, 0 total in queue.
Nov 23 13:30:52 ibg-bank-dblb-00 haproxy[9943]: Server pgstandby/ibg-bank-db-00 is UP, reason: Layer7 check passed, code: 200, check duration: 2ms. 1 active and 0 backup servers online. 0 sessions requeued, 0 total in queue.
Nov 23 13:30:58 ibg-bank-dblb-00 haproxy[9943]: 192.168.0.230:42956 [23/Nov/2025:13:30:43.537] stats stats/<PROMEX> 0/0/15000 66288 LR 7/1/0/0/0 0/0
Nov 23 13:30:58 ibg-bank-dblb-00 haproxy[9943]: 192.168.0.230:42956 [23/Nov/2025:13:30:43.537] stats stats/<PROMEX> 0/0/15000 66288 LR 7/1/0/0/0 0/0
Nov 23 13:31:09 ibg-bank-dblb-00 haproxy[9943]: 192.168.0.29:59417 [23/Nov/2025:13:30:47.794] pgmaster pgmaster/ibg-bank-db-00 1/3007/21680 82416 -- 7/6/5/5/3 0/0
Nov 23 13:31:09 ibg-bank-dblb-00 haproxy[9943]: 192.168.0.29:59417 [23/Nov/2025:13:30:47.794] pgmaster pgmaster/ibg-bank-db-00 1/3007/21680 82416 -- 7/6/5/5/3 0/0
Nov 23 13:31:09 ibg-bank-dblb-00 haproxy[9943]: 192.168.0.29:59419 [23/Nov/2025:13:30:47.794] pgmaster pgmaster/ibg-bank-db-00 1/3007/21726 82666 -- 6/5/4/4/3 0/0
Nov 23 13:31:09 ibg-bank-dblb-00 haproxy[9943]: 192.168.0.29:59419 [23/Nov/2025:13:30:47.794] pgmaster pgmaster/ibg-bank-db-00 1/3007/21726 82666 -- 6/5/4/4/3 0/0
Nov 23 13:31:09 ibg-bank-dblb-00 haproxy[9943]: 192.168.0.29:59421 [23/Nov/2025:13:30:47.794] pgmaster pgmaster/ibg-bank-db-00 1/3007/21757 82666 -- 5/4/3/3/3 0/0
Nov 23 13:31:09 ibg-bank-dblb-00 haproxy[9943]: 192.168.0.29:59421 [23/Nov/2025:13:30:47.794] pgmaster pgmaster/ibg-bank-db-00 1/3007/21757 82666 -- 5/4/3/3/3 0/0
Nov 23 13:31:09 ibg-bank-dblb-00 haproxy[9943]: 192.168.0.29:59415 [23/Nov/2025:13:30:47.724] pgmaster pgmaster/ibg-bank-db-00 1/3006/21856 82791 -- 4/3/2/2/3 0/0
Nov 23 13:31:09 ibg-bank-dblb-00 haproxy[9943]: 192.168.0.29:59415 [23/Nov/2025:13:30:47.724] pgmaster pgmaster/ibg-bank-db-00 1/3006/21856 82791 -- 4/3/2/2/3 0/0
Nov 23 13:31:09 ibg-bank-dblb-00 haproxy[9943]: 192.168.0.29:59404 [23/Nov/2025:13:30:36.032] pgmaster pgmaster/ibg-bank-db-00 1/0/33678 754 CD 3/2/1/1/0 0/0
Nov 23 13:31:09 ibg-bank-dblb-00 haproxy[9943]: 192.168.0.29:59404 [23/Nov/2025:13:30:36.032] pgmaster pgmaster/ibg-bank-db-00 1/0/33678 754 CD 3/2/1/1/0 0/0
Nov 23 13:31:09 ibg-bank-dblb-00 haproxy[9943]: 192.168.0.29:59423 [23/Nov/2025:13:30:47.794] pgmaster pgmaster/ibg-bank-db-00 1/3007/21919 82916 -- 2/1/0/0/3 0/0
Nov 23 13:31:09 ibg-bank-dblb-00 haproxy[9943]: 192.168.0.29:59423 [23/Nov/2025:13:30:47.794] pgmaster pgmaster/ibg-bank-db-00 1/3007/21919 82916 -- 2/1/0/0/3 0/0
Nov 23 13:31:13 ibg-bank-dblb-00 haproxy[9943]: 192.168.0.230:42956 [23/Nov/2025:13:30:58.537] stats stats/<PROMEX> 0/0/15000 66297 LR 1/1/0/0/0 0/0
Nov 23 13:31:13 ibg-bank-dblb-00 haproxy[9943]: 192.168.0.230:42956 [23/Nov/2025:13:30:58.537] stats stats/<PROMEX> 0/0/15000 66297 LR 1/1/0/0/0 0/0

Hi

I should correct a misunderstanding I had here - I thought the 8008 availability check was provided by postgres itself but that’s not the case - it is serviced by the patroni HA package. So, is the suggestion here that postgres stops responding on 5432 as it shuts down (either planned or not) but that it takes patroni some time to notice this and reflect it in the 8008 status - and that during this window haproxy is supplying ‘incorrect’ sessions? If so, I still do not understand:

  • why haproxy would direct connections to an RO backend - I can see new sessions starting up on the RO replica backend, so at this point the backend is back up, albeit in recovery/RO mode - is the suggestion that haproxy must still be getting misleading responses on 8008? (a quick polling check is sketched below)
  • why - once a correct 8008 response does arrive and haproxy realises this node is down - haproxy does not close any open sessions to this backend - is that not the whole idea of the “on-marked-down shutdown-sessions” directive?
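One way I can presumably sanity-check the 8008 timing is to poll the patroni REST endpoints during a failover and compare timestamps against the postgres log - something like this (host taken from the config above; these endpoints return 200 or 503):

    while true; do
            printf '%s primary=%s replica=%s\n' "$(date +%T.%3N)" \
                    "$(curl -s -o /dev/null -w '%{http_code}' http://192.168.0.220:8008/primary)" \
                    "$(curl -s -o /dev/null -w '%{http_code}' http://192.168.0.220:8008/replica)"
            sleep 0.2
    done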

Thanks!

The configuration you posted and the logs you posted do not match, so it's difficult to draw conclusions from this.

I can see that during the switchover haproxy is actually restarted or reloaded. This is a major problem.

If haproxy is restarted in the most critical phase of a database switchover, it’s impossible to achieve the expected result.

Find out why and stop it.

journalctl -u haproxy
systemctl status haproxy

If you see "haproxy version is" and "path to executable is", that means haproxy just restarted.

You are absolutely right, I changed the host names and IPs in my initial post as a lame form of anonymisation - full correct config is below.

But yes, you have raised a more serious point! I also wondered why I was seeing the haproxy version…

global
        log /dev/log    local0 debug
        log /dev/log    local1 debug
        #chroot /var/lib/haproxy
        stats socket /run/haproxy/admin.sock mode 660 level admin expose-fd listeners
        stats timeout 30s
        user haproxy
        group haproxy
        daemon

defaults
        log     global
        mode    tcp
        option  tcplog
        option  dontlognull
        option  tcp-check
        timeout connect 5s
        timeout client  180s
        timeout server  180s

frontend pgmaster
        bind 0.0.0.0:5432
        default_backend pgmaster

backend pgmaster
        option httpchk HEAD /primary
        http-check expect status 200
        server ibg-bank-db-00 ibg-bank-db-00:5432 check port 8008 inter 1s
        server alt-bank-db-00 alt-bank-db-00:5432 check port 8008 inter 1s

frontend pgstandby
        bind 0.0.0.0:5433
        default_backend pgstandby

backend pgstandby
        option httpchk  HEAD /replica
        http-check expect status 200
        server ibg-bank-db-00 ibg-bank-db-00:5432 check port 8008 inter 1s
        server alt-bank-db-00 alt-bank-db-00:5432 check port 8008 inter 1s

frontend stats
   bind *:8404
   option http-use-htx
   http-request use-service prometheus-exporter if { path /metrics }
   mode http
   stats enable
   stats uri /stats
   stats refresh 10s

Sorry, that's a rebuild without shutdown-sessions - I will repost.

OK, I am now unable to reproduce what I am seeing in production - after reintroducing shutdown-sessions my test updates are now correctly exiting with "update failed: EOF". I need to understand why, in production, the sessions are still getting re-established to the RO backend and showing as failed updates in an RO session. I have a couple of questions in connection with this:

  • is there a way to get haproxy to dump its live config? - I am now wondering if k8s is correctly loading the new configmap with the shutdown-sessions directive as I have seen problems with this sort of thing before
  • even without shutdown-sessions, can I assume haproxy would not allow a new session through to a down backend - in which case I must assume there is a delay in patroni updating its healthcheck port in production? Hard to tell how I could test this definitively..
  • when does a debug line actually get written to log? When normal traffic is flowing there are no diagnostics (just stats fetches and PROMEX for prometheus) but as soon as I start the failover I get lots of session stats - is it just any error condition?
  • I still see the haproxy version NOTICE at backend failover - I have tested and this is not the same as I see if I do a systemd reload or restart - is this definitely a problem? I don’t think the process is restarting..

failover:

Nov 24 14:32:26 ibg-bank-dblb-00 haproxy[11212]: 192.168.0.29:59417 [24/Nov/2025:14:32:26.519] pgmaster pgmaster/alt-bank-db-00 1/0/3 0 SD 3/2/1/1/0 0/0
Nov 24 14:32:27 ibg-bank-dblb-00 haproxy[11212]: [WARNING]  (11212) : Server pgmaster/ibg-bank-db-00 is UP, reason: Layer7 check passed, code: 200, check duration: 1ms. 2 active and 0 backup servers online. 0 sessions requeued, 0 total in queue.
Nov 24 14:32:27 ibg-bank-dblb-00 haproxy[11212]: Server pgmaster/ibg-bank-db-00 is UP, reason: Layer7 check passed, code: 200, check duration: 1ms. 2 active and 0 backup servers online. 0 sessions requeued, 0 total in queue.
Nov 24 14:32:27 ibg-bank-dblb-00 haproxy[11212]: Server pgmaster/ibg-bank-db-00 is UP, reason: Layer7 check passed, code: 200, check duration: 1ms. 2 active and 0 backup servers online. 0 sessions requeued, 0 total in queue.
Nov 24 14:32:28 ibg-bank-dblb-00 haproxy[11212]: 192.168.0.230:37868 [24/Nov/2025:14:32:13.537] stats stats/<PROMEX> 0/0/15000 66281 LR 7/1/0/0/0 0/0
Nov 24 14:32:28 ibg-bank-dblb-00 haproxy[11212]: 192.168.0.230:37868 [24/Nov/2025:14:32:13.537] stats stats/<PROMEX> 0/0/15000 66281 LR 7/1/0/0/0 0/0
Nov 24 14:32:28 ibg-bank-dblb-00 haproxy[11212]: [WARNING]  (11212) : Server pgmaster/alt-bank-db-00 is DOWN, reason: Layer7 wrong status, code: 503, info: "Service Unavailable", check duration: 1ms. 1 active and 0 backup servers left. 1 sessions active, 0 requeued, 0 remaining in queue.
Nov 24 14:32:28 ibg-bank-dblb-00 haproxy[11212]: Server pgmaster/alt-bank-db-00 is DOWN, reason: Layer7 wrong status, code: 503, info: "Service Unavailable", check duration: 1ms. 1 active and 0 backup servers left. 1 sessions active, 0 requeued, 0 remaining in queue.
Nov 24 14:32:28 ibg-bank-dblb-00 haproxy[11212]: Server pgmaster/alt-bank-db-00 is DOWN, reason: Layer7 wrong status, code: 503, info: "Service Unavailable", check duration: 1ms. 1 active and 0 backup servers left. 1 sessions active, 0 requeued, 0 remaining in queue.
Nov 24 14:32:28 ibg-bank-dblb-00 haproxy[11212]: 192.168.0.29:59374 [24/Nov/2025:14:32:16.676] pgmaster pgmaster/alt-bank-db-00 1/0/11863 754 D- 7/6/5/0/0 0/0
Nov 24 14:32:28 ibg-bank-dblb-00 haproxy[11212]: 192.168.0.29:59374 [24/Nov/2025:14:32:16.676] pgmaster pgmaster/alt-bank-db-00 1/0/11863 754 D- 7/6/5/0/0 0/0
Nov 24 14:32:28 ibg-bank-dblb-00 haproxy[11212]: [WARNING]  (11212) : Server pgstandby/ibg-bank-db-00 is DOWN, reason: Layer7 wrong status, code: 503, info: "Service Unavailable", check duration: 2ms. 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.

Broadcast message from systemd-journald@ibg-bank-dblb-00 (Mon 2025-11-24 14:32:28 GMT):

haproxy[11212]: backend pgstandby has no server available!


Broadcast message from systemd-journald@ibg-bank-dblb-00 (Mon 2025-11-24 14:32:28 GMT):

haproxy[11212]: backend pgstandby has no server available!


Message from syslogd@ibg-bank-dblb-00 at Nov 24 14:32:28 ...
 haproxy[11212]:backend pgstandby has no server available!

Message from syslogd@ibg-bank-dblb-00 at Nov 24 14:32:28 ...
 haproxy[11212]:backend pgstandby has no server available!
Nov 24 14:32:28 ibg-bank-dblb-00 haproxy[11212]: Server pgstandby/ibg-bank-db-00 is DOWN, reason: Layer7 wrong status, code: 503, info: "Service Unavailable", check duration: 2ms. 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
Nov 24 14:32:28 ibg-bank-dblb-00 haproxy[11212]: Server pgstandby/ibg-bank-db-00 is DOWN, reason: Layer7 wrong status, code: 503, info: "Service Unavailable", check duration: 2ms. 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
Nov 24 14:32:28 ibg-bank-dblb-00 haproxy[11212]: [NOTICE]   (11212) : haproxy version is 2.4.22-f8e3218
Nov 24 14:32:28 ibg-bank-dblb-00 haproxy[11212]: [NOTICE]   (11212) : path to executable is /usr/sbin/haproxy
Nov 24 14:32:28 ibg-bank-dblb-00 haproxy[11212]: [ALERT]    (11212) : backend 'pgstandby' has no server available!
Nov 24 14:32:28 ibg-bank-dblb-00 haproxy[11212]: backend pgstandby has no server available!
Nov 24 14:32:28 ibg-bank-dblb-00 haproxy[11212]: backend pgstandby has no server available!
Nov 24 14:32:29 ibg-bank-dblb-00 haproxy[11212]: 192.168.0.29:59421 [24/Nov/2025:14:32:26.532] pgmaster pgmaster/alt-bank-db-00 1/3006/3006 0 D- 6/5/4/4/3 0/0
Nov 24 14:32:29 ibg-bank-dblb-00 haproxy[11212]: 192.168.0.29:59421 [24/Nov/2025:14:32:26.532] pgmaster pgmaster/alt-bank-db-00 1/3006/3006 0 D- 6/5/4/4/3 0/0
Nov 24 14:32:29 ibg-bank-dblb-00 haproxy[11212]: 192.168.0.29:59419 [24/Nov/2025:14:32:26.532] pgmaster pgmaster/alt-bank-db-00 1/3006/3006 0 D- 5/4/3/3/3 0/0
Nov 24 14:32:29 ibg-bank-dblb-00 haproxy[11212]: 192.168.0.29:59419 [24/Nov/2025:14:32:26.532] pgmaster pgmaster/alt-bank-db-00 1/3006/3006 0 D- 5/4/3/3/3 0/0
Nov 24 14:32:29 ibg-bank-dblb-00 haproxy[11212]: 192.168.0.29:59427 [24/Nov/2025:14:32:26.533] pgmaster pgmaster/alt-bank-db-00 1/3004/3004 0 D- 4/3/2/2/3 0/0
Nov 24 14:32:29 ibg-bank-dblb-00 haproxy[11212]: 192.168.0.29:59427 [24/Nov/2025:14:32:26.533] pgmaster pgmaster/alt-bank-db-00 1/3004/3004 0 D- 4/3/2/2/3 0/0
Nov 24 14:32:29 ibg-bank-dblb-00 haproxy[11212]: 192.168.0.29:59425 [24/Nov/2025:14:32:26.533] pgmaster pgmaster/alt-bank-db-00 1/3004/3004 0 D- 3/2/1/1/3 0/0
Nov 24 14:32:29 ibg-bank-dblb-00 haproxy[11212]: 192.168.0.29:59425 [24/Nov/2025:14:32:26.533] pgmaster pgmaster/alt-bank-db-00 1/3004/3004 0 D- 3/2/1/1/3 0/0
Nov 24 14:32:29 ibg-bank-dblb-00 haproxy[11212]: 192.168.0.29:59423 [24/Nov/2025:14:32:26.533] pgmaster pgmaster/alt-bank-db-00 1/3004/3004 0 D- 2/1/0/0/3 0/0
Nov 24 14:32:29 ibg-bank-dblb-00 haproxy[11212]: 192.168.0.29:59423 [24/Nov/2025:14:32:26.533] pgmaster pgmaster/alt-bank-db-00 1/3004/3004 0 D- 2/1/0/0/3 0/0
Nov 24 14:32:31 ibg-bank-dblb-00 haproxy[11212]: [WARNING]  (11212) : Server pgstandby/alt-bank-db-00 is UP, reason: Layer7 check passed, code: 200, check duration: 2ms. 1 active and 0 backup servers online. 0 sessions requeued, 0 total in queue.
Nov 24 14:32:31 ibg-bank-dblb-00 haproxy[11212]: Server pgstandby/alt-bank-db-00 is UP, reason: Layer7 check passed, code: 200, check duration: 2ms. 1 active and 0 backup servers online. 0 sessions requeued, 0 total in queue.
N

reload:

Nov 24 14:27:13 ibg-bank-dblb-00 haproxy[11180]: 192.168.0.230:39148 [24/Nov/2025:14:26:58.536] stats stats/<PROMEX> 0/0/15001 66333 LR 1/1/0/0/0 0/0
Nov 24 14:27:15 ibg-bank-dblb-00 systemd[1]: Reloading HAProxy Load Balancer...
Nov 24 14:27:15 ibg-bank-dblb-00 haproxy[11087]: [WARNING]  (11087) : Reexecuting Master process
Nov 24 14:27:15 ibg-bank-dblb-00 haproxy[11087]: [NOTICE]   (11087) : New worker #1 (11194) forked
Nov 24 14:27:15 ibg-bank-dblb-00 haproxy[11180]: [WARNING]  (11180) : Proxy pgmaster stopped (cumulated conns: FE: 200, BE: 0).
Nov 24 14:27:15 ibg-bank-dblb-00 haproxy[11180]: Proxy pgmaster stopped (cumulated conns: FE: 200, BE: 0).
Nov 24 14:27:15 ibg-bank-dblb-00 haproxy[11180]: Proxy pgmaster stopped (cumulated conns: FE: 200, BE: 0).
Nov 24 14:27:15 ibg-bank-dblb-00 haproxy[11180]: [WARNING]  (11180) : Proxy pgstandby stopped (cumulated conns: FE: 0, BE: 0).
Nov 24 14:27:15 ibg-bank-dblb-00 haproxy[11180]: Proxy pgstandby stopped (cumulated conns: FE: 0, BE: 0).
Nov 24 14:27:15 ibg-bank-dblb-00 haproxy[11180]: Proxy pgstandby stopped (cumulated conns: FE: 0, BE: 0).
Nov 24 14:27:15 ibg-bank-dblb-00 haproxy[11180]: [WARNING]  (11180) : Proxy stats stopped (cumulated conns: FE: 1, BE: 0).
Nov 24 14:27:15 ibg-bank-dblb-00 haproxy[11180]: Proxy stats stopped (cumulated conns: FE: 1, BE: 0).
Nov 24 14:27:15 ibg-bank-dblb-00 haproxy[11180]: Proxy stats stopped (cumulated conns: FE: 1, BE: 0).
Nov 24 14:27:15 ibg-bank-dblb-00 systemd[1]: Reloaded HAProxy Load Balancer.
Nov 24 14:27:15 ibg-bank-dblb-00 haproxy[11180]: [WARNING]  (11180) : Stopping frontend GLOBAL in 0 ms.
Nov 24 14:27:15 ibg-bank-dblb-00 haproxy[11180]: [WARNING]  (11180) : Stopping backend pgmaster in 0 ms.
Nov 24 14:27:15 ibg-bank-dblb-00 haproxy[11180]: Stopping backend pgmaster in 0 ms.
Nov 24 14:27:15 ibg-bank-dblb-00 haproxy[11180]: Stopping backend pgmaster in 0 ms.
Nov 24 14:27:15 ibg-bank-dblb-00 haproxy[11180]: [WARNING]  (11180) : Stopping backend pgstandby in 0 ms.
Nov 24 14:27:15 ibg-bank-dblb-00 haproxy[11180]: Stopping backend pgstandby in 0 ms.
Nov 24 14:27:15 ibg-bank-dblb-00 haproxy[11180]: Stopping backend pgstandby in 0 ms.
Nov 24 14:27:15 ibg-bank-dblb-00 haproxy[11087]: [WARNING]  (11087) : Former worker #1 (11180) exited with code 0 (Exit)
Nov 24 14:27:15 ibg-bank-dblb-00 haproxy[11194]: [WARNING]  (11194) : Server pgmaster/ibg-bank-db-00 is DOWN, reason: Layer7 wrong status, code: 503, info: "Service Unavailable", check duration: 13ms. 1 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
Nov 24 14:27:15 ibg-bank-dblb-00 haproxy[11194]: Server pgmaster/ibg-bank-db-00 is DOWN, reason: Layer7 wrong status, code: 503, info: "Service Unavailable", check duration: 13ms. 1 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
Nov 24 14:27:15 ibg-bank-dblb-00 haproxy[11194]: Server pgmaster/ibg-bank-db-00 is DOWN, reason: Layer7 wrong status, code: 503, info: "Service Unavailable", check duration: 13ms. 1 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
Nov 24 14:27:15 ibg-bank-dblb-00 haproxy[11194]: [WARNING]  (11194) : Server pgstandby/alt-bank-db-00 is DOWN, reason: Layer7 wrong status, code: 503, info: "Service Unavailable", check duration: 2ms. 1 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
Nov 24 14:27:15 ibg-bank-dblb-00 haproxy[11194]: Server pgstandby/alt-bank-db-00 is DOWN, reason: Layer7 wrong status, code: 503, info: "Service Unavailable", check duration: 2ms. 1 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
Nov 24 14:27:15 ibg-bank-dblb-00 haproxy[11194]: Server pgstandby/alt-bank-db-00 is DOWN, reason: Layer7 wrong status, code: 503, info: "Service Unavailable", check duration: 2ms. 1 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
Nov 24 14:27:28 ibg-bank-dblb-00 haproxy[11194]: 192.168.0.230:40492 [24/Nov/2025:14:27:28.537] stats stats/<PROMEX> 0/0/0 66223 LR 1/1/0/0/0 0/0

restart:

Nov 24 14:28:20 ibg-bank-dblb-00 haproxy[11087]: [WARNING]  (11087) : Exiting Master process...
Nov 24 14:28:20 ibg-bank-dblb-00 systemd[1]: Stopping HAProxy Load Balancer...
Nov 24 14:28:20 ibg-bank-dblb-00 haproxy[11087]: [NOTICE]   (11087) : haproxy version is 2.4.22-f8e3218
Nov 24 14:28:20 ibg-bank-dblb-00 haproxy[11087]: [NOTICE]   (11087) : path to executable is /usr/sbin/haproxy
Nov 24 14:28:20 ibg-bank-dblb-00 haproxy[11087]: [ALERT]    (11087) : Current worker #1 (11194) exited with code 143 (Terminated)
Nov 24 14:28:20 ibg-bank-dblb-00 haproxy[11087]: [WARNING]  (11087) : All workers exited. Exiting... (0)
Nov 24 14:28:20 ibg-bank-dblb-00 systemd[1]: haproxy.service: Deactivated successfully.
Nov 24 14:28:20 ibg-bank-dblb-00 systemd[1]: Stopped HAProxy Load Balancer.
Nov 24 14:28:20 ibg-bank-dblb-00 systemd[1]: haproxy.service: Consumed 1.584s CPU time.
Nov 24 14:28:20 ibg-bank-dblb-00 systemd[1]: Starting HAProxy Load Balancer...
Nov 24 14:28:20 ibg-bank-dblb-00 haproxy[11200]: [NOTICE]   (11200) : New worker #1 (11202) forked
Nov 24 14:28:20 ibg-bank-dblb-00 systemd[1]: Started HAProxy Load Balancer.
Nov 24 14:28:20 ibg-bank-dblb-00 haproxy[11202]: [WARNING]  (11202) : Server pgmaster/ibg-bank-db-00 is DOWN, reason: Layer7 wrong status, code: 503, info: "Service Unavailable", check duration: 12ms. 1 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
Nov 24 14:28:20 ibg-bank-dblb-00 haproxy[11202]: Server pgmaster/ibg-bank-db-00 is DOWN, reason: Layer7 wrong status, code: 503, info: "Service Unavailable", check duration: 12ms. 1 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
Nov 24 14:28:20 ibg-bank-dblb-00 haproxy[11202]: Server pgmaster/ibg-bank-db-00 is DOWN, reason: Layer7 wrong status, code: 503, info: "Service Unavailable", check duration: 12ms. 1 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
Nov 24 14:28:21 ibg-bank-dblb-00 haproxy[11202]: [WARNING]  (11202) : Server pgstandby/alt-bank-db-00 is DOWN, reason: Layer7 wrong status, code: 503, info: "Service Unavailable", check duration: 2ms. 1 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
Nov 24 14:28:21 ibg-bank-dblb-00 haproxy[11202]: Server pgstandby/alt-bank-db-00 is DOWN, reason: Layer7 wrong status, code: 503, info: "Service Unavailable", check duration: 2ms. 1 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
Nov 24 14:28:21 ibg-bank-dblb-00 haproxy[11202]: Server pgstandby/alt-bank-db-00 is DOWN, reason: Layer7 wrong status, code: 503, info: "Service Unavailable", check duration: 2ms. 1 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
Nov 24 14:28:28 ibg-bank-dblb-00 haproxy[11202]: 192.168.0.230:57998 [24/Nov/2025:14:28:28.537] stats stats/<PROMEX> 0/0/0 66214 LR 1/1/0/0/0 0/0
Nov 24 14:28:28 ibg-bank-dblb-00 haproxy[11202]: 192.168.0.230:57998 [24/Nov/2025:14:28:28.537] stats stats/<PROMEX> 0/0/0 66214 LR 1/1/0/0/0 0/0


No, there is no way to dump the live config.

No, you cannot assume that: a haproxy instance that is in the process of shutting down will not process any health check changes, and will therefore not shut down any sessions forcefully.

A log line is written when a session is terminated. You may want to add option logasap to get a log line at session establishment instead.
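In your defaults section that is simply (sketch):

    defaults
            option  tcplog
            option  logasap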

The fact that it doesn’t restart based on systemd doesn’t mean it doesn’t get restarted. It means it doesn’t get restarted through the systemd process manager.

It is a huge problem and likely the actual cause of the issues you are seeing.

First of all, like I mentioned above, the process that is shutting down is likely not shutting down sessions because it is no longer processing health checks.

Second of all, the newly started process will let sessions through because it has to assume that all servers are up until actual health check results come in. In this instance, with this configuration, you actually get load balancing between the two servers for a few seconds, which is the very last thing you want.

"haproxy version is" and "path to executable is" are logs that are only emitted during process start. I can see that the PID doesn’t change, but something is clearly interacting with haproxy, likely through signals sent directly to haproxy.

Thanks for the responses - note that I am no longer seeing the issue of incorrect RO postgres sessions in my test setup since I reinstated shutdown-sessions..! However, I am still seeing it in my production environment, which has the same config. The only significant difference I can think of is that in production the problematic clients are java applications using HikariCP, whereas in test I am using golang's lib/pq with no explicit connection pool management.

You seem to be certain that haproxy is restarting or similar, despite the PID remaining the same, as you say - I have no idea why this would be. There is nothing exotic happening here - it's an automated download of postgres, patroni and haproxy on rocky 9. This is a pretty widespread solution - albeit a lot of people use something like pgBouncer for connection pooling somewhere in the picture, while we do not - and the haproxy.cfg I am using is the one documented for patroni.

I’m not really sure where to take it from here - I will try to set up a full java/hikari client for testing to see if I can replicate it, and I guess I will see if there is anything more I can do to trace what haproxy is up to at failover time…

Thanks again for your responses.

Another question or two: it seems I can again replicate the reconnections to the RO/standby backend just by increasing the inter value - this makes sense, because it takes haproxy longer to notice a backend is unavailable. During the interval between the backend primary going down and restarting in RO/recovery mode, the pg client will simply request a new session and will be given the wrong, old backend (I have enabled connection logging in postgres and can see the new sessions arrive from haproxy after the replica starts up). Once haproxy has done a healthcheck it should close down any of these incorrect sessions, and the client will receive a valid backend when it requests again - this is the bit I still have a problem with.

I have switched on debug (-d) on the binary but it does not seem to give me a lot more - at least I am unable to tell much from it. I thought it might show when it is doing session shutdowns, the status of healthchecks, etc.
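One thing I can at least do, I think, is query the runtime state over the admin socket from the global section - this shows what haproxy currently believes about each server, though not whether shutdown-sessions is in the loaded config:

    echo "show servers state pgmaster" | socat stdio UNIX-CONNECT:/run/haproxy/admin.sock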

I now need to fully test or recreate what I believe I am seeing in my non-test environment where haproxy appears to not close invalid sessions that were opened during this between-healthcheck window…

On the version and path messages being dumped to log and therefore suggesting haproxy is restarting - I checked the source code and although I am a long way from a C expert, it seems these messages are generated by ha_warning() or ha_alert() in errors.c:424 and could be called from many places - do I have this wrong?

There is likely a good argument here for using a much shorter inter - maybe 500ms? - and looking into fastinter? I am not sure of the impact or best practice. Of course we can get client code to check and retry if it gets an RO session, but we really thought this should be the responsibility of the LB.
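For concreteness, the sort of thing I have in mind (values are guesses, and I have not measured the extra check traffic):

backend pgmaster
        option httpchk HEAD /primary
        http-check expect status 200
        default-server inter 500ms fastinter 250ms downinter 1s fall 2 rise 2 on-marked-down shutdown-sessions
        server ibg-bank-db-00 ibg-bank-db-00:5432 check port 8008
        server alt-bank-db-00 alt-bank-db-00:5432 check port 8008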

I have checked the code and indeed my assumption that "haproxy version is" and "path to executable is" are shown during start/initialization is not only slightly incorrect but totally wrong.

Those two messages are shown when the first alert is triggered (of this haproxy instance's life), which in your case, and especially in your repro, is when the backend 'pgstandby' has no server available! alert is triggered for the first time.

Again, I would suggest you enable option logasap and provide the logs again. This will make it more clear, because it will emit a connection log on connection establishment instead of only when a connection disconnects.

Thanks for taking the time to check this - as recommended I have been capturing logs with logasap in place and I think I have got to the bottom of it - I managed to replicate my issue but my suspicion is now that some of the haproxies in my k8s cluster were not restarting as expected and picking up new config - they effectively did not have shutdown-sessions in place.

After making sure these were restarted properly and making the healthchecks a little more aggressive to reduce the chance of an invalid RO session being returned by haproxy in the window between primary failure and healthcheck failure triggering a down state, I am now unable to recreate these persistent RO sessions.

For anyone else hitting this, enabling stdout logging e.g.:

    global
            log stdout  format raw  local0  debug
            stats timeout 30
            daemon

    defaults
            log     global
            mode    tcp
            option  tcplog
            option  logasap
...
    default-server inter 500ms fall 2 on-marked-down shutdown-sessions

allows us to use kubectl logs in k8s..
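For example (deployment name and namespace are placeholders):

    kubectl logs -f deployment/haproxy -n postgres --timestamps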

Post-failover logs showed messages like the following… Here we have around 100 sessions connected to db-west - this instance is shut down and its replica db-east starts up. Until haproxy notices sufficient healthcheck failures from west, it presumably continues to supply sessions to it (assuming it is even available yet - the database has to shut down and restart in replica mode and do any replication replay to become ready to accept connections, even if only RO).

10.233.74.123:33084 [30/Nov/2025:12:17:00.142] pgmaster pgmaster/db-west 1/5/+5 +0 -- 115/114/114/114/0 0/0
10.233.74.123:33094 [30/Nov/2025:12:17:00.209] pgmaster pgmaster/db-west 1/5/+5 +0 -- 116/115/115/115/0 0/0
10.233.74.123:33102 [30/Nov/2025:12:17:00.281] pgmaster pgmaster/db-west 1/6/+6 +0 -- 115/114/114/114/0 0/0
10.233.74.123:33110 [30/Nov/2025:12:17:00.362] pgmaster pgmaster/db-west 1/5/+5 +0 -- 116/115/115/115/0 0/0
10.233.74.123:33114 [30/Nov/2025:12:17:00.439] pgmaster pgmaster/db-west 1/6/+6 +0 -- 116/115/115/115/0 0/0
10.233.74.123:33116 [30/Nov/2025:12:17:00.520] pgmaster pgmaster/db-west 1/5/+5 +0 -- 116/115/115/115/0 0/0
10.233.95.238:32978 [30/Nov/2025:12:17:10.567] pgmaster pgmaster/db-west 1/6/+6 +0 -- 114/113/113/113/0 0/0
10.233.95.238:32982 [30/Nov/2025:12:17:10.610] pgmaster pgmaster/db-west 1/5/+5 +0 -- 114/113/113/113/0 0/0
[WARNING]  (8) : Server pgmaster/db-east is UP, reason: Layer7 check passed, code: 200, check duration: 4ms. 2 active and 0 backup servers online. 0 sessions requeued, 0 total in queue.
Server pgmaster/db-east is UP, reason: Layer7 check passed, code: 200, check duration: 4ms. 2 active and 0 backup servers online. 0 sessions requeued, 0 total in queue.
10.233.101.117:49396 [30/Nov/2025:12:17:14.331] pgmaster pgmaster/db-east 1/1/+1 +0 -- 61/60/60/1/0 0/0
[WARNING]  (8) : Server pgmaster/db-west is DOWN, reason: Layer7 wrong status, code: 503, info: "Service Unavailable", check duration: 14ms. 1 active and 0 backup servers left. 46 sessions active, 0 requeued, 0 remaining in queue.
Server pgmaster/db-west is DOWN, reason: Layer7 wrong status, code: 503, info: "Service Unavailable", check duration: 14ms. 1 active and 0 backup servers left. 46 sessions active, 0 requeued, 0 remaining in queue.
10.233.74.161:47196 [30/Nov/2025:12:17:14.480] pgmaster pgmaster/db-east 1/1/+1 +0 -- 11/10/10/2/0 0/0
10.233.74.161:47210 [30/Nov/2025:12:17:14.541] pgmaster pgmaster/db-east 1/1/+1 +0 -- 12/11/11/3/0 0/0
10.233.101.234:56774 [30/Nov/2025:12:17:11.561] pgmaster pgmaster/db-west 1/-1/+3025 +0 DC 12/11/10/0/3 0/0
10.233.101.234:42056 [30/Nov/2025:12:17:14.600] pgmaster pgmaster/db-east 1/1/+1 +0 -- 13/12/12/5/0 0/0
10.233.74.161:47212 [30/Nov/2025:12:17:14.600] pgmaster pgmaster/db-east 1/1/+1 +0 -- 13/12/12/5/0 0/0
10.233.95.179:38734 [30/Nov/2025:12:17:11.580] pgmaster pgmaster/db-west 1/-1/+3029 +0 DC 13/12/11/0/3 0/0
10.233.95.179:38750 [30/Nov/2025:12:17:14.621] pgmaster pgmaster/db-east 1/1/+1 +0 -- 13/12/12/6/0 0/0
10.233.101.214:42878 [30/Nov/2025:12:17:14.660] pgmaster pgmaster/db-east 1/1/+1 +0 -- 15/14/14/8/0 0/0
10.233.101.234:42064 [30/Nov/2025:12:17:14.662] pgmaster pgmaster/db-east 1/1/+1 +0 -- 16/15/15/9/0 0/0
10.233.74.161:47228 [30/Nov/2025:12:17:14.663] pgmaster pgmaster/db-east 1/1/+1 +0 -- 16/15/15/9/0 0/0
10.233.95.179:38760 [30/Nov/2025:12:17:14.672] pgmaster pgmaster/db-east 1/1/+1 +0 -- 17/16/16/10/0 0/0
10.233.74.161:47234 [30/Nov/2025:12:17:14.722] pgmaster pgmaster/db-east 1/1/+1 +0 -- 20/19/19/13/0 0/0
10.233.101.234:42074 [30/Nov/2025:12:17:14.723] pgmaster pgmaster/db-east 1/1/+1 +0 -- 20/19/19/13/0 0/0
10.233.95.179:38774 [30/Nov/2025:12:17:14.723] pgmaster pgmaster/db-east 1/1/+1 +0 -- 20/19/19/13/0 0/0
10.233.101.214:42882 [30/Nov/2025:12:17:14.729] pgmaster pgmaster/db-east 1/1/+1 +0 -- 21/20/20/14/0 0/0
10.233.95.179:38778 [30/Nov/2025:12:17:14.779] pgmaster pgmaster/db-east 1/1/+1 +0 -- 22/21/21/15/0 0/0
10.233.101.234:42082 [30/Nov/2025:12:17:14.784] pgmaster pgmaster/db-east 1/1/+1 +0 -- 24/23/23/17/0 0/0
10.233.74.161:47242 [30/Nov/2025:12:17:14.785] pgmaster pgmaster/db-east 1/0/+0 +0 -- 24/23/23/17/0 0/0
10.233.101.214:42896 [30/Nov/2025:12:17:14.796] pgmaster pgmaster/db-east 1/1/+1 +0 -- 25/24/24/18/0 0/0
10.233.95.179:38794 [30/Nov/2025:12:17:14.831] pgmaster pgmaster/db-east 1/1/+1 +0 -- 26/25/25/19/0 0/0
10.233.101.234:42084 [30/Nov/2025:12:17:14.853] pgmaster pgmaster/db-east 1/1/+1 +0 -- 27/26/26/20/0 0/0
10.233.101.214:42898 [30/Nov/2025:12:17:14.860] pgmaster pgmaster/db-east 1/1/+1 +0 -- 29/28/28/22/0 0/0
10.233.74.161:47246 [30/Nov/2025:12:17:14.860] pgmaster pgmaster/db-east 1/1/+1 +0 -- 29/28/28/22/0 0/0
10.233.95.179:38796 [30/Nov/2025:12:17:14.883] pgmaster pgmaster/db-east 1/1/+1 +0 -- 30/29/29/23/0 0/0
10.233.101.214:42908 [30/Nov/2025:12:17:14.920] pgmaster pgmaster/db-east 1/0/+0 +0 -- 32/31/31/25/0 0/0
10.233.74.161:47254 [30/Nov/2025:12:17:14.916] pgmaster pgmaster/db-east 1/11/+11 +0 -- 32/31/31/25/0 0/0
10.233.101.234:42090 [30/Nov/2025:12:17:14.928] pgmaster pgmaster/db-east 1/1/+1 +0 -- 33/32/32/26/0 0/0
10.233.95.179:38808 [30/Nov/2025:12:17:14.934] pgmaster pgmaster/db-east 1/1/+1 +0 -- 34/33/33/27/0 0/0
[WARNING]  (8) : Server pgstandby/db-east is DOWN, reason: Layer7 wrong status, code: 503, info: "Service Unavailable", check duration: 5ms. 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
[ALERT]    (8) : backend 'pgstandby' has no server available!
Server pgstandby/db-east is DOWN, reason: Layer7 wrong status, code: 503, info: "Service Unavailable", check duration: 5ms. 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
backend pgstandby has no server available!
10.233.95.183:60350 [30/Nov/2025:12:17:12.934] pgmaster pgmaster/db-west 1/2019/+2019 +0 D- 34/33/33/1/2 0/0
10.233.101.39:38850 [30/Nov/2025:12:17:11.932] pgmaster pgmaster/db-west 1/3031/+3031 +0 D- 33/32/32/1/3 0/0
10.233.95.183:60356 [30/Nov/2025:12:17:14.964] pgmaster pgmaster/db-east 1/1/+1 +0 -- 33/32/32/28/0 0/0
10.233.101.39:47870 [30/Nov/2025:12:17:14.977] pgmaster pgmaster/db-east 1/1/+1 +0 -- 35/34/34/30/0 0/0
10.233.101.214:42924 [30/Nov/2025:12:17:14.978] pgmaster pgmaster/db-east 1/1/+1 +0 -- 35/34/34/30/0 0/0
10.233.74.161:47266 [30/Nov/2025:12:17:14.985] pgmaster pgmaster/db-east 1/1/+1 +0 -- 36/35/35/31/0 0/0
...
10.233.101.117:49500 [30/Nov/2025:12:17:16.242] pgmaster pgmaster/db-east 1/1/+1 +0 -- 92/91/91/91/0 0/0
10.233.101.117:49516 [30/Nov/2025:12:17:16.309] pgmaster pgmaster/db-east 1/1/+1 +0 -- 93/92/92/92/0 0/0
[WARNING]  (8) : Server pgstandby/db-west is UP, reason: Layer7 check passed, code: 200, check duration: 27ms. 1 active and 0 backup servers online. 0 sessions requeued, 0 total in queue.
Server pgstandby/db-west is UP, reason: Layer7 check passed, code: 200, check duration: 27ms. 1 active and 0 backup servers online. 0 sessions requeued, 0 total in queue.
10.233.95.238:55000 [30/Nov/2025:12:17:18.550] pgmaster pgmaster/db-east 1/1/+1 +0 -- 94/93/93/93/0 0/0
10.233.101.118:46914 [30/Nov/2025:12:17:22.038] pgmaster pgmaster/db-east 1/2/+2 +0 -- 95/94/94/94/0 0/0

A couple of observations/questions:

  • in our scenario here, the new db-east primary appears to become available before the old is marked down - I’m not sure I fully understand how haproxy will decide how to satisfy new connection requests at this point?
  • this does not last long - we then very quickly have db-west marked down - after this we can see connections with status DC which I understand to mean have been closed by the shutdown-sessions directive
  • I thought that log entries with a + sign for the last timer and data transfer fields were connection events but that doesn’t seem to hold for these - I also was expecting to see lots of disconnects from the primary db shutting down and closing connections - but perhaps haproxy does not track or log this?
  • for a few seconds there is no standby backend, presumably until the old primary has had time to do log catchup and pass enough haproxy healthchecks..

Anyway, I suspect the root cause here was user error (level 8!) and not making sure the config was in place as expected. Thanks very much to @lukastribus for the patient comments.

There is an excellent introduction to haproxy logging here for those with my level of ignorance: Introduction to HAProxy Logging: A Practical Guide
