Hello everyone,
Hopefully you guys can provide some feedback on this behaviour.
Setup:
Haproxy running version 2.3 (I know a bit old-ish).
200+ Frontends (its a chonky boy)
500+ Backends
Regular frontends & backends load balancing on HTTP
ALL servers run a Layer7 healthchecks. option httpchk GET /health HTTP/1.1\r\n
max-spread-checks 1
spread-checks 50
default-server inter 5s slowstart 60s rise 1 fall 3 on-marked-down shutdown-sessions
The reload settings:
hard-stop-after 180s
Behavior (what we are seeing)
During Reload operations, the new process is initiated successfully and sends the termination signal to the old one as expected.
However, since we are doing healthchecks on ALL servers, there are a few seconds in which the new process is already accepting connections, but the healthchecks have not yet run/succeeded so that the NEW haproxy replies with NOSRV/503.
Is this expected behavior? Are we missing some setting? Can we somehow consider servers UP (by default)?
We even tweak the rise
and max-spread-checks
to try and get healthchecks to be run as quick as possible.
Waiting some feedback.
P.S: We are only noticing this behaviour recently. Can it be related with the growing number of servers to check?
Thanks