For ease of management, HAProxy_1 runs only in TCP mode.
Unfortunately, with the standard L4 health check, HAProxy_1 cannot detect when HAProxy_2’s backend is down, because HAProxy_2 still accepts the TCP connection.
I tried having HAProxy_2 reject the connection using
tcp-request connection reject if { nbsrv() lt 1 }
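but this does not help: the rule only fires after the TCP handshake has completed, so HAProxy_1’s L4 check still sees a successful connect.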
I implemented an external health check in HAProxy_1 using a small Rust program that opens a TCP connection and checks whether the remote end closes it within 100 ms. This works perfectly.
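The original program was not posted, so what follows is only a minimal sketch of the same idea, not the actual code. It assumes HAProxy’s external-check convention: the program receives <proxy_address> <proxy_port> <server_address> <server_port> as arguments, and exit status 0 means the server is up. The function names, the 1 s connect timeout and exit code 2 for bad arguments are illustrative assumptions.

```rust
use std::io::{ErrorKind, Read};
use std::net::{IpAddr, SocketAddr, TcpStream};
use std::time::Duration;

/// Healthy only if the peer accepts the TCP connection AND keeps it open
/// for at least `wait`. A close inside that window (for example from a
/// downstream "tcp-request connection reject") is reported as down.
pub fn check_stays_open(addr: SocketAddr, connect_timeout: Duration, wait: Duration) -> bool {
    // Same as a plain L4 check: the handshake has to succeed first.
    let mut stream = match TcpStream::connect_timeout(&addr, connect_timeout) {
        Ok(s) => s,
        Err(_) => return false,
    };
    // Block on a read for at most `wait` (must be non-zero).
    if stream.set_read_timeout(Some(wait)).is_err() {
        return false;
    }
    let mut buf = [0u8; 1];
    match stream.read(&mut buf) {
        // Orderly close (FIN) before the deadline: the peer rejected us.
        Ok(0) => false,
        // The peer sent data (e.g. a protocol banner): still open.
        Ok(_) => true,
        // Timed out with the connection still open: healthy.
        // Unix reports WouldBlock here, Windows reports TimedOut.
        Err(e) if matches!(e.kind(), ErrorKind::WouldBlock | ErrorKind::TimedOut) => true,
        // Connection reset or any other error: treat as down.
        Err(_) => false,
    }
}

/// Maps external-check arguments to an exit status (0 = up, non-zero = down).
/// args[0] is the program name, followed by
/// <proxy_address> <proxy_port> <server_address> <server_port>.
pub fn run_external_check(args: &[String]) -> i32 {
    if args.len() < 5 {
        return 2;
    }
    let ip: IpAddr = match args[3].parse() {
        Ok(ip) => ip,
        Err(_) => return 2,
    };
    let port: u16 = match args[4].parse() {
        Ok(p) => p,
        Err(_) => return 2,
    };
    // 100 ms mirrors the window described above; the 1 s connect timeout is an assumption.
    let up = check_stays_open(
        SocketAddr::new(ip, port),
        Duration::from_secs(1),
        Duration::from_millis(100),
    );
    if up { 0 } else { 1 }
}
```

A binary would simply end its main with std::process::exit(run_external_check(&std::env::args().collect::<Vec<_>>())) and be referenced from HAProxy_1 through external-check command in a backend that has option external-check (external checks also have to be enabled with external-check in the global section).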
Is there a better way that does not need an external health check?
If not, wouldn’t this be a nice feature to add to HAProxy, sitting between the basic L4 check and the protocol-specific checks?
Admittedly, a protocol-specific check would be cleaner when HAProxy knows the backend protocol (HTTP, for example).
Today we run everything in TCP mode on HAProxy_1 to keep its configuration simple and independent of the backend type (we have many backends, and it is easier for us to manage the backend-specific settings only in HAProxy_2).
However, even if we did that, a backend that speaks plain TCP (a protocol HAProxy does not know) would still have the issue: HAProxy_2 correctly detects that the backend is down, but HAProxy_1 does not notice.
One good approach would be to support a tcp-check wait <timeout> directive: if the connection is closed before the timeout expires, the check would be considered failed.
If the protocol is not known (or always the same), I’d suggest creating a new HTTP frontend on haproxy_2 that returns 200 or 503 depending on the status of its own backend.
Then, on haproxy_1, you run an HTTP health check against this port on haproxy_2. This way you are not checking the application itself, but an HTTP endpoint that haproxy_2 publishes based on its backend status.
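Seen from haproxy_1, that HTTP check boils down to requesting a status URL on haproxy_2 and treating the status code as the verdict. On the haproxy_2 side, HAProxy’s monitor-uri combined with a rule such as monitor fail if { nbsrv(be_app) lt 1 } is one way to publish such an endpoint (be_app is a hypothetical backend name). The sketch below only illustrates the probing side in Rust; the /health path, the function names and the strict 200-only rule are assumptions. In practice haproxy_1 would use option httpchk, which by default accepts any 2xx or 3xx response.

```rust
use std::io::{Read, Write};
use std::net::{SocketAddr, TcpStream};
use std::time::Duration;

/// True if an HTTP status line such as "HTTP/1.1 200 OK" carries code 200.
pub fn status_line_is_200(line: &str) -> bool {
    let mut parts = line.split_whitespace();
    matches!((parts.next(), parts.next()), (Some(v), Some("200")) if v.starts_with("HTTP/"))
}

/// Sends "GET <path> HTTP/1.0" to the status endpoint; up only on a 200.
pub fn http_status_check(addr: SocketAddr, path: &str, timeout: Duration) -> bool {
    let mut stream = match TcpStream::connect_timeout(&addr, timeout) {
        Ok(s) => s,
        Err(_) => return false,
    };
    if stream.set_read_timeout(Some(timeout)).is_err() {
        return false;
    }
    let request = format!("GET {} HTTP/1.0\r\n\r\n", path);
    if stream.write_all(request.as_bytes()).is_err() {
        return false;
    }
    // Read until the end of the status line, EOF, an error, or 4 KiB.
    let mut buf = Vec::new();
    let mut chunk = [0u8; 256];
    while !buf.windows(2).any(|w| w == &b"\r\n"[..]) && buf.len() < 4096 {
        match stream.read(&mut chunk) {
            Ok(0) | Err(_) => break,
            Ok(n) => buf.extend_from_slice(&chunk[..n]),
        }
    }
    let text = String::from_utf8_lossy(&buf);
    status_line_is_200(text.split("\r\n").next().unwrap_or(""))
}
```

Because the verdict depends only on what haproxy_2 reports, the same probe works whatever protocol the real backend speaks, which is the point of the suggestion.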
Thanks. I did consider that approach, but it is not very convenient for us: there are many firewalls and other security devices between haproxy_1 and haproxy_2, so opening an additional port:
- adds administrative work to configure all of them
- may introduce complex failure scenarios where one port fails while the other still works (because of the intermediate devices)
Would the community be interested if I published the external check code?
Would the HAProxy team be open to considering a tcp-check wait directive?
I can’t answer that; I suggest you file a feature request on GitHub. However, I’m not sure we want to add that much feature creep to the already complicated health check code for a domain-specific workaround.