backend bk_foo
    mode tcp
    no option http-server-close
    log global
    option tcplog
    timeout server 1m
    timeout connect 5s
    server foo smtp.example.com:587 check
The problem is that if smtp.example.com becomes unreachable due to network problems, it is marked as down/failed by haproxy and never comes back up even when the server does, until haproxy is reloaded manually. Is there any way to never mark it as down? Thanks.
Share the output of haproxy -vv and enable logging, then share the logs. You should see a backend server down event, and usually a server up event as well.
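In case it helps, here is a minimal sketch of what enabling logging could look like. It assumes a local syslog daemon is listening on /dev/log and that the local0 facility is free; both are assumptions, so adjust them to your setup. The backend lines are the ones from the original post.

```
global
    # Assumption: syslog socket at /dev/log, facility local0
    log /dev/log local0

backend bk_foo
    mode tcp
    # "log global" makes this backend use the log target defined above
    log global
    option tcplog
    server foo smtp.example.com:587 check
```

With this in place, the server state changes reported by the health check should show up in whatever file your syslog daemon writes local0 messages to.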
No idea why this happens though, never heard of such an issue.
It looks like the health check does not recover in those cases.
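For reference, a rough sketch of the two options on the server line, assuming the rest of bk_foo stays as posted. The inter, fall and rise values are only illustrative, not a recommendation, and neither option explains why the check fails to recover.

```
backend bk_foo
    mode tcp
    timeout server 1m
    timeout connect 5s
    # Option A: keep the health check but make its timing explicit
    # (probe every 5s, down after 3 failed checks, up after 2 successful ones)
    server foo smtp.example.com:587 check inter 5s fall 3 rise 2
    # Option B: drop "check" so the server is never health-checked
    # and therefore never marked down by a failed check
    # server foo smtp.example.com:587
```

Option B answers the literal question, but haproxy will then keep sending connections to the server even while it is unreachable.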
To troubleshoot further, we need a capture of the health check traffic, a trace of the syscalls (via strace -tt) and, very importantly, the output of haproxy -vv.
Built with OpenSSL version : OpenSSL 1.0.1e-fips 11 Feb 2013
Running on OpenSSL version : OpenSSL 1.0.1e-fips 11 Feb 2013
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports : SSLv3 TLSv1.0 TLSv1.1 TLSv1.2
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND
Encrypted password support via crypt(3): yes
Built with multi-threading support.
Built with PCRE version : 8.32 2012-11-30
Running on PCRE version : 8.32 2012-11-30
PCRE library supports JIT : no (USE_PCRE_JIT not set)
Built with zlib version : 1.2.7
Running on zlib version : 1.2.7
Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with network namespace support.
Available polling systems :
epoll : pref=300, test result OK
poll : pref=200, test result OK
select : pref=150, test result OK
Total: 3 (3 usable), will use epoll.
Available filters :
[SPOE] spoe
[COMP] compression
[TRACE] trace
Try upgrading to a recent stable version of haproxy. If that doesn’t help, you will have to provide the additional information requested earlier: a capture of the health check traffic and a trace of the syscalls (via strace -tt).
Thank you. The challenge is that we cannot reproduce the issue on demand. It happens for an unknown reason after a few days of operation. When I have more details, I will provide them.