Never mark backend as failed?

rihad · April 20, 2018, 12:18pm

Hi. We have a backend with a single server:

defaults
retries 3
timeout connect 5000
timeout client 100000
timeout server 100000

backend bk_foo
mode tcp
no option http-server-close
log global
option tcplog
timeout server 1m
timeout connect 5s
server foo smtp.example.com:587 check

The problem is that if smtp.example.com becomes unreachable due to network problems, it is marked as down/failed by haproxy and never comes back up even when the server does, until haproxy is reloaded manually. Is there any way to never mark it as down? Thanks.

lukastribus · April 20, 2018, 2:50pm

Share the output of haproxy -vv and enable logging. Then share the logs (you should see backend server down event and usually you would see the server up event as well).

No idea why this happens though, never heard of such an issue.

rihad · April 20, 2018, 3:34pm

Sorry I can’t run them at the moment, but it’s haproxy 1.7.5 on FreeBSD 10.3. Please see this post for why this might be an issue: https://serverfault.com/questions/666600/haproxy-does-not-recover-after-failed-check?answertab=votes#tab-top

“And once a backend is marked as down it doesn’t go back up (this is not documented, I came to this conclusion based on my experience).”

rihad · April 21, 2018, 7:57am

Somehow no corresponding UP event was logged until haproxy was manually restarted a few hours later.

Apr 19 17:41:16 foo haproxy[76287]: Server bk_foo/foo is DOWN, reason: Layer4 connection problem, info: “Connection refused”, check duration: 14ms. 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
Apr 19 17:41:16 foo haproxy[76287]: backend bk_foo has no server available!

Apr 20 02:44:40 foo haproxy[21656]: Proxy foo started.
Apr 20 02:44:40 foo haproxy[21656]: Proxy bk_foo started.
Apr 20 02:44:41 foo haproxy[76287]: Stopping proxy foo in 0 ms.
Apr 20 02:44:41 foo haproxy[76287]: Stopping backend bk_foo in 0 ms.
Apr 20 02:44:41 foo haproxy[76287]: Proxy foo stopped (FE: 90305 conns, BE: 0 conns).
Apr 20 02:44:41 foo haproxy[76287]: Proxy bk_foo stopped (FE: 0 conns, BE: 90305 conns).

This was haproxy 1.7.5. I’ve just upgraded to 1.7.10 in case this was a bug or something.

ggerard · November 30, 2018, 2:47am

Hi Rihad, I have the exact same issue. Did you manage to find a solution?

Thanks for your help!

lukastribus · November 30, 2018, 8:06am

It looks like the health check does not recover in those cases.

Capturing the health check traffic, capturing the syscalls (via strace -tt) and, very importantly, providing the output of haproxy -vv is required to troubleshoot further.

ggerard · December 3, 2018, 12:49am

Thanks for the answer. See below the outcome of haproxy -vv

Build options :
TARGET = linux2628
CPU = generic
CC = gcc
CFLAGS = -O2 -g -fno-strict-aliasing -Wdeclaration-after-statement -fwrapv -Wno-unused-label
OPTIONS = USE_LIBCRYPT=1 USE_ZLIB=1 USE_OPENSSL=1 USE_PCRE=1

Default settings :
maxconn = 2000, bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Built with OpenSSL version : OpenSSL 1.0.1e-fips 11 Feb 2013
Running on OpenSSL version : OpenSSL 1.0.1e-fips 11 Feb 2013
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports : SSLv3 TLSv1.0 TLSv1.1 TLSv1.2
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND
Encrypted password support via crypt(3): yes
Built with multi-threading support.
Built with PCRE version : 8.32 2012-11-30
Running on PCRE version : 8.32 2012-11-30
PCRE library supports JIT : no (USE_PCRE_JIT not set)
Built with zlib version : 1.2.7
Running on zlib version : 1.2.7
Compression algorithms supported : identity(“identity”), deflate(“deflate”), raw-deflate(“deflate”), gzip(“gzip”)
Built with network namespace support.

Available polling systems :
epoll : pref=300, test result OK
poll : pref=200, test result OK
select : pref=150, test result OK
Total: 3 (3 usable), will use epoll.

Available filters :
[SPOE] spoe
[COMP] compression
[TRACE] trace

lukastribus · December 3, 2018, 4:08pm

1.8.4 is pretty old at this point, and has a a large number of subsequently fixed bugs. But I don’t know if that is the issue.

Try upgrading to a recent stable version of haproxy. If that doesn’t help, you will have to provide the additional informations requested earlier (capturing the health check traffic, capturing the syscalls: via strace -tt ).

ggerard · December 18, 2018, 10:39am

Thank you. The challenge is that we don’t manage to reproduce the issue. It happens for an unknown reason after a few days of operation. When I have more details, I will provide.

Topic		Replies	Views
Strange timeout behavior during load testing Help!	7	1777	October 5, 2017
HAProxy marks server as down while not being down - inexplicable healthcheck timeouts Help!	3	3701	March 20, 2020
HAProxy servers staying down after failed health checks Help!	0	1847	April 11, 2017
Haproxy backend server error handling Help!	1	1828	July 18, 2017
A server went down, but HAProxy kept it in UP state Help!	3	603	November 3, 2017

Never mark backend as failed?

Related topics