Never mark backend as failed?


#1

Hi. We have a backend with a single server:

defaults
    retries 3
    timeout connect 5000
    timeout client 100000
    timeout server 100000

backend bk_foo
    mode tcp
    no option http-server-close
    log global
    option tcplog
    timeout server 1m
    timeout connect 5s
    server foo smtp.example.com:587 check

The problem is that if smtp.example.com becomes unreachable because of network problems, haproxy marks it as down and never brings it back up, even after the server is reachable again, until haproxy is reloaded manually. Is there any way to never mark it as down? Thanks.
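
For reference, a server line without the `check` keyword gets no health checks, and HAProxy then always considers that server available. A minimal sketch of that variant of the backend above, assuming active health checks are not needed here (whether that is acceptable for this setup is an assumption, not something established in this thread):

```
backend bk_foo
    mode tcp
    log global
    option tcplog
    timeout server 1m
    timeout connect 5s
    # No "check" keyword: no health checks are run against this server,
    # so HAProxy never marks it DOWN and always tries to connect to it.
    server foo smtp.example.com:587
```

The trade-off is that connections are still attempted while the server is unreachable, so clients see a failure only after `timeout connect` and `retries` are exhausted.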


#2

Share the output of haproxy -vv and enable logging. Then share the logs (you should see a backend server DOWN event, and usually a server UP event as well).
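
A minimal sketch of what enabling logging could look like, assuming a local syslog daemon that accepts UDP messages on 127.0.0.1:514. The address and the `local0` facility are assumptions and need to match the local syslog setup. `option log-health-checks` is optional; it logs health check status updates in addition to the DOWN and UP events:

```
global
    # Assumption: syslog listens on UDP 127.0.0.1:514; adjust as needed.
    log 127.0.0.1:514 local0

defaults
    # Inherit the log target declared in the global section.
    log global

backend bk_foo
    mode tcp
    option tcplog
    # Optional: also log health check status updates.
    option log-health-checks
    server foo smtp.example.com:587 check
```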

No idea why this happens though, never heard of such an issue.


#3

Sorry, I can’t run those at the moment, but it’s haproxy 1.7.5 on FreeBSD 10.3. Please see this post for why this might be an issue: https://serverfault.com/questions/666600/haproxy-does-not-recover-after-failed-check?answertab=votes#tab-top

“And once a backend is marked as down it doesn’t go back up (this is not documented, I came to this conclusion based on my experience).”


#4

Somehow no corresponding UP event was logged until haproxy was manually restarted a few hours later.

Apr 19 17:41:16 foo haproxy[76287]: Server bk_foo/foo is DOWN, reason: Layer4 connection problem, info: "Connection refused", check duration: 14ms. 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
Apr 19 17:41:16 foo haproxy[76287]: backend bk_foo has no server available!

Apr 20 02:44:40 foo haproxy[21656]: Proxy foo started.
Apr 20 02:44:40 foo haproxy[21656]: Proxy bk_foo started.
Apr 20 02:44:41 foo haproxy[76287]: Stopping proxy foo in 0 ms.
Apr 20 02:44:41 foo haproxy[76287]: Stopping backend bk_foo in 0 ms.
Apr 20 02:44:41 foo haproxy[76287]: Proxy foo stopped (FE: 90305 conns, BE: 0 conns).
Apr 20 02:44:41 foo haproxy[76287]: Proxy bk_foo stopped (FE: 0 conns, BE: 90305 conns).

This was haproxy 1.7.5. I’ve just upgraded to 1.7.10 in case this was a bug or something.