Unexpected behaviour with http-check disable-on-404

We have been using HAProxy for a while now and recently enabled a new option: http-check disable-on-404.

For a few days this seemed to work properly, until the servers switched roles.
As you can see below, server002 was OK, reported as ‘conditionally succeeded’, returning status 404.
Then server002 temporarily switched to UP, returning status 200. After that it failed, first with “Connection refused”. This of course should result in a failed state. But when the server recovered and returned status 404 again, it remained in the failed state. The only way to get it back to ‘conditionally succeeded’ was reloading HAProxy. I believe this is a bug.

Below is the configuration:

backend backend
    mode http
    balance roundrobin
    option allbackups

    http-reuse always

    # health checks
    option httpchk GET /isActive
    http-check disable-on-404
    http-check expect status 200
    default-server slowstart 30s check inter 10s fall 3 rise 3

    server server001 10.10.0.1:8080 weight 100
    server server002 10.10.0.2:8080 weight 100
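For context, the /isActive endpoint on the application side only needs to reflect the node's role. A minimal hypothetical sketch (the handler and the role flag are assumptions, not our actual application code), assuming the endpoint simply maps master/standby state to a status code:

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

def health_status(is_master: bool) -> int:
    """Status code for GET /isActive: 200 on the active master,
    404 on the standby, so disable-on-404 drains the standby (NOLB)."""
    return 200 if is_master else 404

class IsActiveHandler(BaseHTTPRequestHandler):
    IS_MASTER = False  # would be derived from the real cluster state

    def do_GET(self):
        # Anything other than /isActive is a plain 404 (not a health answer)
        code = health_status(self.IS_MASTER) if self.path == "/isActive" else 404
        self.send_response(code)
        self.end_headers()

# To actually serve checks on port 8080:
# HTTPServer(("", 8080), IsActiveHandler).serve_forever()
```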

Below is the log output:

Jan 29 12:57:22 loadbalancer haproxy[18143]: Health check for server backend/server001 succeeded, reason: Layer7 check passed, code: 200, info: "HTTP status check returned code <200>", check duration: 1ms, status: 3/3 UP.
Jan 29 12:57:22 loadbalancer haproxy[18143]: Health check for server backend/server002 conditionally succeeded, reason: Layer7 check conditionally passed, code: 404, info: "Not Found", check duration: 2ms, status: 3/3 UP.
Jan 29 12:57:22 loadbalancer haproxy[18143]: Server backend/server002 is stopping. 1 active and 0 backup servers online. 0 sessions requeued, 0 total in queue.
Jan 29 13:21:02 loadbalancer haproxy[18143]: Health check for server backend/server001 failed, reason: Layer4 connection problem, info: "Connection refused", check duration: 0ms, status: 2/3 UP.
Jan 29 13:21:02 loadbalancer haproxy[18143]: Health check for server backend/server002 succeeded, reason: Layer7 check passed, code: 200, info: "HTTP status check returned code <200>", check duration: 1ms, status: 3/3 UP.
Jan 29 13:21:02 loadbalancer haproxy[18143]: Server backend/server002 is UP. 2 active and 0 backup servers online. 0 sessions requeued, 0 total in queue.
Jan 29 13:21:12 loadbalancer haproxy[18143]: Health check for server backend/server001 failed, reason: Layer4 connection problem, info: "Connection refused", check duration: 0ms, status: 1/3 UP.
Jan 29 13:21:22 loadbalancer haproxy[18143]: Health check for server backend/server001 failed, reason: Layer4 connection problem, info: "Connection refused", check duration: 0ms, status: 0/3 DOWN.
Jan 29 13:21:22 loadbalancer haproxy[18143]: Server backend/server001 is DOWN. 1 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
Jan 29 13:21:32 loadbalancer haproxy[18143]: Server backend/server002 is UP. 1 active and 0 backup servers online. 0 sessions requeued, 0 total in queue.
Jan 29 13:22:59 loadbalancer haproxy[18143]: Health check for server backend/server001 failed, reason: Layer7 wrong status, code: 404, info: "HTTP status check returned code <404>", check duration: 4ms, status: 0/3 DOWN.
Jan 29 13:23:29 loadbalancer haproxy[18143]: Health check for server backend/server001 succeeded, reason: Layer7 check passed, code: 200, info: "HTTP status check returned code <200>", check duration: 2ms, status: 1/3 DOWN.
Jan 29 13:23:32 loadbalancer haproxy[18143]: Health check for server backend/server002 failed, reason: Layer4 connection problem, info: "Connection refused", check duration: 0ms, status: 2/3 UP.
Jan 29 13:23:39 loadbalancer haproxy[18143]: Health check for server backend/server001 succeeded, reason: Layer7 check passed, code: 200, info: "HTTP status check returned code <200>", check duration: 2ms, status: 2/3 DOWN.
Jan 29 13:23:42 loadbalancer haproxy[18143]: Health check for server backend/server002 failed, reason: Layer4 connection problem, info: "Connection refused", check duration: 0ms, status: 1/3 UP.
Jan 29 13:23:49 loadbalancer haproxy[18143]: Health check for server backend/server001 succeeded, reason: Layer7 check passed, code: 200, info: "HTTP status check returned code <200>", check duration: 3ms, status: 3/3 UP.
Jan 29 13:23:49 loadbalancer haproxy[18143]: Server backend/server001 is UP. 2 active and 0 backup servers online. 0 sessions requeued, 0 total in queue.
Jan 29 13:23:52 loadbalancer haproxy[18143]: Health check for server backend/server002 failed, reason: Layer4 connection problem, info: "Connection refused", check duration: 0ms, status: 0/3 DOWN.
Jan 29 13:23:52 loadbalancer haproxy[18143]: Server backend/server002 is DOWN. 1 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
Jan 29 13:24:19 loadbalancer haproxy[18143]: Server backend/server001 is UP. 1 active and 0 backup servers online. 0 sessions requeued, 0 total in queue.
Jan 29 13:25:12 loadbalancer haproxy[18143]: Health check for server backend/server002 failed, reason: Layer4 timeout, check duration: 10000ms, status: 0/3 DOWN.
Jan 29 13:25:22 loadbalancer haproxy[18143]: Health check for server backend/server002 failed, reason: Layer4 connection problem, info: "Connection refused", check duration: 0ms, status: 0/3 DOWN.
Jan 29 13:25:32 loadbalancer haproxy[18143]: Health check for server backend/server002 failed, reason: Layer7 wrong status, code: 404, info: "HTTP status check returned code <404>", check duration: 4ms, status: 0/3 DOWN.

Please provide the output of haproxy -vv.

HA-Proxy version 1.8.17-1ppa1~xenial 2019/01/15
Copyright 2000-2019 Willy Tarreau willy@haproxy.org

Build options :
TARGET = linux2628
CPU = generic
CC = gcc
CFLAGS = -O2 -g -O2 -fPIE -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fno-strict-aliasing -Wdeclaration-after-statement -fwrapv -Wno-unused-label
OPTIONS = USE_GETADDRINFO=1 USE_ZLIB=1 USE_REGPARM=1 USE_OPENSSL=1 USE_LUA=1 USE_SYSTEMD=1 USE_PCRE2=1 USE_PCRE2_JIT=1 USE_NS=1

Default settings :
maxconn = 2000, bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Built with OpenSSL version : OpenSSL 1.0.2g 1 Mar 2016
Running on OpenSSL version : OpenSSL 1.0.2g 1 Mar 2016
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports : TLSv1.0 TLSv1.1 TLSv1.2
Built with Lua version : Lua 5.3.1
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND
Encrypted password support via crypt(3): yes
Built with multi-threading support.
Built with PCRE2 version : 10.21 2016-01-12
PCRE2 library supports JIT : yes
Built with zlib version : 1.2.8
Running on zlib version : 1.2.8
Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with network namespace support.

Available polling systems :
epoll : pref=300, test result OK
poll : pref=200, test result OK
select : pref=150, test result OK
Total: 3 (3 usable), will use epoll.

Available filters :
[SPOE] spoe
[COMP] compression
[TRACE] trace

That’s exactly what is supposed to happen as per the documentation:

https://cbonte.github.io/haproxy-dconv/1.8/configuration.html#4.2-http-check%20disable-on-404

If the server responds 2xx or 3xx again, it
will immediately be reinserted into the farm.

Hi lukastribus,

Thank you for your answer. However, I have my doubts about it. The problem is not that it's being reinserted into the farm. The problem is that it's not being put back into maintenance mode.

Situation:
2 servers in a master/slave setup, named A and B.

A = UP (returning 200)
B = NOLB (returning 404)

All OK so far.

Master and slave fail over, and B becomes master.
B returns 200 and becomes UP
A fails with connection timeout and becomes DOWN

All OK so far.

Master and slave return to their original state, and A becomes master again.
A returns 200 and becomes UP <- Correct!
B returns 404 and becomes DOWN <- Wrong! Should be Maintenance (NOLB)

Please correct me if I'm seeing this wrong, but in my opinion this is not correct. Once a server has been DOWN and comes back UP again, the disable-on-404 option is no longer applied, and as far as I can tell this is not documented.

No. If the server is down and does not respond with 200, it will remain down.

There is no way to pass from DOWN to maintenance mode like this. disable-on-404 is to pass from UP to maintenance mode, nothing more.
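The behaviour described above can be sketched as a tiny state machine (a simplified sketch: the rise/fall counters are omitted, and "2xx" stands in for any check result that passes the configured expect rule):

```python
def next_state(state: str, check: str) -> str:
    """Simplified model of HAProxy check-state transitions with
    http-check disable-on-404. States: UP, NOLB (maintenance), DOWN."""
    if state in ("UP", "NOLB"):
        if check == "2xx":
            return "UP"
        if check == "404":
            return "NOLB"  # disable-on-404: drained, not marked failed
        return "DOWN"      # any other failure takes the server down
    # state == "DOWN": only a passing check recovers the server;
    # a 404 now counts as "Layer7 wrong status" and keeps it DOWN,
    # so there is no direct DOWN -> NOLB transition
    return "UP" if check == "2xx" else "DOWN"
```

This matches the logs in the report: server002 went ‘conditionally succeeded’ (NOLB) only while it was UP, and once DOWN its 404 responses were logged as "Layer7 wrong status".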