Prevent failback with active/active servers

mickaelh51 · November 24, 2022, 10:56am

Hi all,
I’m looking for a way to prevent failback after a node in my backend goes down and then comes back.
Here is my backend configuration for my SSO instances (Keycloak):

backend auth_identity
  acl acl_auth_identity_pages path_beg /js/ /realms/ /resources/ /robots.txt
  acl acl_auth_identity_master_realm path_beg /realms/master
  acl acl_auth_identity_check_pages path_reg -i ^\/realms\/.*\/health\/check.*
  acl acl_sso_staging_metrics_pages path_reg -i ^\/realms\/.*\/metrics

  http-request deny if acl_auth_identity_master_realm
  http-request deny if acl_auth_identity_check_pages
  http-request deny if acl_sso_staging_metrics_pages
  http-request allow if acl_auth_identity_pages
  http-request deny

  http-request add-header X-Forwarded-Proto https if { ssl_fc }
  http-request set-header X-Forwarded-Port %[dst_port]
  http-request set-header X-Real-IP %[src]
  http-request set-header X-Forwarded-Host %[req.hdr(Host)]
  http-response set-header X-XSS-Protection "1; mode=block"
  mode http
  option forwardfor except 127.0.0.1
  balance roundrobin
  cookie AUTH_SESSION_ID prefix nocache
  server auth-1 auth-1:8080 check maxconn 32 cookie auth-1
  server auth-2 auth-2:8080 check maxconn 32 cookie auth-2

For example:
auth-1 and auth-2 are load balanced with roundrobin.
When auth-1 goes down, all traffic is sent to auth-2.
But when auth-1 comes back up, I don’t want traffic to be sent to it again. Only a manual action should reactivate it.
Is it possible to do that, please?

I use HA-Proxy version 2.2.9-2+deb11u3

thanks in advance

AaronWest · November 24, 2022, 12:07pm

I’ve done Active/Passive with a manual action to fail back. I used a stick table with “stick on dst”. This way all traffic uses one server unless a failure occurs, and only then do connections move to the next server. When the first server returns, all traffic stays on the second server until you clear the stick table, which allows it to rebalance. For an extra guarantee and the ability to specify server order, also use the “first” scheduler (balance first).
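
The manual failback step mentioned above (clearing the stick table) could be scripted roughly as follows. This is only a sketch under assumptions: the table is assumed to live in the auth_identity backend from the first post, the helper names are made up for illustration, and the socket path passed as the second argument is whatever your "stats socket" line (with admin level) points to. Without a socket argument the function only prints the command.

```shell
#!/bin/bash
# Sketch: clear a backend's stick table so traffic can rebalance (manual failback).
# Assumptions: helper names are hypothetical; the socket must be a stats socket
# with "level admin"; socat must be installed for the non-dry-run path.

# Build the Runtime API command for the given table (backend) name.
clear_table_cmd() {
    printf 'clear table %s\n' "$1"
}

# clear_stick_table <table> [socket]
# Without a socket argument this is a dry run that only prints the command.
clear_stick_table() {
    local table="$1" socket="${2:-}"
    if [[ -z "$socket" ]]; then
        clear_table_cmd "$table"
    else
        clear_table_cmd "$table" | socat stdio "$socket"
    fi
}

# Dry run: shows what would be sent to HAProxy.
clear_stick_table auth_identity
```

To actually send it, pass the socket path as the second argument, for example clear_stick_table auth_identity /path/to/haproxy.sock.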

But it sounds like you want them to be used equally until a failure occurs. I still think a stick table is probably the way to go, because once an entry is written HAProxy keeps using the last-used server unless it fails or the table is cleared. But something like “stick on dst” will result in one server being used all of the time. I wonder if you could use an external health check that sets the weight to zero whenever it passes right after having failed. Messy and prone to issues, though…

Not sure if I have an answer really but I will think on it and come back if I can.

mickaelh51 · November 24, 2022, 1:13pm

Hi @AaronWest
thanks a lot for your answer.
I already use a stick table to prevent failback for MySQL:

  stick-table type integer size 1 expire 1d
  stick on int(1)
  server am-sql-001 10.1.12.19:3306 check on-marked-down shutdown-sessions
  server am-sql-002 10.2.12.19:3306 check backup on-marked-down shutdown-sessions

Do you have an example of an external health check that sets the weight to 0?
I have no idea how to set the weight to 0…

PS: maybe I can use the on-error server option with mark-down… I have to check.

thanks in advance

AaronWest · November 24, 2022, 1:48pm

I was thinking we could use the stats socket and run the following from our external check script, before it returns an exit code to HAProxy and fails the server:

echo "set weight <backend>/<server> <weight>" | socat stdio /path/to/haproxy.sock

A simple bash script would do although we could use most languages.
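
As a rough illustration of that idea, here is a minimal bash sketch that builds and optionally sends the "set weight" command. The function names are hypothetical, the backend and server names are borrowed from the first post, and the socket path is a placeholder you would replace with your own stats socket (admin level). Note that a weight of 0 stops new load-balanced traffic, but requests carrying a persistence cookie may still reach the server.

```shell
#!/bin/bash
# Sketch: take a server out of load balancing by setting its weight to 0.
# Assumptions: function names are hypothetical; the socket is a stats socket
# with "level admin"; socat is installed for the non-dry-run path.

# weight_cmd <backend> <server> <weight>
# Builds the Runtime API command string.
weight_cmd() {
    printf 'set weight %s/%s %s\n' "$1" "$2" "$3"
}

# set_weight <backend> <server> <weight> [socket]
# Without a socket argument this only prints the command (dry run).
set_weight() {
    local backend="$1" server="$2" weight="$3" socket="${4:-}"
    if [[ -z "$socket" ]]; then
        weight_cmd "$backend" "$server" "$weight"
    else
        weight_cmd "$backend" "$server" "$weight" | socat stdio "$socket"
    fi
}

# Dry run: shows the command that would drain auth-1.
set_weight auth_identity auth-1 0
```

An external check script could call set_weight with the real socket path right before returning its exit code.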

mickaelh51 · November 24, 2022, 3:39pm

I wrote a quick script that switches a down server to maintenance mode:

#!/bin/bash
# External check for HAProxy: puts the server in maintenance mode when the
# Keycloak health endpoint does not answer 200, so it cannot fail back on its own.
# Dry-run usage:
# DEV=1 HAPROXY_PROXY_NAME=keycloak HAPROXY_SERVER_NAME=keycloak1 HAPROXY_SERVER_ADDR=10.1.11.31 ./on_error_server_down.sh
# doc of variables here: http://cbonte.github.io/haproxy-dconv/2.2/configuration.html#4.2-external-check%20command
socket="/tmp/api.sock"

# Keep only the HTTP status code of the health endpoint (000 if unreachable).
RESP=$( \
    /usr/bin/curl \
    -s \
    -o /dev/null \
    -w "%{http_code}" \
    --connect-timeout 2 \
    --retry 3 \
    --retry-delay 2 \
    "http://${HAPROXY_SERVER_ADDR}:8080/realms/master/health/check")
#echo $RESP

if [[ "$RESP" -ne 200 ]]; then
    cmd="set server ${HAPROXY_PROXY_NAME}/${HAPROXY_SERVER_NAME} state maint"
    if [[ -n "${DEV}" ]]; then
        # Dry run: print the command instead of sending it.
        /bin/echo "$cmd"
        exit 0
    else
        # Force maintenance through the stats socket, then report the check as failed.
        /bin/echo "$cmd" | /usr/bin/socat stdio "$socket"
        exit 1
    fi
else
    /bin/echo "Check OK"
    exit 0
fi

HAProxy configuration:

backend keycloak
  mode http
  option external-check
  external-check path "/usr/bin:/bin:/usr/local/bin"
  external-check command /usr/local/bin/on_error_server_down.sh
  ....
  server auth-1 auth-1:8080 check
  server auth-2 auth-2:8080 check

It works correctly! Thanks to that, there is no risk of failing back to a server that has just come back UP.
Thanks for your help.
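
For the manual reactivation step, something along these lines should bring a server back once it has been checked by hand. It is a sketch, not a tested tool: the function names are invented for illustration, and it relies on the Runtime API command "set server <backend>/<server> state ready", sent over the same kind of stats socket the script above uses.

```shell
#!/bin/bash
# Sketch: manually change a server's admin state (e.g. back to "ready").
# Assumptions: function names are hypothetical; the socket is a stats socket
# with "level admin"; socat is installed for the non-dry-run path.

# server_state_cmd <backend> <server> <ready|drain|maint>
# Builds the Runtime API command, rejecting unknown states.
server_state_cmd() {
    local backend="$1" server="$2" state="$3"
    case "$state" in
        ready|drain|maint)
            printf 'set server %s/%s state %s\n' "$backend" "$server" "$state"
            ;;
        *)
            echo "unknown state: $state" >&2
            return 1
            ;;
    esac
}

# set_server_state <backend> <server> <state> [socket]
# Without a socket argument this only prints the command (dry run).
set_server_state() {
    local cmd socket="${4:-}"
    cmd=$(server_state_cmd "$1" "$2" "$3") || return 1
    if [[ -z "$socket" ]]; then
        echo "$cmd"
    else
        echo "$cmd" | socat stdio "$socket"
    fi
}

# Dry run: shows the command that would re-enable auth-1.
set_server_state keycloak auth-1 ready
```

With the socket from the script above, set_server_state keycloak auth-1 ready /tmp/api.sock would put auth-1 back into rotation.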
