Hi all, I have a problem with an HAProxy instance (1.9.4) in front of a Redis setup with 3 nodes (one master, two replicas), all running inside Kubernetes (k8s).
I configured HAProxy to health-check the nodes with a tcp-check like this:
backend bk_redis
    option tcp-check
    tcp-check send AUTH\ RedisTest\r\n
    tcp-check expect string +OK
    tcp-check send PING\r\n
    tcp-check expect string +PONG
    tcp-check send info\ replication\r\n
    tcp-check expect string role:master
    tcp-check send QUIT\r\n
    tcp-check expect string +OK
    default-server check resolvers kubedns inter 1s downinter 1s fastinter 1s fall 1 rise 30 maxconn 330 no-agent-check on-error mark-down
    server redis-0 redis-ha-server-0.redis-ha.redis-ha.svc.cluster.local:6379
    server redis-1 redis-ha-server-1.redis-ha.redis-ha.svc.cluster.local:6379
    server redis-2 redis-ha-server-2.redis-ha.redis-ha.svc.cluster.local:6379
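To see by hand what each pod answers to this exact sequence, the same steps can be replayed from a pod inside the cluster. This is a minimal Python sketch, assuming the password RedisTest from the config and the default port 6379; CHECK_STEPS, run_checks and probe_node are hypothetical helper names, not part of HAProxy or any Redis client. It reads each reply until the expected string shows up or the timeout expires, so on a replica it gives up at the role:master step, much like the "Layer7 timeout ... at step 6" lines in the log. Steps are numbered the way the log seems to count them (every send and every expect is one step).

```python
import socket

# The same sequence as the tcp-check rules in bk_redis (inline Redis commands).
# The password "RedisTest" is copied from the config above.
CHECK_STEPS = [
    (b"AUTH RedisTest\r\n", b"+OK"),
    (b"PING\r\n", b"+PONG"),
    (b"info replication\r\n", b"role:master"),
    (b"QUIT\r\n", b"+OK"),
]


def run_checks(sock, timeout=1.0):
    """Replay CHECK_STEPS on a connected socket.

    Returns None if every expect matched, otherwise the number of the failing
    expect step, counting each send and each expect as one step (so the
    role:master expect is step 6).
    """
    sock.settimeout(timeout)
    buf = b""
    for i, (command, expected) in enumerate(CHECK_STEPS):
        sock.sendall(command)
        step = 2 * i + 2  # sends are odd steps, expects are even steps
        try:
            while expected not in buf:
                chunk = sock.recv(4096)
                if not chunk:  # peer closed the connection
                    return step
                buf += chunk
        except OSError:  # socket.timeout is a subclass of OSError
            return step
        # Discard everything up to and including the match.
        buf = buf[buf.index(expected) + len(expected):]
    return None


def probe_node(host, port=6379, timeout=1.0):
    """Connect to one Redis node and replay the check sequence against it."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        return run_checks(sock, timeout)
```

For example, probe_node("redis-ha-server-0.redis-ha.redis-ha.svc.cluster.local") should return None on the current master, 6 on a replica, and 2 if AUTH is rejected; a node that cannot be reached raises OSError instead.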
When the master node goes down, everything works fine: a replica is promoted to master and HAProxy redirects the traffic to it.
The problem occurs when the old master comes back with a new IP: instead of checking the master role again, HAProxy immediately marks the old node as UP.
This is the log:
[NOTICE] 058/125637 (1) : New worker #1 (6) forked
[WARNING] 058/125637 (6) : Health check for server bk_redis/redis-0 succeeded, reason: Layer7 check passed, code: 0, info: "(tcp-check)", check duration: 0ms, status: 1/1 UP.
[WARNING] 058/125639 (6) : Health check for server bk_redis/redis-1 failed, reason: Layer7 timeout, info: " at step 6 of tcp-check (expect string 'role:master')", check duration: 1001ms, status: 0/30 DOWN.
[WARNING] 058/125639 (6) : Server bk_redis/redis-1 is DOWN. 2 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
[WARNING] 058/125639 (6) : Health check for server bk_redis/redis-2 failed, reason: Layer7 timeout, info: " at step 6 of tcp-check (expect string 'role:master')", check duration: 1001ms, status: 0/30 DOWN.
[WARNING] 058/125639 (6) : Server bk_redis/redis-2 is DOWN. 1 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
[WARNING] 058/125657 (6) : Health check for server bk_redis/redis-0 failed, reason: Layer4 timeout, info: " at step 1 of tcp-check (send)", check duration: 1001ms, status: 0/30 DOWN.
[WARNING] 058/125657 (6) : Server bk_redis/redis-0 is DOWN. 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
[ALERT] 058/125657 (6) : backend 'bk_redis' has no server available!
[WARNING] 058/125706 (6) : Health check for server bk_redis/redis-2 failed, reason: Layer7 invalid response, info: "TCPCHK did not match content 'role:master' at step 6", check duration: 532ms, status: 0/30 DOWN.
[WARNING] 058/125706 (6) : Health check for server bk_redis/redis-1 failed, reason: Layer7 invalid response, info: "TCPCHK did not match content 'role:master' at step 6", check duration: 835ms, status: 0/30 DOWN.
[WARNING] 058/125707 (6) : Health check for server bk_redis/redis-2 succeeded, reason: Layer7 check passed, code: 0, info: "(tcp-check)", check duration: 2ms, status: 1/30 DOWN.
[WARNING] 058/125708 (6) : Health check for server bk_redis/redis-2 succeeded, reason: Layer7 check passed, code: 0, info: "(tcp-check)", check duration: 2ms, status: 2/30 DOWN.
[WARNING] 058/125708 (6) : Health check for server bk_redis/redis-1 failed, reason: Layer7 timeout, info: " at step 6 of tcp-check (expect string 'role:master')", check duration: 1001ms, status: 0/30 DOWN.
[WARNING] 058/125709 (6) : Health check for server bk_redis/redis-2 succeeded, reason: Layer7 check passed, code: 0, info: "(tcp-check)", check duration: 4ms, status: 3/30 DOWN.
[WARNING] 058/125710 (6) : Health check for server bk_redis/redis-2 succeeded, reason: Layer7 check passed, code: 0, info: "(tcp-check)", check duration: 4ms, status: 4/30 DOWN.
[WARNING] 058/125711 (6) : Health check for server bk_redis/redis-2 succeeded, reason: Layer7 check passed, code: 0, info: "(tcp-check)", check duration: 2ms, status: 5/30 DOWN.
[WARNING] 058/125712 (6) : Health check for server bk_redis/redis-2 succeeded, reason: Layer7 check passed, code: 0, info: "(tcp-check)", check duration: 2ms, status: 6/30 DOWN.
[WARNING] 058/125713 (6) : Server bk_redis/redis-0 was DOWN and now enters maintenance (DNS NX status).
[WARNING] 058/125713 (6) : Health check for server bk_redis/redis-2 succeeded, reason: Layer7 check passed, code: 0, info: "(tcp-check)", check duration: 2ms, status: 7/30 DOWN.
[WARNING] 058/125714 (6) : Health check for server bk_redis/redis-2 succeeded, reason: Layer7 check passed, code: 0, info: "(tcp-check)", check duration: 3ms, status: 8/30 DOWN.
[WARNING] 058/125715 (6) : Health check for server bk_redis/redis-2 succeeded, reason: Layer7 check passed, code: 0, info: "(tcp-check)", check duration: 3ms, status: 9/30 DOWN.
[WARNING] 058/125716 (6) : Health check for server bk_redis/redis-2 succeeded, reason: Layer7 check passed, code: 0, info: "(tcp-check)", check duration: 4ms, status: 10/30 DOWN.
[WARNING] 058/125717 (6) : Health check for server bk_redis/redis-2 succeeded, reason: Layer7 check passed, code: 0, info: "(tcp-check)", check duration: 3ms, status: 11/30 DOWN.
[WARNING] 058/125718 (6) : Health check for server bk_redis/redis-2 succeeded, reason: Layer7 check passed, code: 0, info: "(tcp-check)", check duration: 4ms, status: 12/30 DOWN.
[WARNING] 058/125719 (6) : Health check for server bk_redis/redis-2 succeeded, reason: Layer7 check passed, code: 0, info: "(tcp-check)", check duration: 2ms, status: 13/30 DOWN.
[WARNING] 058/125720 (6) : Health check for server bk_redis/redis-2 succeeded, reason: Layer7 check passed, code: 0, info: "(tcp-check)", check duration: 2ms, status: 14/30 DOWN.
[WARNING] 058/125721 (6) : Health check for server bk_redis/redis-2 succeeded, reason: Layer7 check passed, code: 0, info: "(tcp-check)", check duration: 2ms, status: 15/30 DOWN.
[WARNING] 058/125722 (6) : Health check for server bk_redis/redis-2 succeeded, reason: Layer7 check passed, code: 0, info: "(tcp-check)", check duration: 3ms, status: 16/30 DOWN.
[WARNING] 058/125723 (6) : Health check for server bk_redis/redis-2 succeeded, reason: Layer7 check passed, code: 0, info: "(tcp-check)", check duration: 3ms, status: 17/30 DOWN.
[WARNING] 058/125724 (6) : Health check for server bk_redis/redis-2 succeeded, reason: Layer7 check passed, code: 0, info: "(tcp-check)", check duration: 4ms, status: 18/30 DOWN.
[WARNING] 058/125725 (6) : Health check for server bk_redis/redis-2 succeeded, reason: Layer7 check passed, code: 0, info: "(tcp-check)", check duration: 3ms, status: 19/30 DOWN.
[WARNING] 058/125726 (6) : Health check for server bk_redis/redis-2 succeeded, reason: Layer7 check passed, code: 0, info: "(tcp-check)", check duration: 2ms, status: 20/30 DOWN.
[WARNING] 058/125727 (6) : Health check for server bk_redis/redis-2 succeeded, reason: Layer7 check passed, code: 0, info: "(tcp-check)", check duration: 2ms, status: 21/30 DOWN.
[WARNING] 058/125728 (6) : Health check for server bk_redis/redis-2 succeeded, reason: Layer7 check passed, code: 0, info: "(tcp-check)", check duration: 3ms, status: 22/30 DOWN.
[WARNING] 058/125729 (6) : Health check for server bk_redis/redis-2 succeeded, reason: Layer7 check passed, code: 0, info: "(tcp-check)", check duration: 3ms, status: 23/30 DOWN.
[WARNING] 058/125730 (6) : Health check for server bk_redis/redis-2 succeeded, reason: Layer7 check passed, code: 0, info: "(tcp-check)", check duration: 4ms, status: 24/30 DOWN.
[WARNING] 058/125731 (6) : Health check for server bk_redis/redis-2 succeeded, reason: Layer7 check passed, code: 0, info: "(tcp-check)", check duration: 4ms, status: 25/30 DOWN.
[WARNING] 058/125732 (6) : Health check for server bk_redis/redis-2 succeeded, reason: Layer7 check passed, code: 0, info: "(tcp-check)", check duration: 3ms, status: 26/30 DOWN.
[WARNING] 058/125733 (6) : Health check for server bk_redis/redis-2 succeeded, reason: Layer7 check passed, code: 0, info: "(tcp-check)", check duration: 3ms, status: 27/30 DOWN.
[WARNING] 058/125734 (6) : Health check for server bk_redis/redis-2 succeeded, reason: Layer7 check passed, code: 0, info: "(tcp-check)", check duration: 2ms, status: 28/30 DOWN.
[WARNING] 058/125735 (6) : Health check for server bk_redis/redis-2 succeeded, reason: Layer7 check passed, code: 0, info: "(tcp-check)", check duration: 2ms, status: 29/30 DOWN.
[WARNING] 058/125736 (6) : Health check for server bk_redis/redis-2 succeeded, reason: Layer7 check passed, code: 0, info: "(tcp-check)", check duration: 1ms, status: 1/1 UP.
[WARNING] 058/125736 (6) : Server bk_redis/redis-2 is UP. 1 active and 0 backup servers online. 0 sessions requeued, 0 total in queue.
[WARNING] 058/125945 (6) : bk_redis/redis-0 changed its IP from 10.42.4.85 to 10.42.4.87 by kubedns/namesrv1.
[WARNING] 058/125945 (6) : Server bk_redis/redis-0 ('redis-ha-server-0.redis-ha.redis-ha.svc.cluster.local') is UP/READY (resolves again).
[WARNING] 058/125945 (6) : Server bk_redis/redis-0 administratively READY thanks to valid DNS answer.
[WARNING] 058/125947 (6) : Health check for server bk_redis/redis-0 failed, reason: Layer7 timeout, info: " at step 6 of tcp-check (expect string 'role:master')", check duration: 1000ms, status: 0/30 DOWN.
[WARNING] 058/125947 (6) : Server bk_redis/redis-0 is DOWN. 1 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
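As the last lines show, when bk_redis/redis-0 gets a new IP (even though it was DOWN), it goes UP immediately without running the tcp-check. The check only starts about a second later and, of course, fails at the role:master step, because redis-0 is no longer the master. Until then it counts as an active server (note "1 active ... left" once it goes DOWN again).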
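To make the timing in the log easier to follow, here is a simplified Python model of the rise/fall counter. It is built only from the "status: x/y" fields in the log, not from HAProxy's source, and HealthCounter and simulate are hypothetical names. From DOWN, a server needs rise consecutive passing checks; with rise 30 and a check every second that is about 30 seconds, which matches redis-2 going from 1/30 at 058/125707 to UP at 058/125736. After the DNS recovery, redis-0 behaves as if it had started at full health (an inference from the "is UP/READY (resolves again)" line followed by a single failure), so with fall 1 the first failed check is enough to bring it DOWN.

```python
from dataclasses import dataclass


@dataclass
class HealthCounter:
    """Simplified rise/fall counter, modelled on the "status: x/y" log fields.

    Assumption: health ranges from 0 to rise + fall - 1 and the server
    counts as UP once health >= rise.
    """
    rise: int
    fall: int
    health: int = 0

    @property
    def is_up(self) -> bool:
        return self.health >= self.rise

    def status(self) -> str:
        # Same shape as the log: "x/rise DOWN" or "y/fall UP".
        if self.is_up:
            return f"{self.health - self.rise + 1}/{self.fall} UP"
        return f"{self.health}/{self.rise} DOWN"

    def record(self, passed: bool) -> None:
        if passed:
            self.health = min(self.health + 1, self.rise + self.fall - 1)
        elif self.health > self.rise:
            self.health -= 1  # still UP, one failure closer to DOWN
        else:
            self.health = 0  # DOWN, and the rise count starts over


def simulate(counter: HealthCounter, results: list) -> list:
    """Feed check results (True = passed) and collect the status after each."""
    statuses = []
    for passed in results:
        counter.record(passed)
        statuses.append(counter.status())
    return statuses


# redis-2 after promotion: starts DOWN and needs 30 passing checks (~30 s).
promoted = simulate(HealthCounter(rise=30, fall=1), [True] * 30)
# redis-0 after the DNS recovery: behaves as if it started at full health,
# so with fall 1 a single failed check takes it DOWN.
returning = simulate(HealthCounter(rise=30, fall=1, health=30), [False])

print(promoted[0], promoted[-2], promoted[-1])  # 1/30 DOWN 29/30 DOWN 1/1 UP
print(returning)  # ['0/30 DOWN']
```

The model only tracks the counter; it does not cover DNS resolution, maintenance mode or slowstart.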
How can I avoid this?
Is there a way to make HAProxy, once the IP resolves again, wait for the tcp-check to pass before putting the server UP?