Layer4 connection problem, info: “Connection refused at step 1 of tcp-check (connect)”

Hi,

Similar to #2134

Despite it is the same error and the same fix, i don’t use resolvers in my conf

# haproxy -v
HA-Proxy version 1.7.5-2 2017/05/17
backend bk_redis
  mode tcp
  option tcp-check
  #tcp-check connect     --> comment it solved the problem
  tcp-check send PING\r\n
  tcp-check expect string +PONG
  tcp-check send info\ replication\r\n
  tcp-check expect string role:master
  tcp-check send QUIT\r\n
  tcp-check expect string +OK
  server redis-1-debian-9 redis-1-debian-9:6379 check inter 1s
  server redis-2-debian-9 redis-2-debian-9:6379 check inter 1s
  server redis-3-debian-9 redis-3-debian-9:6379 check inter 1s

But when i read the doc, it clearly state that tcp-check connect is required

In a tcp-check ruleset a 'connect' is required, it is also mandatory to start
the ruleset with a 'connect' rule. Purpose is to ensure admin know what they
do.
  • Why tcp-check connect fail ? bug ?
  • Despite what the doc says, is it ok to remove tcp-check connect ? what is the implication of this choice ?
  • Since we don’t have the problem with the exact same conf with v1.5.18 (centos7 ), could we remove it on this version too to have the same config on 1.5.18 and 1.7.5 ?

Regards,

If you remove the command, it won’t apply the tcp-check you configured, falling back to a simple connect check.

Maybe in your 1.5.18 deployment you actually have a redis backend with master role and in your 1.7.5 deployment you haven’t?

I suggest you capture the health check traffic between your failing 1.7.5 deployment and your backends and check what the backends return.

I also suggest you look at the logs.

Thanks Lukastribus for your answer

I don’t get how the fact that my redis backend have a redis master or not influence how “tcp-check connect” failed, there is no relationship with the error returned by “tcp-check connect” and the fact i have a master or not, if it was the case the error will be not :

Layer4 connection problem, info: “Connection refused at step 1 of tcp-check (connect)”

but something like:

at step 5 of tcp-check (expect string 'role:master')

Here a concrete example with 2 proxies targeting the same hosts with the same config.

1 host under centos and another one on debian 9 (haproxy from package distrib)

$ docker exec -ti proxy-1-centos-7 haproxy -v
HA-Proxy version 1.5.18 2016/05/10
Copyright 2000-2016 Willy Tarreau <willy@haproxy.org>

$ docker exec -ti proxy-2-debian-9 haproxy -v
HA-Proxy version 1.7.5-2 2017/05/17
Copyright 2000-2017 Willy Tarreau <willy@haproxy.org>

same config on both:

$ docker exec -ti proxy-1-centos-7 cat /etc/haproxy/haproxy.cfg
global
  log /dev/log local2
  stats socket /var/lib/haproxy/stats.sock mode 660 level admin process 1
  stats bind-process 1
  stats timeout 30s
  chroot /var/lib/haproxy/
  user haproxy
  group haproxy
  daemon
  maxconn 200000
  nbproc 4

defaults
  log global
  mode tcp
  option  tcplog
  timeout connect 5000
  timeout client 50000
  timeout server 50000

frontend ft_redis
  bind 172.28.5.253:6379 name redis
  default_backend bk_redis
  mode tcp

backend bk_redis
  mode tcp
  option tcp-check
  tcp-check connect
  tcp-check send PING\r\n
  tcp-check expect string +PONG
  tcp-check send info\ replication\r\n
  tcp-check expect string role:master
  tcp-check send QUIT\r\n
  tcp-check expect string +OK
  server redis-1-centos-7 redis-1-centos-7:6379 check inter 1s
  server redis-2-centos-7 redis-2-centos-7:6379 check inter 1s
  server redis-3-centos-7 redis-3-centos-7:6379 check inter 1s

$ diff <(docker exec proxy-1-centos-7 cat /etc/haproxy/haproxy.cfg) <(docker exec proxy-2-debian-9 cat /etc/haproxy/haproxy.cfg)
$

But haproxy failed on the debian version and not on centos

Centos:

$ docker exec -ti proxy-1-centos-7 bash -c 'echo "show stat" | socat /var/lib/haproxy/stats.sock stdio' 
# pxname,svname,qcur,qmax,scur,smax,slim,stot,bin,bout,dreq,dresp,ereq,econ,eresp,wretr,wredis,status,weight,act,bck,chkfail,chkdown,lastchg,downtime,qlimit,pid,iid,sid,throttle,lbtot,tracked,type,rate,rate_lim,rate_max,check_status,check_code,check_duration,hrsp_1xx,hrsp_2xx,hrsp_3xx,hrsp_4xx,hrsp_5xx,hrsp_other,hanafail,req_rate,req_rate_max,req_tot,cli_abrt,srv_abrt,comp_in,comp_out,comp_byp,comp_rsp,lastsess,last_chk,last_agt,qtime,ctime,rtime,ttime,
ft_redis,FRONTEND,,,0,0,2000,0,0,0,0,0,0,,,,,OPEN,,,,,,,,,1,2,0,,,,0,0,0,0,,,,,,,,,,,0,0,0,,,0,0,0,0,,,,,,,,
bk_redis,redis-1-centos-7,0,0,0,0,,0,0,0,,0,,0,0,0,0,UP,1,1,0,0,0,128,0,,1,3,1,,0,,2,0,,0,L7OK,0,1,,,,,,,0,,,,0,0,,,,,-1,(tcp-check),,0,0,0,0,
bk_redis,redis-2-centos-7,0,0,0,0,,0,0,0,,0,,0,0,0,0,DOWN,1,1,0,1,1,126,126,,1,3,2,,0,,2,0,,0,L7TOUT,,1001,,,,,,,0,,,,0,0,,,,,-1, at step 5 of tcp-check (expect string 'role:master'),,0,0,0,0,
bk_redis,redis-3-centos-7,0,0,0,0,,0,0,0,,0,,0,0,0,0,DOWN,1,1,0,1,1,126,126,,1,3,3,,0,,2,0,,0,L7TOUT,,1001,,,,,,,0,,,,0,0,,,,,-1, at step 5 of tcp-check (expect string 'role:master'),,0,0,0,0,
bk_redis,BACKEND,0,0,0,0,200,0,0,0,0,0,,0,0,0,0,UP,1,1,0,,0,128,0,,1,3,0,,0,,1,0,,0,,,,,,,,,,,,,,0,0,0,0,0,0,-1,,,0,0,0,0,

Debian:

$ docker exec -ti proxy-2-debian-9 bash -c 'echo "show stat" | socat /var/lib/haproxy/stats.sock stdio' 
# pxname,svname,qcur,qmax,scur,smax,slim,stot,bin,bout,dreq,dresp,ereq,econ,eresp,wretr,wredis,status,weight,act,bck,chkfail,chkdown,lastchg,downtime,qlimit,pid,iid,sid,throttle,lbtot,tracked,type,rate,rate_lim,rate_max,check_status,check_code,check_duration,hrsp_1xx,hrsp_2xx,hrsp_3xx,hrsp_4xx,hrsp_5xx,hrsp_other,hanafail,req_rate,req_rate_max,req_tot,cli_abrt,srv_abrt,comp_in,comp_out,comp_byp,comp_rsp,lastsess,last_chk,last_agt,qtime,ctime,rtime,ttime,agent_status,agent_code,agent_duration,check_desc,agent_desc,check_rise,check_fall,check_health,agent_rise,agent_fall,agent_health,addr,cookie,mode,algo,conn_rate,conn_rate_max,conn_tot,intercepted,dcon,dses,
ft_redis,FRONTEND,,,0,0,2000,0,0,0,0,0,0,,,,,OPEN,,,,,,,,,1,2,0,,,,0,0,0,0,,,,,,,,,,,0,0,0,,,0,0,0,0,,,,,,,,,,,,,,,,,,,,,tcp,,0,0,0,,0,0,
bk_redis,redis-1-centos-7,0,0,0,0,,0,0,0,,0,,0,0,0,0,DOWN,1,1,0,1,1,110,110,,1,3,1,,0,,2,0,,0,L4CON,,0,,,,,,,,,,,0,0,,,,,-1,Connection refused at step 1 of tcp-check (connect),,0,0,0,0,,,,Layer4 connection problem,,2,3,0,,,,172.28.5.0:6379,,tcp,,,,,,,,
bk_redis,redis-2-centos-7,0,0,0,0,,0,0,0,,0,,0,0,0,0,DOWN,1,1,0,1,1,109,109,,1,3,2,,0,,2,0,,0,L4CON,,0,,,,,,,,,,,0,0,,,,,-1,Connection refused at step 1 of tcp-check (connect),,0,0,0,0,,,,Layer4 connection problem,,2,3,0,,,,172.28.5.1:6379,,tcp,,,,,,,,
bk_redis,redis-3-centos-7,0,0,0,0,,0,0,0,,0,,0,0,0,0,DOWN,1,1,0,1,1,109,109,,1,3,3,,0,,2,0,,0,L4CON,,0,,,,,,,,,,,0,0,,,,,-1,Connection refused at step 1 of tcp-check (connect),,0,0,0,0,,,,Layer4 connection problem,,2,3,0,,,,172.28.5.2:6379,,tcp,,,,,,,,
bk_redis,BACKEND,0,0,0,0,200,0,0,0,0,0,,0,0,0,0,DOWN,0,0,0,,1,109,109,,1,3,0,,0,,1,0,,0,,,,,,,,,,,,,,0,0,0,0,0,0,-1,,,0,0,0,0,,,,,,,,,,,,,,tcp,roundrobin,,,,,,,

In theory, because if I remove tcp-check connect, the check seems to behave as it needs to

$ docker exec -ti proxy-2-debian-9 cat /etc/haproxy/haproxy.cfg
global
  log /dev/log local2
  stats socket /var/lib/haproxy/stats.sock mode 660 level admin process 1
  stats bind-process 1
  stats timeout 30s
  chroot /var/lib/haproxy/
  user haproxy
  group haproxy
  daemon
  maxconn 200000
  nbproc 4

defaults
  log global
  mode tcp
  option  tcplog
  timeout connect 5000
  timeout client 50000
  timeout server 50000

frontend ft_redis
  bind 172.28.5.253:6379 name redis
  default_backend bk_redis
  mode tcp

backend bk_redis
  mode tcp
  option tcp-check
  tcp-check send PING\r\n
  tcp-check expect string +PONG
  tcp-check send info\ replication\r\n
  tcp-check expect string role:master
  tcp-check send QUIT\r\n
  tcp-check expect string +OK
  server redis-1-centos-7 redis-1-centos-7:6379 check inter 1s
  server redis-2-centos-7 redis-2-centos-7:6379 check inter 1s
  server redis-3-centos-7 redis-3-centos-7:6379 check inter 1s


$ docker exec -ti proxy-2-debian-9 bash -c 'echo "show stat" | socat /var/lib/haproxy/stats.sock stdio' 
# pxname,svname,qcur,qmax,scur,smax,slim,stot,bin,bout,dreq,dresp,ereq,econ,eresp,wretr,wredis,status,weight,act,bck,chkfail,chkdown,lastchg,downtime,qlimit,pid,iid,sid,throttle,lbtot,tracked,type,rate,rate_lim,rate_max,check_status,check_code,check_duration,hrsp_1xx,hrsp_2xx,hrsp_3xx,hrsp_4xx,hrsp_5xx,hrsp_other,hanafail,req_rate,req_rate_max,req_tot,cli_abrt,srv_abrt,comp_in,comp_out,comp_byp,comp_rsp,lastsess,last_chk,last_agt,qtime,ctime,rtime,ttime,agent_status,agent_code,agent_duration,check_desc,agent_desc,check_rise,check_fall,check_health,agent_rise,agent_fall,agent_health,addr,cookie,mode,algo,conn_rate,conn_rate_max,conn_tot,intercepted,dcon,dses,
ft_redis,FRONTEND,,,0,0,2000,0,0,0,0,0,0,,,,,OPEN,,,,,,,,,1,2,0,,,,0,0,0,0,,,,,,,,,,,0,0,0,,,0,0,0,0,,,,,,,,,,,,,,,,,,,,,tcp,,0,0,0,,0,0,
bk_redis,redis-1-centos-7,0,0,0,0,,0,0,0,,0,,0,0,0,0,UP,1,1,0,0,0,18,0,,1,3,1,,0,,2,0,,0,L7OK,0,2,,,,,,,,,,,0,0,,,,,-1,(tcp-check),,0,0,0,0,,,,Layer7 check passed,,2,3,4,,,,172.28.5.0:6379,,tcp,,,,,,,,
bk_redis,redis-2-centos-7,0,0,0,0,,0,0,0,,0,,0,0,0,0,DOWN,1,1,0,1,1,17,17,,1,3,2,,0,,2,0,,0,L7TOUT,,1002,,,,,,,,,,,0,0,,,,,-1, at step 4 of tcp-check (expect string 'role:master'),,0,0,0,0,,,,Layer7 timeout,,2,3,0,,,,172.28.5.1:6379,,tcp,,,,,,,,
bk_redis,redis-3-centos-7,0,0,0,0,,0,0,0,,0,,0,0,0,0,DOWN,1,1,0,1,1,17,17,,1,3,3,,0,,2,0,,0,L7TOUT,,1001,,,,,,,,,,,0,0,,,,,-1, at step 4 of tcp-check (expect string 'role:master'),,0,0,0,0,,,,Layer7 timeout,,2,3,0,,,,172.28.5.2:6379,,tcp,,,,,,,,
bk_redis,BACKEND,0,0,0,0,200,0,0,0,0,0,,0,0,0,0,UP,1,1,0,,0,18,0,,1,3,0,,0,,1,0,,0,,,,,,,,,,,,,,0,0,0,0,0,0,-1,,,0,0,0,0,,,,,,,,,,,,,,tcp,roundrobin,,,,,,,

You’re right, this would lead to a different error.

Still, to troubleshoot the issue we are gonna need a tcpdump capture (tcpdump -nps0 -i <ethinterface> -w healthcheckfail.cap tcp port 6379) of that health check traffic. We may also need take a loot at a strace (starting haproxy with strace -tt haproxy -f /etc/haproxy/haproxy.cfg of the health check).

Please find the haproxy.strace captured with:

strace -o haproxy.strace -tt haproxy -f /etc/haproxy/haproxy.cfg

Debian:
https://framadrop.org/r/eeKpTUr6JX#O5bvyN9beWlUr3mq2m/VaXQg344ZgK5g23Refw//bw8=
Centos:
https://framadrop.org/r/h8LdrQBghh#jqIaODsnWTrhNr6EgxfdHo3e5Mr7x2IFL/27TQGKyM0=

And ~1min of tcpdump of the corresponding interface for the docker debian container captured with:

#debian container
tcpdump -nps0 -i vethb9cf104 -w healthcheckfail.cap tcp port 6379

Debian:
https://framadrop.org/r/NCiOVC0NCD#hwhsKDYBGk44YLNx+k0Y71rN1PfPxomfDB8H7OHFOzU=
Centos:
https://framadrop.org/r/ZmbYSSEwDo#kakSe+ueAdJGlTen3YkhBBlmkASuIOy2cACzV2CRlsI=

Hope i’ve done it right

The strace does not really contain the health checks, but anyway from your tcpdump capture we can see than every single requests from the debian instance is rejected by 172.28.5.0.

On the other hand, from the CentOS tcpdump we can see that it is talking to 172.28.5.1 instead, and it does get responses.

Is it expected that the debian host talks to a different backend IP than the CentOS host? Can you clarify what those IP’s are?

I don’t really understand how this could be influenced by the tcp-check connect configuration, to be honest. Could you capture the health check traffic on debian again with tcp-check connect commented out?

@lukastribus, sorry, but it seems there is some mistake in my previous test

it is an error resulting from a modification of the configuration of my containers who involved a restart of the docker daemon and the containers, so the IP addresses was not the same between my two test. sorry…

I redo my tests from the start, and the tcpdump on debian host show nothing when ‘tcp-check connect’ is activated despite what i provided previously to you.

So i decided to redo the test with a little script.
You will find the log of the execution, the script itself, and the tcpdump .cap files here :
https://github.com/rockandska/repro-issues/tree/master/haproxy/haproxy/issue_3676

I also tried to apply the same config on a VM (virtualbox) to be sure that is not related to some docker behaviour and hit exactly the same issue so it seems not related to haproxy running inside a container.

Additionally, you could ping me on #haproxy IRC channel if you want to exchange directly

Thanks,

Best regards,

I cannot reproduce any of it. I used Debian 9 with the haproxy build it ships with, assigned 172.28.5.5/16 to it, and spawned a TCP server on adjacent 172.28.5.[0-2]/16 VM’s, and it continues to connect and health check all 3 VM’s, failing at later stages:

lukas@debian:~$ sudo systemctl status haproxy
● haproxy.service - HAProxy Load Balancer
   Loaded: loaded (/lib/systemd/system/haproxy.service; enabled; vendor preset: enabled)
   Active: active (running) since Tue 2019-04-02 20:33:01 UTC; 9s ago
     Docs: man:haproxy(1)
           file:/usr/share/doc/haproxy/configuration.txt.gz
  Process: 3928 ExecStartPre=/usr/sbin/haproxy -f $CONFIG -c -q $EXTRAOPTS (code=exited, status=0/SUCCESS)
 Main PID: 3931 (haproxy-systemd)
    Tasks: 3 (limit: 4915)
   CGroup: /system.slice/haproxy.service
           ├─3931 /usr/sbin/haproxy-systemd-wrapper -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid
           ├─3932 /usr/sbin/haproxy-master
           └─3933 /usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -Ds

Apr 02 20:33:01 debian systemd[1]: Starting HAProxy Load Balancer...
Apr 02 20:33:01 debian systemd[1]: Started HAProxy Load Balancer.
Apr 02 20:33:01 debian haproxy-systemd-wrapper[3931]: haproxy-systemd-wrapper: executing /usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -Ds
Apr 02 20:33:01 debian haproxy[3932]: Proxy ft_redis started.
Apr 02 20:33:01 debian haproxy[3932]: Proxy bk_redis started.
Apr 02 20:33:01 debian haproxy[3933]: Server bk_redis/redis-1-centos-7 is DOWN, reason: Layer7 invalid response, info: "TCPCHK got an empty response at step 3", check duration: 2ms. 2 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
Apr 02 20:33:01 debian haproxy[3933]: Server bk_redis/redis-2-centos-7 is DOWN, reason: Layer7 invalid response, info: "TCPCHK got an empty response at step 3", check duration: 1ms. 1 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
Apr 02 20:33:01 debian haproxy[3933]: Server bk_redis/redis-3-centos-7 is DOWN, reason: Layer7 invalid response, info: "TCPCHK got an empty response at step 3", check duration: 1ms. 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
Apr 02 20:33:01 debian haproxy[3933]: backend bk_redis has no server available!
lukas@debian:~$

The only thing I changed is I replaced the servers with the corresponding IP:

  server redis-1-centos-7 172.28.5.0:6379 check inter 1s
  server redis-2-centos-7 172.28.5.1:6379 check inter 1s
  server redis-3-centos-7 172.28.5.2:6379 check inter 1s

How are those hostnames resolved in your case?
Please run the correct, non-working debian setup through strace.

They resolved fine inside the container and use docker_network.
This output was generated by this part of my test script and use hostnames inside the containers.
In the test i’ve made with a debian VM, i add the hosts to /etc/hosts

I will post the strace of the debian who fail tomorrow.

Thanks for your help

Regards,

@lukastribus, i just update all my tests and add :

  • an output of the fqdn resolved inside containers
  • an output of the /etc/hosts
  • an output of the strace for each case
  • a test with ips instead of fqdn

An you’ll see that the problem is indeed when i use the FQDN on debian, if I use IP’s the tcp-check connect works the same as on the centos container.

So the problem seems to be related to the resolve of the hostsnames

I’ve take a quick look at the strace and seen nothing special between the tests (but i’m not an expert in strace analysis)

Edit: what is weird, is that if i put erroneous fqdn, haproxy throw an error directly, so it seems that haproxy is able to resolve my redis-container but something occur when the tcp-check connect happen

root@proxy-2-debian-9:/# haproxy -f /etc/haproxy/haproxy.cfg 
[ALERT] 092/132710 (1905) : parsing [/etc/haproxy/haproxy.cfg:33] : 'server redis-1-centos-7' : could not resolve address 'bob'.
[ALERT] 092/132710 (1905) : parsing [/etc/haproxy/haproxy.cfg:34] : 'server redis-2-centos-7' : could not resolve address 'bob2'.
[ALERT] 092/132710 (1905) : parsing [/etc/haproxy/haproxy.cfg:35] : 'server redis-3-centos-7' : could not resolve address 'bob3'.
[ALERT] 092/132710 (1905) : Failed to initialize server(s) addr.

Regards,

I can reproduce it now.

Apparently with tcp-check connect we hit a bug here where haproxy does not know the destination port. Because instead of connecting to port 6379 it actually connects to port 0, the destination refuses the connection.

2 Workarounds:

  • specify the port in the connect command: tcp-check connect port 6379 instead of just tcp-check connect
  • or specify the check port again for each server server redis-1-centos-7 redis-1-centos-7:6379 check port 6379 inter 1s

I believe this is fixed in BUG/MINOR: tcp-check: use the server’s service port as a
fallback
, which is in release 1.7.11, but Debian does not carry this release.

I suggest you install an uptodate haproxy release from Vincent Bernat (he is also an official Debian maintainer):

https://haproxy.debian.net/

The 1.7 branch is fine, but you may also pick 1.8 for more features. Both will have the fix above.

The commit that fixes this issue does not research the root cause of the issue; I assume this is a regression since this clearly worked in the older releases and it also was documented this way. I will dig into this, and maybe file a open+shut bug on github to track it.

edit: the bug filed is:

Hi @lukastribus,

Thanks for your feedback and “glad” to here that is/was a bug and not my configuration/infrastructure ^^

The fix who need to specifically set the port for the tch-check connect is an easy one to add and happy that is not unresolvable :slight_smile:

Thanks for your time and investigation

Best regards,