Health check wrongly marks backend TCP server as down

Hey all,

I'm seeing a weird issue with HAProxy in TCP mode.

I have the following configuration:

global
  description haproxy-1
  log 127.0.0.1 local0 info
  maxconn 2000
  uid 200
  gid 200
  chroot /var/empty
  daemon

defaults
  mode tcp
  log global
  option tcplog
  option logasap
  option log-health-checks
  option redispatch
  option tcpka
  retries 3
  timeout connect 5s
  timeout client 50000
  timeout server 50000

listen TM_IN
  bind :7101
  timeout client 3h
  timeout server 3h
  option tcp-check
  balance roundrobin
  server prime_tm server1:7101 check inter 1000
  server backup_tm server2:7101 check inter 1000 backup

listen TC_OUT
  bind :7201
  timeout client 3h
  timeout server 3h
  option tcp-check
  balance roundrobin
  server prime_tc server1:7201 check inter 1000
  server backup_tc server2:7201 check inter 1000 backup

listen stats
  bind :80
  mode http
  option httplog
  option contstats
  stats enable
  stats realm Haproxy\ Statistics
  stats hide-version
  stats uri /stats
  stats auth user:pass
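For reference: as far as I understand HAProxy 1.6, `option tcp-check` with no `tcp-check` rules after it performs only a plain TCP connect (the same as a bare `check`). So with `inter 1000`, the checker opens and closes a fresh connection to port 7201 on each server every second. Written out explicitly, the TC_OUT section should behave the same as this sketch (the explicit `tcp-check connect` line is my addition; it is not in the original config):

```haproxy
listen TC_OUT
  bind :7201
  timeout client 3h
  timeout server 3h
  option tcp-check
  # Explicit form of the default behaviour: open a TCP connection to the
  # server's address/port, count success as "Layer4 check passed", close it.
  tcp-check connect
  balance roundrobin
  server prime_tc server1:7201 check inter 1000
  server backup_tc server2:7201 check inter 1000 backup
```

Spelling it out makes clear that every check is a real connection attempt against the same port the clients use.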

With both servers up and listening:
On the status page, the TC_OUT prime is marked as ‘L4CON in 0ms’ with ‘Connection refused’, yet data is still flowing to that server. The TC_OUT backup is marked as ‘OK’.

When I stop the TC_OUT prime from listening on 7201, traffic switches to the backup, but the log shows that the health checks now start failing on the backup.

I am wondering why this happens only on the TC_OUT proxy and not on TM_IN. They are logically the same, yet TM_IN’s prime and backup are both shown as ‘OK’ and ‘UP’ when both servers are up, and are correctly marked as DOWN if either goes down.

Thanks!

Please provide the exact logs you see and the output of haproxy -vv.

Results for haproxy -vv:
HA-Proxy version 1.6.11 2016/12/25
Copyright 2000-2016 Willy Tarreau willy@haproxy.org

Build options :
  TARGET  = linux2628
  CPU     = generic
  CC      = gcc
  CFLAGS  = -O2 -g -fno-strict-aliasing -Wdeclaration-after-statement
  OPTIONS = USE_ZLIB=1 USE_OPENSSL=1 USE_PCRE=1

Default settings :
  maxconn = 2000, bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Encrypted password support via crypt(3): yes
Built with zlib version : 1.2.8
Running on zlib version : 1.2.8
Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with OpenSSL version : OpenSSL 1.0.1t  3 May 2016
Running on OpenSSL version : OpenSSL 1.0.1t  3 May 2016
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports prefer-server-ciphers : yes
Built with PCRE version : 8.35 2014-04-04
Running on PCRE version : 8.35 2014-04-04
PCRE library supports JIT : no (USE_PCRE_JIT not set)
Built without Lua support
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND

Available polling systems :
      epoll : pref=300,  test result OK
       poll : pref=200,  test result OK
     select : pref=150,  test result OK
Total: 3 (3 usable), will use epoll.

Logs:

Jan 13 15:35:30 localhost haproxy[32]: Proxy TM_IN started.
Jan 13 15:35:30 localhost haproxy[32]: Proxy TC_OUT started.
Jan 13 15:35:30 localhost haproxy[32]: Proxy stats started.
Jan 13 15:35:30 localhost haproxy[33]: Health check for server TM_IN/primeTM succeeded, reason: Layer4 check passed, check duration: 0ms, status: 3/3 UP.
Jan 13 15:35:30 localhost haproxy[33]: Health check for backup server TM_IN/backupTM succeeded, reason: Layer4 check passed, check duration: 0ms, status: 3/3 UP.
Jan 13 15:35:31 localhost haproxy[33]: Health check for server TC_OUT/primeTC succeeded, reason: Layer4 check passed, check duration: 0ms, status: 3/3 UP.
Jan 13 15:35:31 localhost haproxy[33]: Health check for backup server TC_OUT/backupTC succeeded, reason: Layer4 check passed, check duration: 0ms, status: 3/3 UP.
Jan 13 15:35:34 localhost haproxy[33]: client1:38142 [13/Jan/2017:15:35:34.177] TM_IN TM_IN/primeTM 1/0/+0 +0 -- 1/1/1/1/0 0/0
Jan 13 15:35:34 localhost haproxy[33]: client1:47346 [13/Jan/2017:15:35:34.192] TC_OUT TC_OUT/primeTC 1/0/+0 +0 -- 2/1/1/1/0 0/0
Jan 13 15:35:34 localhost haproxy[33]: somehost:34534 [13/Jan/2017:15:35:34.706] stats stats/<STATS> 0/0/0/0/+0 200 +95 - - LR-- 3/1/0/0/0 0/0 "GET /stats HTTP/1.1"
Jan 13 15:35:35 localhost haproxy[33]: Health check for server TC_OUT/primeTC failed, reason: Layer4 connection problem, info: "Connection refused at initial connection step of tcp-check", check duration: 0ms, status: 2/3 UP.
Jan 13 15:35:36 localhost haproxy[33]: Health check for server TC_OUT/primeTC failed, reason: Layer4 connection problem, info: "Connection refused at initial connection step of tcp-check", check duration: 0ms, status: 1/3 UP.
Jan 13 15:35:37 localhost haproxy[33]: Health check for server TC_OUT/primeTC failed, reason: Layer4 connection problem, info: "Connection refused at initial connection step of tcp-check", check duration: 0ms, status: 0/2 DOWN.
Jan 13 15:35:37 localhost haproxy[33]: Server TC_OUT/primeTC is DOWN. 0 active and 1 backup servers left. Running on backup. 1 sessions active, 0 requeued, 0 remaining in queue.
Jan 13 15:36:38 localhost haproxy[33]: somehost:34538 [13/Jan/2017:15:36:38.474] stats stats/<STATS> 0/0/0/0/+0 200 +95 - - LR-- 3/1/0/0/0 0/0 "GET /stats HTTP/1.1"

Interestingly, although it reports primeTC as DOWN and "Running on backup", it still sends the data to the prime until the existing connection actually closes.
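This part is expected HAProxy behaviour: a health check marking a server DOWN only changes where *new* sessions are sent; by default a session already established on that server keeps running until it ends (the log line above even says "1 sessions active"). If established sessions should be cut when a server is marked DOWN, the `on-marked-down shutdown-sessions` server keyword exists. A sketch, only sensible once the checks themselves are reliable, since with the false failures above it would kill a healthy session:

```haproxy
listen TC_OUT
  bind :7201
  timeout client 3h
  timeout server 3h
  option tcp-check
  balance roundrobin
  # Close any sessions still attached to prime_tc as soon as its
  # health check marks it DOWN, so clients reconnect (to the backup).
  server prime_tc server1:7201 check inter 1000 on-marked-down shutdown-sessions
  server backup_tc server2:7201 check inter 1000 backup
```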

Thanks.

Edit: I'm also not sure why I don't see a tcplog entry in the log, since it's enabled in the defaults section and I am sending data across.

OK, figured out the issue. It seems the server side only accepts one connection for TC, so once a client session is open, the health check's connection gets refused. I have to run the check against another port.
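A minimal sketch of that fix, assuming something on each TC server answers on a separate port. Port 7202 below is hypothetical; substitute whatever port actually exists. The server's `port` keyword sends only the health-check connection to that port, while client traffic still goes to 7201:

```haproxy
listen TC_OUT
  bind :7201
  timeout client 3h
  timeout server 3h
  option tcp-check
  balance roundrobin
  # 7202 is a hypothetical check port: the checker connects there instead
  # of 7201, so it no longer competes with the single client session on 7201.
  server prime_tc server1:7201 check port 7202 inter 1000
  server backup_tc server2:7201 check port 7202 inter 1000 backup
```

The trade-off is that a passing check now only proves the listener on the check port is up, not that 7201 itself is accepting. With `option tcp-check`, a `tcp-check connect port 7202` rule would be another way to express the same redirection.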