Health check wrongly marks backend TCP server as down


#1

Hey all,

I'm seeing a weird issue with HAProxy in TCP mode.

I have the following configuration:

global
  description haproxy-1
  log 127.0.0.1 local0 info
  maxconn 2000
  uid 200
  gid 200
  chroot /var/empty
  daemon

defaults
  mode tcp
  log global
  option tcplog
  option logasap
  option log-health-checks
  option redispatch
  option tcpka
  retries 3
  timeout connect 5s
  timeout client 50000
  timeout server 50000

listen TM_IN
  bind :7101
  timeout client 3h
  timeout server 3h
  option tcp-check
  balance roundrobin
  server prime_tm server1:7101 check inter 1000
  server backup_tm server2:7101 check inter 1000 backup

listen TC_OUT
  bind :7201
  timeout client 3h
  timeout server 3h
  option tcp-check
  balance roundrobin
  server prime_tc server1:7201 check inter 1000
  server backup_tc server2:7201 check inter 1000 backup

listen stats
  bind :80
  mode http
  option httplog
  option contstats
  stats enable
  stats realm Haproxy\ Statistics
  stats hide-version
  stats uri /stats
  stats auth user:pass

With both servers up and listening:
On the status page, the TC_OUT prime is marked as ‘L4CON in 0ms’ with ‘Connection Refused’, yet data is still flowing to that server. The TC_OUT backup is marked as ‘OK’.

When I stop the TC_OUT prime from listening on 7201, traffic switches to the backup, but the log shows that health checks then start failing on the backup.

I am wondering why this happens only on the TC_OUT proxy and not on TM_IN. They are logically the same, yet TM_IN’s prime and backup are both shown as ‘OK’ and ‘UP’ when both servers are up, and are correctly marked as down if either goes down.

Thanks!


#2

Please provide those exact logs that you see and the output of haproxy -vv.


#3

Results for haproxy -vv:
HA-Proxy version 1.6.11 2016/12/25
Copyright 2000-2016 Willy Tarreau willy@haproxy.org

Build options :
  TARGET  = linux2628
  CPU     = generic
  CC      = gcc
  CFLAGS  = -O2 -g -fno-strict-aliasing -Wdeclaration-after-statement
  OPTIONS = USE_ZLIB=1 USE_OPENSSL=1 USE_PCRE=1

Default settings :
  maxconn = 2000, bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Encrypted password support via crypt(3): yes
Built with zlib version : 1.2.8
Running on zlib version : 1.2.8
Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with OpenSSL version : OpenSSL 1.0.1t  3 May 2016
Running on OpenSSL version : OpenSSL 1.0.1t  3 May 2016
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports prefer-server-ciphers : yes
Built with PCRE version : 8.35 2014-04-04
Running on PCRE version : 8.35 2014-04-04
PCRE library supports JIT : no (USE_PCRE_JIT not set)
Built without Lua support
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND

Available polling systems :
      epoll : pref=300,  test result OK
       poll : pref=200,  test result OK
     select : pref=150,  test result OK
Total: 3 (3 usable), will use epoll.

Logs:

Jan 13 15:35:30 localhost haproxy[32]: Proxy TM_IN started.
Jan 13 15:35:30 localhost haproxy[32]: Proxy TC_OUT started.
Jan 13 15:35:30 localhost haproxy[32]: Proxy stats started.
Jan 13 15:35:30 localhost haproxy[33]: Health check for server TM_IN/primeTM succeeded, reason: Layer4 check passed, check duration: 0ms, status: 3/3 UP.
Jan 13 15:35:30 localhost haproxy[33]: Health check for backup server TM_IN/backupTM succeeded, reason: Layer4 check passed, check duration: 0ms, status: 3/3 UP.
Jan 13 15:35:31 localhost haproxy[33]: Health check for server TC_OUT/primeTC succeeded, reason: Layer4 check passed, check duration: 0ms, status: 3/3 UP.
Jan 13 15:35:31 localhost haproxy[33]: Health check for backup server TC_OUT/backupTC succeeded, reason: Layer4 check passed, check duration: 0ms, status: 3/3 UP.
Jan 13 15:35:34 localhost haproxy[33]: client1:38142 [13/Jan/2017:15:35:34.177] TM_IN TM_IN/primeTM 1/0/+0 +0 -- 1/1/1/1/0 0/0
Jan 13 15:35:34 localhost haproxy[33]: client1:47346 [13/Jan/2017:15:35:34.192] TC_OUT TC_OUT/primeTC 1/0/+0 +0 -- 2/1/1/1/0 0/0
Jan 13 15:35:34 localhost haproxy[33]: somehost:34534 [13/Jan/2017:15:35:34.706] stats stats/<STATS> 0/0/0/0/+0 200 +95 - - LR-- 3/1/0/0/0 0/0 "GET /stats HTTP/1.1"
Jan 13 15:35:35 localhost haproxy[33]: Health check for server TC_OUT/primeTC failed, reason: Layer4 connection problem, info: "Connection refused at initial connection step of tcp-check", check duration: 0ms, status: 2/3 UP.
Jan 13 15:35:36 localhost haproxy[33]: Health check for server TC_OUT/primeTC failed, reason: Layer4 connection problem, info: "Connection refused at initial connection step of tcp-check", check duration: 0ms, status: 1/3 UP.
Jan 13 15:35:37 localhost haproxy[33]: Health check for server TC_OUT/primeTC failed, reason: Layer4 connection problem, info: "Connection refused at initial connection step of tcp-check", check duration: 0ms, status: 0/2 DOWN.
Jan 13 15:35:37 localhost haproxy[33]: Server TC_OUT/primeTC is DOWN. 0 active and 1 backup servers left. Running on backup. 1 sessions active, 0 requeued, 0 remaining in queue.
Jan 13 15:36:38 localhost haproxy[33]: somehost:34538 [13/Jan/2017:15:36:38.474] stats stats/<STATS> 0/0/0/0/+0 200 +95 - - LR-- 3/1/0/0/0 0/0 "GET /stats HTTP/1.1"

It is interesting to note that although HAProxy reports primeTC as DOWN and "Running on backup", it keeps sending data to prime over the existing connection until that connection itself goes down.

Thanks.

Edit: I am also not sure why I am not seeing a tcplog entry in the log, since it's enabled in the defaults section and I am sending data across.


#4

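OK, figured out the issue. It seems the server side accepts only one connection at a time on the TC port, so the health check gets "Connection refused" as soon as a client session is established. I have to run the check against another port.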
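
For anyone hitting the same thing, here is a rough sketch of what checking against another port could look like, using the `port` parameter on the server line. Port 7202 is only a placeholder I made up; it assumes the application exposes some second port that accepts extra connections, which may not be true for every setup.

```haproxy
listen TC_OUT
  bind :7201
  timeout client 3h
  timeout server 3h
  option tcp-check
  balance roundrobin
  # Traffic still goes to 7201; only the health check connects to 7202.
  # 7202 is a hypothetical check port, replace it with one the server really listens on.
  server prime_tc server1:7201 check port 7202 inter 1000
  server backup_tc server2:7201 check port 7202 inter 1000 backup
```

With this, the check no longer competes with the client session for the single connection slot on 7201. The trade-off is that the check only proves the host and the check port are reachable, not that 7201 itself is accepting traffic.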