HAProxy won't connect when resolving with Kubernetes DNS (HAProxy 1.8.4)

mbaquer6 · February 15, 2018, 6:03pm

Hi there!
I've been trying to configure HAProxy to load-balance a Redis cluster by asking each node whether it is the master and connecting to the one that is. When I use IP addresses everything works fine, but Kubernetes is very dynamic, so I need to configure it with DNS names.
I tried using Kubernetes DNS service discovery, but HAProxy crashed with a segmentation fault. Here is the config:

global
    log /dev/log daemon

defaults REDIS
    mode tcp
    timeout connect 3s
    timeout server 6s
    timeout client 6s

resolvers kube
    nameserver dns1 100.64.0.10:53
    resolve_retries 3
    timeout resolve 1s
    timeout retry 1s
    hold valid 10s
    hold obsolete 30s

frontend ft_redis
    bind *:6378 name redis
    default_backend bk_redis

backend bk_redis
    option tcp-check
    tcp-check connect
    tcp-check send PING\r\n
    tcp-check expect string +PONG
    tcp-check send info\ replication\r\n
    tcp-check expect string role:master
    tcp-check send QUIT\r\n
    tcp-check expect string +OK

    server-template redis-cluster- 0-5 _redis-cluster._TCP.redis-cluster.default.svc.cluster.local check inter 1s resolvers kube

And when I use DNS names on regular server lines instead, I get these messages:

[WARNING] 045/173656 (19) : Server bk_redis/s0 is DOWN, reason: Layer4 connection problem, info: "Connection refused at step 1 of tcp-check (connect)", check duration: 0ms. 2 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
[WARNING] 045/173657 (19) : Server bk_redis/s1 is DOWN, reason: Layer4 connection problem, info: "Connection refused at step 1 of tcp-check (connect)", check duration: 0ms. 1 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
[WARNING] 045/173657 (19) : Server bk_redis/s2 is DOWN, reason: Layer4 connection problem, info: "Connection refused at step 1 of tcp-check (connect)", check duration: 0ms. 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.

Here is the backend config using DNS:
    server s0 redis-cluster-0.redis-cluster:6379 resolvers kube check inter 1s resolve-prefer ipv4
    server s1 redis-cluster-1.redis-cluster:6379 resolvers kube check inter 1s resolve-prefer ipv4
    server s2 redis-cluster-2.redis-cluster:6379 resolvers kube check inter 1s resolve-prefer ipv4

I tried resolving these names from inside the HAProxy container and they resolve fine.

Can you please give me a hand with this?

Thanks!

Baptiste · February 16, 2018, 4:27pm

First, there are some formatting typos in your message, so we don't know whether you use the server-template directive or regular server lines in HAProxy's backend.

This is not a DNS resolution problem.
Your DNS resolution works well, and HAProxy ran health checks against the IPs provided by Kubernetes.
The health checks then failed because the servers refused the TCP connection from HAProxy.

So check the networking between HAProxy and the Redis pods.
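To confirm this diagnosis outside of HAProxy, you can try the same Layer 4 step (a plain TCP connect) from the HAProxy pod. Below is a minimal Python sketch; `tcp_probe` is a hypothetical helper, not part of HAProxy. It resolves names through the system resolver (getaddrinfo, which applies the /etc/resolv.conf search list), so pass an IP address to take DNS out of the picture entirely.

```python
import socket


def tcp_probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Try a plain TCP connect, like the `tcp-check connect` step.

    Returns "ok", "refused", "timeout" or "unresolved"; other socket
    errors (e.g. no route to host) are raised to the caller.
    """
    try:
        # create_connection resolves `host` and tries each returned address.
        with socket.create_connection((host, port), timeout=timeout):
            return "ok"
    except ConnectionRefusedError:
        # The peer answered with a TCP reset: nothing listening on IP:port.
        return "refused"
    except socket.timeout:
        # No answer at all: usually filtered traffic or an unreachable IP.
        return "timeout"
    except socket.gaierror:
        # The system resolver could not resolve the name.
        return "unresolved"
```

For example, `tcp_probe("100.96.3.46", 6379)` uses a pod IP from the dig output in this thread. A "refused" result corresponds to HAProxy's "Connection refused at step 1 of tcp-check (connect)".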

mbaquer6 · February 16, 2018, 7:16pm

Hi Baptiste,
Thanks for pointing that out. Here is the config:

global
    log /dev/log daemon

defaults REDIS
    mode tcp
    timeout connect 3s
    timeout server 6s
    timeout client 6s

resolvers kube
    nameserver dns1 100.64.0.10:53
    resolve_retries 3
    timeout resolve 1s
    timeout retry 1s
    hold valid 10s
    hold obsolete 30s

frontend ft_redis
    bind *:6378 name redis
    default_backend bk_redis

backend bk_redis
    option tcp-check
    tcp-check connect
    tcp-check send PING\r\n
    tcp-check expect string +PONG
    tcp-check send info\ replication\r\n
    tcp-check expect string role:master
    tcp-check send QUIT\r\n
    tcp-check expect string +OK
    server-template redis-cluster- 0-5 _redis-cluster._TCP.redis-cluster.default.svc.cluster.local check inter 1s resolvers kube

In the first example I'm using server-template, and with that config I get a segmentation fault.
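For reference, `server-template <prefix> <num | range> <fqdn>[:<port>]` pre-allocates one server slot per name suffix and fills the slots from the DNS answers; with an SRV name and no port on the line, the port is taken from the SRV records. The Python sketch below only reproduces the naming rule as I read it in the HAProxy documentation (a plain number N gives suffixes 1 to N, a range low-high gives low to high); `expand_server_template` is a hypothetical helper.

```python
from __future__ import annotations


def expand_server_template(prefix: str, num_or_range: str) -> list[str]:
    """Server names created by `server-template <prefix> <num | range> ...`.

    "5"   -> suffixes 1..5, e.g. srv1 .. srv5
    "0-5" -> suffixes 0..5, e.g. redis-cluster-0 .. redis-cluster-5
    """
    if "-" in num_or_range:
        low_text, high_text = num_or_range.split("-", 1)
        low, high = int(low_text), int(high_text)
    else:
        low, high = 1, int(num_or_range)
    if low < 0 or low > high:
        raise ValueError(f"invalid server-template range: {num_or_range!r}")
    return [f"{prefix}{n}" for n in range(low, high + 1)]
```

So `redis-cluster- 0-5` yields six slots (redis-cluster-0 to redis-cluster-5), and `srv 5`, used in a later reply, yields srv1 to srv5.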

In the second example, using DNS names, I'm just using regular server lines. HAProxy accepts the config and resolves the IPs, but won't connect:

global
    log /dev/log daemon

defaults REDIS
    mode tcp
    timeout connect 3s
    timeout server 6s
    timeout client 6s

resolvers kube
    nameserver dns1 100.64.0.10:53
    resolve_retries 3
    timeout resolve 1s
    timeout retry 1s
    hold valid 10s
    hold obsolete 30s

frontend ft_redis
    bind *:6378 name redis
    default_backend bk_redis

backend bk_redis
    option tcp-check
    tcp-check connect
    tcp-check send PING\r\n
    tcp-check expect string +PONG
    tcp-check send info\ replication\r\n
    tcp-check expect string role:master
    tcp-check send QUIT\r\n
    tcp-check expect string +OK
    server s0 redis-cluster-0.redis-cluster:6379 resolvers kube check inter 1s resolve-prefer ipv4
    server s1 redis-cluster-1.redis-cluster:6379 resolvers kube check inter 1s resolve-prefer ipv4
    server s2 redis-cluster-2.redis-cluster:6379 resolvers kube check inter 1s resolve-prefer ipv4

The Redis servers are listening on that IP and port; if I put the IP and port directly in regular server lines, it connects fine.
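The tcp-check rules in bk_redis amount to a short Redis conversation: PING must return +PONG, `info replication` must contain role:master, and QUIT must return +OK. When checks fail, it can help to replay that conversation by hand from the HAProxy pod. Below is a Python sketch under these assumptions: no AUTH/password, Redis inline commands as in the config, and each RESP reply is read in full before matching (HAProxy instead matches the expected string against whatever it has received so far). The function names are hypothetical.

```python
import socket

# Same send/expect pairs as the tcp-check rules in backend bk_redis.
CHECK_STEPS = [
    (b"PING\r\n", b"+PONG"),
    (b"info replication\r\n", b"role:master"),
    (b"QUIT\r\n", b"+OK"),
]


def read_reply(reader) -> bytes:
    """Read one RESP reply: a simple string/error line or a bulk string."""
    line = reader.readline()  # e.g. b"+PONG\r\n" or b"$412\r\n"
    if line.startswith(b"$"):
        length = int(line[1:].strip())
        if length < 0:
            return b""  # null bulk string
        return reader.read(length + 2)[:-2]  # drop the trailing CRLF
    return line.rstrip(b"\r\n")


def is_redis_master(host: str, port: int = 6379, timeout: float = 3.0) -> bool:
    """Return True if every step of the check sequence succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock, \
                sock.makefile("rb") as reader:
            for command, expected in CHECK_STEPS:
                sock.sendall(command)
                if expected not in read_reply(reader):
                    return False
            return True
    except (OSError, ValueError):
        # Refused or timed-out connections and malformed replies fail the check.
        return False
```

Running `is_redis_master("100.96.3.46")` against each pod IP should return True for exactly one node if replication is healthy.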

Baptiste · February 21, 2018, 6:05pm

Thanks for the clarification!

So, can you confirm that "dig @100.64.0.10 redis-cluster-0.redis-cluster",
run on the HAProxy box, returns an IP address?
Could you also run HAProxy in debug mode and post the startup output here
(the first 10s or so)?
Run: "haproxy -d -f /path/to/your/config"

mbaquer6 · February 21, 2018, 7:40pm

Hi again!
Here is the output. It doesn't return the IP:

root@redis-haproxy-hkcfg:/# dig @100.64.0.10 redis-cluster-0.redis-cluster

; <<>> DiG 9.9.5-9+deb8u15-Debian <<>> @100.64.0.10 redis-cluster-0.redis-cluster
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 38859
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;redis-cluster-0.redis-cluster.	IN	A

;; AUTHORITY SECTION:
.			60	IN	SOA	a.root-servers.net. nstld.verisign-grs.com. 2018022101 1800 900 604800 86400

;; Query time: 7 msec
;; SERVER: 100.64.0.10#53(100.64.0.10)
;; WHEN: Wed Feb 21 19:07:00 UTC 2018
;; MSG SIZE  rcvd: 133

But with the fully qualified name, it does:

root@redis-haproxy-hkcfg:/# dig @100.64.0.10 redis-cluster-0.redis-cluster.default.svc.cluster.local

; <<>> DiG 9.9.5-9+deb8u15-Debian <<>> @100.64.0.10 redis-cluster-0.redis-cluster.default.svc.cluster.local
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 16200
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;redis-cluster-0.redis-cluster.default.svc.cluster.local. IN A

;; ANSWER SECTION:
redis-cluster-0.redis-cluster.default.svc.cluster.local. 27 IN A 100.96.3.46

;; Query time: 0 msec
;; SERVER: 100.64.0.10#53(100.64.0.10)
;; WHEN: Wed Feb 21 19:09:19 UTC 2018
;; MSG SIZE  rcvd: 100
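The NXDOMAIN for the short name is expected. `redis-cluster-0.redis-cluster` is not fully qualified, so it only resolves when the resolver appends the search domains from the pod's /etc/resolv.conf. dig does not apply the search list unless you pass +search, and as far as I know HAProxy's `resolvers` section does not apply it either, while most tools in the container (getent, ping, curl) do, which would explain why the short names looked fine earlier. The Python sketch below mimics the glibc candidate order; the search list and ndots:5 are assumptions based on the usual Kubernetes default for a pod in the `default` namespace, so check /etc/resolv.conf in your pod. The helper names are hypothetical.

```python
from __future__ import annotations

# Assumed pod resolv.conf (typical Kubernetes default in namespace "default"):
#   search default.svc.cluster.local svc.cluster.local cluster.local
#   options ndots:5
K8S_SEARCH = ["default.svc.cluster.local", "svc.cluster.local", "cluster.local"]


def pod_fqdn(pod: str, service: str, namespace: str = "default",
             cluster_domain: str = "cluster.local") -> str:
    """FQDN of a pod behind a headless service, e.g. a StatefulSet member."""
    return f"{pod}.{service}.{namespace}.svc.{cluster_domain}"


def search_candidates(name: str, search: list[str], ndots: int = 5) -> list[str]:
    """Names a glibc-style stub resolver would try, in order."""
    if name.endswith("."):
        return [name]  # absolute name: the search list is never applied
    expanded = [f"{name}.{domain}" for domain in search]
    if name.count(".") >= ndots:
        return [name] + expanded  # enough dots: try the name as-is first
    return expanded + [name]  # otherwise the search list goes first
```

The first candidate for the short name is exactly the FQDN that dig resolves above, which is why the FQDN is the form to put on HAProxy server lines.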

So I changed the config to:

server s0 redis-cluster-0.redis-cluster.default.svc.cluster.local:6379 resolvers kube check inter 1s resolve-prefer ipv4
server s1 redis-cluster-1.redis-cluster.default.svc.cluster.local:6379 resolvers kube check inter 1s resolve-prefer ipv4
server s2 redis-cluster-2.redis-cluster.default.svc.cluster.local:6379 resolvers kube check inter 1s resolve-prefer ipv4

but I still get the same messages:

root@redis-pod-8976c84dc-lkpfp:/usr/local/etc/haproxy# haproxy -d -c -f /usr/local/etc/haproxy/haproxy.cfg
Configuration file is valid

root@redis-pod-8976c84dc-lkpfp:/usr/local/etc/haproxy# haproxy -d -f /usr/local/etc/haproxy/haproxy.cfg
Note: setting global.maxconn to 2000.
Available polling systems :
      epoll : pref=300,  test result OK
       poll : pref=200,  test result OK
     select : pref=150,  test result FAILED
Total: 3 (2 usable), will use epoll.

Available filters :
	[SPOE] spoe
	[COMP] compression
	[TRACE] trace
Using epoll() as the polling mechanism.
[WARNING] 051/193405 (576) : Server bk_redis/s0 is DOWN, reason: Layer4 connection problem, info: "Connection refused at step 1 of tcp-check (connect)", check duration: 0ms. 2 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
[WARNING] 051/193405 (576) : Server bk_redis/s1 is DOWN, reason: Layer4 connection problem, info: "Connection refused at step 1 of tcp-check (connect)", check duration: 0ms. 1 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
[WARNING] 051/193406 (576) : Server bk_redis/s2 is DOWN, reason: Layer4 connection problem, info: "Connection refused at step 1 of tcp-check (connect)", check duration: 0ms. 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
[ALERT] 051/193406 (576) : backend 'bk_redis' has no server available!

With the server-template:

root@redis-pod-8976c84dc-lkpfp:/usr/local/etc/haproxy# haproxy -d -c -f /usr/local/etc/haproxy/haproxy.cfg
Configuration file is valid
root@redis-pod-8976c84dc-lkpfp:/usr/local/etc/haproxy# haproxy -d -f /usr/local/etc/haproxy/haproxy.cfg
Note: setting global.maxconn to 2000.
Available polling systems :
      epoll : pref=300,  test result OK
       poll : pref=200,  test result OK
     select : pref=150,  test result FAILED
Total: 3 (2 usable), will use epoll.

Available filters :
	[SPOE] spoe
	[COMP] compression
	[TRACE] trace
Using epoll() as the polling mechanism.
Segmentation fault (core dumped)

DNS service discovery itself is working:

root@redis-pod-8976c84dc-lkpfp:/usr/local/etc/haproxy# dig -t srv redis-cluster redis-cluster.default.svc.cluster.local

; <<>> DiG 9.10.3-P4-Debian <<>> -t srv redis-cluster redis-cluster.default.svc.cluster.local
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 4258
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;redis-cluster.			IN	SRV

;; AUTHORITY SECTION:
.			60	IN	SOA	a.root-servers.net. nstld.verisign-grs.com. 2018022101 1800 900 604800 86400

;; Query time: 42 msec
;; SERVER: 100.64.0.10#53(100.64.0.10)
;; WHEN: Wed Feb 21 19:38:21 UTC 2018
;; MSG SIZE  rcvd: 117

;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 51775
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 3, AUTHORITY: 0, ADDITIONAL: 3

;; QUESTION SECTION:
;redis-cluster.default.svc.cluster.local. IN SRV

;; ANSWER SECTION:
redis-cluster.default.svc.cluster.local. 30 IN SRV 10 33 0 redis-cluster-1.redis-cluster.default.svc.cluster.local.
redis-cluster.default.svc.cluster.local. 30 IN SRV 10 33 0 redis-cluster-2.redis-cluster.default.svc.cluster.local.
redis-cluster.default.svc.cluster.local. 30 IN SRV 10 33 0 redis-cluster-0.redis-cluster.default.svc.cluster.local.

;; ADDITIONAL SECTION:
redis-cluster-1.redis-cluster.default.svc.cluster.local. 30 IN A 100.96.2.86
redis-cluster-2.redis-cluster.default.svc.cluster.local. 30 IN A 100.96.2.87
redis-cluster-0.redis-cluster.default.svc.cluster.local. 30 IN A 100.96.3.46

;; Query time: 0 msec
;; SERVER: 100.64.0.10#53(100.64.0.10)
;; WHEN: Wed Feb 21 19:38:21 UTC 2018
;; MSG SIZE  rcvd: 213

I used another pod so that I could run the commands you gave me to start HAProxy.

Edit:
Looking at the output, I don't see the port in the SRV records (the port field in the answers is 0), and maybe that's why I got the segmentation fault. I'll look into it. But the setup using A records, with the port configured explicitly, still doesn't work.
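SRV record data is, in order, priority, weight, port and target (RFC 2782), so in the answers above the port field really is 0. A small Python sketch to pull those fields out of dig's ANSWER SECTION lines; `SrvRecord` and `parse_srv_answer` are hypothetical names, and the parser only handles the eight-field `owner ttl class type priority weight port target` layout that dig prints here.

```python
from typing import NamedTuple


class SrvRecord(NamedTuple):
    owner: str
    ttl: int
    priority: int
    weight: int
    port: int
    target: str


def parse_srv_answer(line: str) -> SrvRecord:
    """Parse one SRV line from dig's ANSWER SECTION (tabs or spaces)."""
    fields = line.split()
    if len(fields) != 8 or fields[2:4] != ["IN", "SRV"]:
        raise ValueError(f"not an IN SRV answer line: {line!r}")
    owner, ttl, _, _, priority, weight, port, target = fields
    return SrvRecord(owner, int(ttl), int(priority), int(weight), int(port), target)
```

Applied to the `_redis-cluster._TCP...` answers further down, the same parser returns port 6379.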

Edit: My mistake, that was the wrong query. Here it is with the name from the server-template line (which was correct in the config):

root@redis-pod-8976c84dc-lkpfp:/usr/local/etc/haproxy# dig -t srv redis-cluster _redis-cluster._TCP.redis-cluster.default.svc.cluster.local

; <<>> DiG 9.10.3-P4-Debian <<>> -t srv redis-cluster _redis-cluster._TCP.redis-cluster.default.svc.cluster.local
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 42364
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;redis-cluster.			IN	SRV

;; AUTHORITY SECTION:
.			60	IN	SOA	a.root-servers.net. nstld.verisign-grs.com. 2018022101 1800 900 604800 86400

;; Query time: 33 msec
;; SERVER: 100.64.0.10#53(100.64.0.10)
;; WHEN: Wed Feb 21 19:52:39 UTC 2018
;; MSG SIZE  rcvd: 117

;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 61844
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 3, AUTHORITY: 0, ADDITIONAL: 3

;; QUESTION SECTION:
;_redis-cluster._TCP.redis-cluster.default.svc.cluster.local. IN	SRV

;; ANSWER SECTION:
_redis-cluster._TCP.redis-cluster.default.svc.cluster.local. 30	IN SRV 10 33 6379 redis-cluster-1.redis-cluster.default.svc.cluster.local.
_redis-cluster._TCP.redis-cluster.default.svc.cluster.local. 30	IN SRV 10 33 6379 redis-cluster-2.redis-cluster.default.svc.cluster.local.
_redis-cluster._TCP.redis-cluster.default.svc.cluster.local. 30	IN SRV 10 33 6379 redis-cluster-0.redis-cluster.default.svc.cluster.local.

;; ADDITIONAL SECTION:
redis-cluster-1.redis-cluster.default.svc.cluster.local. 30 IN A 100.96.2.86
redis-cluster-2.redis-cluster.default.svc.cluster.local. 30 IN A 100.96.2.87
redis-cluster-0.redis-cluster.default.svc.cluster.local. 30 IN A 100.96.3.46

;; Query time: 0 msec
;; SERVER: 100.64.0.10#53(100.64.0.10)
;; WHEN: Wed Feb 21 19:52:39 UTC 2018
;; MSG SIZE  rcvd: 233
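The difference between the two queries is the name. Per the Kubernetes DNS documentation, SRV records carrying a port are published per named service port, under `_<port-name>._<protocol>.<service>.<namespace>.svc.<cluster-domain>`; querying the bare service name, as in the first attempt, returned records with port 0. DNS names compare case-insensitively (RFC 4343), so `_TCP` and `_tcp` are the same name. A Python sketch; `k8s_srv_name` and `same_dns_name` are hypothetical helpers, and the cluster domain `cluster.local` is assumed.

```python
def k8s_srv_name(port_name: str, protocol: str, service: str,
                 namespace: str = "default",
                 cluster_domain: str = "cluster.local") -> str:
    """SRV query name for a named port of a Kubernetes service."""
    return f"_{port_name}._{protocol.lower()}.{service}.{namespace}.svc.{cluster_domain}"


def same_dns_name(a: str, b: str) -> bool:
    """Compare DNS names case-insensitively, ignoring a trailing root dot."""
    return a.rstrip(".").lower() == b.rstrip(".").lower()
```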

gabrielsousa · August 10, 2018, 2:34pm

@mbaquer6 What's your Kubernetes version? Is it a private cluster?

screep · September 11, 2018, 7:29pm

I had the same “Segmentation fault” error when using HAProxy 1.8 with the service discovery feature.
I resolved it by removing the “tcp-check connect” rule from the configuration file.

This is my working config:

...
backend bk_redis
    option tcp-check
    # cause of Segmentation fault error
    #tcp-check connect
    tcp-check send PING\r\n
    tcp-check expect string +PONG
    tcp-check send info\ replication\r\n
    tcp-check expect string role:master
    tcp-check send QUIT\r\n
    tcp-check expect string +OK
    server-template srv 5 _peer._tcp.redis.default.svc.cluster.local resolvers kubedns resolve-prefer ipv4 check inter 5s
...
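If you manage several backends, this workaround can be applied mechanically. Below is a hypothetical Python sketch that comments out bare `tcp-check connect` lines, but only in sections that also use `server-template` (the combination reported to crash 1.8 in this thread). It leaves `tcp-check connect port ...` rules alone and splits sections on a non-exhaustive set of section keywords, so treat it as a starting point; it only avoids the crash and does not fix the underlying bug.

```python
from __future__ import annotations

# Non-exhaustive set of HAProxy section keywords used to split the file.
SECTION_KEYWORDS = {"global", "defaults", "frontend", "backend", "listen",
                    "resolvers", "peers", "userlist", "mailers", "cache"}


def split_sections(lines: list[str]) -> list[list[str]]:
    """Group config lines into sections, each starting at a section keyword."""
    sections: list[list[str]] = [[]]
    for line in lines:
        words = line.split()
        if words and words[0] in SECTION_KEYWORDS and sections[-1]:
            sections.append([])
        sections[-1].append(line)
    return sections


def comment_out_tcp_check_connect(config: str) -> str:
    """Comment out bare `tcp-check connect` in sections using server-template."""
    out: list[str] = []
    for section in split_sections(config.splitlines()):
        uses_template = any(line.split()[:1] == ["server-template"] for line in section)
        for line in section:
            if uses_template and line.split() == ["tcp-check", "connect"]:
                indent = line[: len(line) - len(line.lstrip())]
                line = f"{indent}#tcp-check connect"
            out.append(line)
    return "\n".join(out)
```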

rockandska · April 4, 2019, 9:54am

Please see Layer4 connection problem, info: “Connection refused at step 1 of tcp-check (connect)”

I assume you have hit the same bug as me.

Regards,

lukastribus · April 4, 2019, 5:58pm

A more thorough description of the bug can be found here (including workarounds and affected/fixed releases):
