Using SRV records - multiple entries per IP get "No IP for server"

I am running a Node.js cluster on a host: multiple processes start, each listening on a different port. I would like to take advantage of dynamic configuration using SRV records, but HAProxy seems to discover only one backend per IP address. Is having multiple backends on one IP address a supported configuration?

Sample DNS Query

;; QUESTION SECTION:
;_http._tcp.mydomain.example.com. IN SRV

;; ANSWER SECTION:
_http._tcp.mydomain.example.com. 5 IN SRV 1 10 8443 ip-10-0-32-101.ec2.internal.
_http._tcp.mydomain.example.com. 5 IN SRV 1 10 8444 ip-10-0-32-101.ec2.internal.
_http._tcp.mydomain.example.com. 5 IN SRV 1 10 8445 ip-10-0-32-101.ec2.internal.
_http._tcp.mydomain.example.com. 5 IN SRV 1 10 8443 ip-10-0-44-111.ec2.internal.
_http._tcp.mydomain.example.com. 5 IN SRV 1 10 8444 ip-10-0-44-111.ec2.internal.
_http._tcp.mydomain.example.com. 5 IN SRV 1 10 8445 ip-10-0-44-111.ec2.internal.

;; Query time: 3 msec
;; SERVER: 10.0.0.2#53(10.0.0.2)
;; WHEN: Wed Jan 31 00:06:14 2018
;; MSG SIZE rcvd: 193
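
The output above came from an SRV lookup against the resolver shown in the SERVER line; the exact invocation below is an assumption rather than a copy of my terminal:

dig SRV _http._tcp.mydomain.example.com @10.0.0.2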

HAPROXY VERBOSE

~/haproxy-ss-20180126% ./haproxy -vv
HA-Proxy version 1.8.3-2dd90ea 2018/01/25
Copyright 2000-2017 Willy Tarreau willy@haproxy.org

Build options :
TARGET = linux26
CPU = generic
CC = gcc
CFLAGS = -m64 -march=x86-64 -O2 -g -fno-strict-aliasing -Wdeclaration-after-statement -fwrapv -Wno-unused-label
OPTIONS = USE_ZLIB=1 USE_OPENSSL=yes USE_PCRE=1

Default settings :
maxconn = 2000, bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Built with OpenSSL version : OpenSSL 1.0.2k-fips 26 Jan 2017
Running on OpenSSL version : OpenSSL 1.0.2k-fips 26 Jan 2017
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports : SSLv3 TLSv1.0 TLSv1.1 TLSv1.2
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND
Encrypted password support via crypt(3): yes
Built with PCRE version : 8.21 2011-12-12
Running on PCRE version : 8.21 2011-12-12
PCRE library supports JIT : no (USE_PCRE_JIT not set)
Built with zlib version : 1.2.8
Running on zlib version : 1.2.8
Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with network namespace support.

Available polling systems :
epoll : pref=300, test result OK
poll : pref=200, test result OK
select : pref=150, test result OK
Total: 3 (3 usable), will use epoll.

Available filters :
[SPOE] spoe
[COMP] compression
[TRACE] trace

CONFIG (some items redacted with *******)

global

# Traffic logs
log /dev/log local0
# Event logs:
log /dev/log local1 notice
maxconn 32768
chroot /var/lib/haproxy
stats socket /var/run/haproxy/admin.sock mode 660 level admin
server-state-file /tmp/server_state
stats timeout 30s
user haproxy
group haproxy
tune.ssl.default-dh-param 2048
daemon

# For more information, see ciphers(1SSL).

ssl-default-bind-ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA256:ECDHE-ECDSA-AES128-SHA:ECDHE-RSA-AES128-SHA:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES256-SHA384:ECDHE-RSA-AES256-SHA384:ECDHE-RSA-AES256-SHA:ECDHE-ECDSA-AES256-SHA:AES128-GCM-SHA256:AES128-SHA256:AES128-SHA:AES256-GCM-SHA384:AES256-SHA256:AES256-SHA:DES-CBC3-SHA:!LOW:!3DES:!MD5:!EXP:!PSK:!aNULL:!eNULL

defaults
mode http
log global
timeout connect 60s
timeout client 360s
timeout server 360s
timeout tunnel 3600s
backlog 32768
balance leastconn

listen stats
bind 0.0.0.0:9090
mode http
stats enable
stats hide-version
stats realm Haproxy\ Statistics
stats uri /
stats auth *:
stats refresh 5s

peers peers_section
peer *************** ***********

frontend anon-fe
option http-no-delay
bind :443 ssl crt **************** no-sslv3
mode http
maxconn 32768
option httplog
option dontlognull
tcp-request inspect-delay 5s
use_backend anon_backend
default_backend anon_backend

frontend my_healthcheck
bind :8443 ssl crt ************ no-sslv3
mode http
option httplog
option forwardfor
acl dead nbsrv(anon_backend) lt 1
monitor-uri **********
monitor fail if dead
default_backend anon_backend

backend anon_backend
option http-no-delay
option httpchk POST *********
http-check disable-on-404
http-check expect status 200
timeout check 10s
balance url_param *********
hash-type consistent
server-template back-rr 4 _http._tcp.mydomain.example.com resolvers localresolv resolve-prefer ipv4 check inter 1s downinter 60s fall 3 weight 1

resolvers localresolv
nameserver dns1 10.0.0.2:53
resolve_retries 3
timeout resolve 1s
timeout retry 1s
hold other 30s
hold refused 30s
hold nx 30s
hold timeout 30s
hold valid 10s
hold obsolete 30s
accepted_payload_size 8192

LOGS

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[WARNING] 030/010618 (27843) : parsing [/etc/haproxy/haproxy.cfg:71] : a 'tcp-request' rule placed after an 'http-request' rule will still be processed before.
[WARNING] 030/010618 (27843) : parsing [/etc/haproxy/haproxy.cfg:72] : a 'tcp-request' rule placed after an 'http-request' rule will still be processed before.
[WARNING] 030/010618 (27843) : parsing [/etc/haproxy/haproxy.cfg:73] : a 'tcp-request' rule placed after an 'http-request' rule will still be processed before.
[WARNING] 030/010618 (27843) : parsing [/etc/haproxy/haproxy.cfg:75] : a 'tcp-request' rule placed after an 'http-request' rule will still be processed before.
[WARNING] 030/010618 (27843) : parsing [/etc/haproxy/haproxy.cfg:76] : a 'tcp-request' rule placed after an 'http-request' rule will still be processed before.
Available polling systems :
epoll : pref=300, test result OK
poll : pref=200, test result OK
select : pref=150, test result FAILED
Total: 3 (2 usable), will use epoll.

Available filters :
[SPOE] spoe
[COMP] compression
[TRACE] trace
Using epoll() as the polling mechanism.
[WARNING] 029/235609 (13803) : Server anon_backend/back-rr1 is DOWN, reason: Socket error, check duration: 0ms. 3 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
[WARNING] 029/235609 (13803) : anon_backend/back-rr1 changed its IP from to 10.0.32.101 by localresolv/dns1.
[WARNING] 029/235609 (13803) : Server anon_backend/back-rr2 is going DOWN for maintenance (No IP for server ). 2 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
[WARNING] 029/235609 (13803) : Server anon_backend/back-rr3 is going DOWN for maintenance (No IP for server ). 1 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.

It's possible we only look at the IP, therefore permitting only one server per IP address. @Baptiste?

Assuming only one server per IP address is currently permitted, is this expected to be the case permanently? I'm trying to validate whether my use case (multiple backends on the same IP) is valid. Thanks.

I second that use case.

Hi Lukas,

You're absolutely right!
We implemented a "deduplication" detection…
That said, this use case sounds reasonable, but I think the implementation won't be tricky…

I just want to point out that this use case is super common if you use a cluster scheduler like Nomad: once you have more than two instances of a job, two of them can end up running on the same host, and jobs on one host share one IP (which, as I understand it, is different from k8s, where each pod gets its own IP). Guess how I found out. :smiley:

Hi. We're also affected by this after upgrading from 1.5 to 1.8 and using the resolvers functionality. I'm really glad this was posted, else I would've been debugging for ages, haha.

In terms of the fix, is there somewhere this is being tracked, or where would be the best place to keep an eye out for it being rolled out? In the meantime we've had to switch back to 1.7, which works fully.

Hi @Baptiste. Apologies for tagging you, but I was wondering whether any further progress has been made on this, or if there's a better place for me to check for updates?

We're currently blocked from upgrading to 1.8 due to this. Although we're using standard hostname resolution and not SRV records, the effect is the same: all but one of the backend servers go down for maintenance due to this deduplication feature.
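
Roughly, the kind of setup I mean looks like this (a minimal sketch; the backend name, hostname, and ports are placeholders, not our real config). Both servers resolve the same A record to the same IP and differ only by port, so the deduplication leaves all but one of them without an address:

backend app_backend
# both lines resolve the same hostname, so both slots end up with the same IP
server app1 myapp.example.com:8081 check resolvers localresolv resolve-prefer ipv4
server app2 myapp.example.com:8082 check resolvers localresolv resolve-prefer ipv4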

Any help would be awesome. Cheers!


Sadly, this is still an issue in 1.8.15. The DNS SRV support is essentially broken if you are using containers, as it's very common to run more than one container per host. :frowning:

@hampsterx This was implemented in 1.8.14; you can enable the behavior with resolve-opts allow-dup-ip (see the resolve-opts documentation).
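
Applied to the server-template line from the original config, that would look something like this (an untested sketch that just adds the option to the existing line):

server-template back-rr 4 _http._tcp.mydomain.example.com resolvers localresolv resolve-opts allow-dup-ip resolve-prefer ipv4 check inter 1s downinter 60s fall 3 weight 1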
