Hello,
I have run into an issue where haproxy fails once I scale the backend service beyond a certain number of replicas. As soon as the count goes from 27 to 28, haproxy marks the servers as DOWN for maintenance with an "unspecified DNS error".
After a lot of investigation and config tweaks, I have narrowed the problem down to the customdns resolver that I am using. However, I can see that my custom resolver is sending back the 28 IPs for the domain requested by haproxy (see the dig output and the tcpdump capture below). So the issue seems to be with how haproxy receives this response and uses it.
Once I scale the backend service back down to 27 replicas, it works again. The issue really does seem to be tied to the number of IPs returned by the DNS.
Could someone help me out in this investigation and guide me? If this is really a bug in haproxy itself, what’s the process for opening this as an issue?
P.S.: Below I have included all the relevant logs and configuration, plus some of my investigation, which shows that my custom DNS is working properly (with 27 IPs, haproxy works).
haproxy logs:
2023-03-16T14:45:01-04:00 [WARNING] (204) : Server legacy_http_feature/legacy_http_endpoint_1 is going DOWN for maintenance (unspecified DNS error). 2 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
2023-03-16T14:45:01-04:00 [WARNING] (204) : Server legacy_http_feature/legacy_http_endpoint_2 is going DOWN for maintenance (unspecified DNS error). 1 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
2023-03-16T14:45:01-04:00 [WARNING] (204) : Server legacy_http_feature/legacy_http_endpoint_3 is going DOWN for maintenance (unspecified DNS error). 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
2023-03-16T14:45:01-04:00 [ALERT] (204) : proxy 'legacy_http_feature' has no server available!
base.cfg:
global
    log stdout format raw local0 debug
    user haproxy
    group haproxy
    nbthread 6

resolvers customdns
    nameserver local ${RESOLVER_IP}:53
    hold valid 3s # Time to cache valid entries
    hold nx 30s # Time to cache entries which were valid and changed to NX
    hold refused 30s # Time to cache entries which were valid when DNS queries gets a REFUSED error
    hold timeout 2m # Time to cache entries which were valid when DNS queries times-out
    hold obsolete 0s # Time to cache entries which were valid and disappeared from the DNS answer
    hold other 10s # Time to cache entries which were valid in case of any other type of DNS error
    timeout resolve 1s # Interval at which the DNS server is queried
    timeout retry 1s # Interval at which retry querying the DNS if a query did not work
    resolve_retries 3 # Number of retries before waiting for "timeout resolve" again
    accepted_payload_size 65535 # Accepted payload size, 8192 being the maximum value

defaults
    log global
    option dontlognull
    option tcplog
    option logasap
    timeout connect 5s
    timeout client 310s
    timeout server 310s
    # Allows haproxy to start when some/all DNS entry do not resolve
    default-server init-addr last,libc,none
    balance source
    hash-type consistent
    hash-balance-factor 150
haproxy.cfg:
listen legacy_http_feature
    log global
    mode tcp
    bind "${EXTERNAL_IP}:81"
    option log-health-checks
    option tcp-check
    server-template legacy_http_endpoint 3 nginx.env-branch.test-dns-nginx.custom:70 check resolvers customdns send-proxy maxconn 28000
Looking up the domain using dig:
$ dig nginx.env-branch.test-dns-nginx.custom
; <<>> DiG 9.16.1-Ubuntu <<>> nginx.env-branch.test-dns-nginx.custom
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 40935
;; flags: qr rd ra; QUERY: 1, ANSWER: 28, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;nginx.env-branch.test-dns-nginx.custom. IN A
;; ANSWER SECTION:
nginx.env-branch.test-dns-nginx.custom. 5 IN A 10.86.253.108
nginx.env-branch.test-dns-nginx.custom. 5 IN A 10.86.255.175
nginx.env-branch.test-dns-nginx.custom. 5 IN A 10.86.249.139
nginx.env-branch.test-dns-nginx.custom. 5 IN A 10.86.249.101
nginx.env-branch.test-dns-nginx.custom. 5 IN A 10.86.248.97
nginx.env-branch.test-dns-nginx.custom. 5 IN A 10.86.248.225
nginx.env-branch.test-dns-nginx.custom. 5 IN A 10.86.251.54
nginx.env-branch.test-dns-nginx.custom. 5 IN A 10.86.251.78
nginx.env-branch.test-dns-nginx.custom. 5 IN A 10.86.253.171
nginx.env-branch.test-dns-nginx.custom. 5 IN A 10.86.250.15
nginx.env-branch.test-dns-nginx.custom. 5 IN A 10.86.249.163
nginx.env-branch.test-dns-nginx.custom. 5 IN A 10.86.248.3
nginx.env-branch.test-dns-nginx.custom. 5 IN A 10.86.251.202
nginx.env-branch.test-dns-nginx.custom. 5 IN A 10.86.250.154
nginx.env-branch.test-dns-nginx.custom. 5 IN A 10.86.248.199
nginx.env-branch.test-dns-nginx.custom. 5 IN A 10.86.254.121
nginx.env-branch.test-dns-nginx.custom. 5 IN A 10.86.251.138
nginx.env-branch.test-dns-nginx.custom. 5 IN A 10.86.251.179
nginx.env-branch.test-dns-nginx.custom. 5 IN A 10.86.253.240
nginx.env-branch.test-dns-nginx.custom. 5 IN A 10.86.249.128
nginx.env-branch.test-dns-nginx.custom. 5 IN A 10.86.251.204
nginx.env-branch.test-dns-nginx.custom. 5 IN A 10.86.255.29
nginx.env-branch.test-dns-nginx.custom. 5 IN A 10.86.248.35
nginx.env-branch.test-dns-nginx.custom. 5 IN A 10.86.254.142
nginx.env-branch.test-dns-nginx.custom. 5 IN A 10.86.253.192
nginx.env-branch.test-dns-nginx.custom. 5 IN A 10.86.251.190
nginx.env-branch.test-dns-nginx.custom. 5 IN A 10.86.254.158
nginx.env-branch.test-dns-nginx.custom. 5 IN A 10.86.255.194
;; Query time: 4 msec
;; SERVER: 10.86.55.50#53(10.86.55.50)
;; WHEN: Thu Mar 16 20:33:26 UTC 2023
;; MSG SIZE rcvd: 531
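By default dig retries over TCP when a UDP reply comes back truncated, so its answer count may not reflect what a UDP-only client receives. As a cross-check, here is a minimal Python sketch (not haproxy's actual code) that sends a single UDP query with an EDNS0 OPT record advertising a payload size, matching the accepted_payload_size 65535 from the config above, and reports the TC (truncated) flag and the answer count from the reply header. The names build_query, parse_header and probe are made up for this illustration.

```python
import socket
import struct


def build_query(name: str, qtype: int = 1, udp_size: int = 65535,
                query_id: int = 0x1DFA) -> bytes:
    """Build a DNS query (default type A) with an EDNS0 OPT record."""
    # Header: id, flags (only RD set), QDCOUNT=1, ANCOUNT=0, NSCOUNT=0,
    # ARCOUNT=1 because the OPT pseudo-record lives in the additional section.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 1)
    # QNAME: each label is prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    question = qname + struct.pack("!HH", qtype, 1)  # QTYPE, QCLASS=IN
    # OPT: root name, TYPE=41, CLASS carries the UDP payload size,
    # TTL=0 (extended rcode/flags), RDLENGTH=0.
    opt = b"\x00" + struct.pack("!HHIH", 41, udp_size, 0, 0)
    return header + question + opt


def parse_header(message: bytes) -> dict:
    """Extract the fields of interest from the 12-byte DNS header."""
    query_id, flags, _qd, an, _ns, _ar = struct.unpack("!HHHHHH", message[:12])
    return {
        "id": query_id,
        "truncated": bool(flags & 0x0200),  # TC bit
        "rcode": flags & 0x000F,
        "answers": an,
    }


def probe(server: str, name: str, udp_size: int = 65535,
          timeout: float = 2.0) -> dict:
    """Send one UDP query to server:53 and summarize the reply header."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(name, udp_size=udp_size), (server, 53))
        reply, _ = sock.recvfrom(65535)
    return parse_header(reply)
```

Calling probe("10.86.55.50", "nginx.env-branch.test-dns-nginx.custom") from the haproxy host would show whether the UDP reply itself carries all 28 records or is flagged as truncated.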
Capturing the traffic between haproxy and the custom DNS resolver:
$ tcpdump -vvvNNi any src 10.86.252.179 and port 53 or dst 10.86.252.179 | grep test-dns-nginx
10.86.252.179.35869 > 10.86.55.50.domain: [bad udp cksum 0x11ff -> 0x6563!] 7674+ [1au] A? nginx.env-branch.test-dns-nginx.custom. ar: . OPT UDPsize=65535 (83)
10.86.55.50.domain > 10.86.252.179.35869: [udp sum ok] 7674*| q: A? nginx.env-branch.test-dns-nginx.custom. 27/0/1 nginx.env-branch.test-dns-nginx.custom. [5s] A 10.86.251.54, nginx.env-branch.test-dns-nginx.custom. [5s] A 10.86.251.204, nginx.env-branch.test-dns-nginx.custom. [5s] A 10.86.251.78, nginx.env-branch.test-dns-nginx.custom. [5s] A 10.86.255.194, nginx.env-branch.test-dns-nginx.custom. [5s] A 10.86.249.163, nginx.env-branch.test-dns-nginx.custom. [5s] A 10.86.249.101, nginx.env-branch.test-dns-nginx.custom. [5s] A 10.86.254.142, nginx.env-branch.test-dns-nginx.custom. [5s] A 10.86.253.171, nginx.env-branch.test-dns-nginx.custom. [5s] A 10.86.253.240, nginx.env-branch.test-dns-nginx.custom. [5s] A 10.86.253.108, nginx.env-branch.test-dns-nginx.custom. [5s] A 10.86.254.158, nginx.env-branch.test-dns-nginx.custom. [5s] A 10.86.248.225, nginx.env-branch.test-dns-nginx.custom. [5s] A 10.86.250.154, nginx.env-branch.test-dns-nginx.custom. [5s] A 10.86.255.29, nginx.env-branch.test-dns-nginx.custom. [5s] A 10.86.251.138, nginx.env-branch.test-dns-nginx.custom. [5s] A 10.86.248.35, nginx.env-branch.test-dns-nginx.custom. [5s] A 10.86.249.128, nginx.env-branch.test-dns-nginx.custom. [5s] A 10.86.248.199, nginx.env-branch.test-dns-nginx.custom. [5s] A 10.86.251.179, nginx.env-branch.test-dns-nginx.custom. [5s] A 10.86.251.190, nginx.env-branch.test-dns-nginx.custom. [5s] A 10.86.254.121, nginx.env-branch.test-dns-nginx.custom. [5s] A 10.86.248.3, nginx.env-branch.test-dns-nginx.custom. [5s] A 10.86.248.97, nginx.env-branch.test-dns-nginx.custom. [5s] A 10.86.253.192, nginx.env-branch.test-dns-nginx.custom. [5s] A 10.86.251.202, nginx.env-branch.test-dns-nginx.custom. [5s] A 10.86.249.139, nginx.env-branch.test-dns-nginx.custom. [5s] A 10.86.250.15 ar: . OPT UDPsize=65535 (515)
10.86.252.179.35869 > 10.86.55.50.domain: [bad udp cksum 0x11ff -> 0x6548!] 7674+ [1au] AAAA? nginx.env-branch.test-dns-nginx.custom. ar: . OPT UDPsize=65535 (83)
10.86.55.50.domain > 10.86.252.179.35869: [udp sum ok] 7674*| q: AAAA? nginx.env-branch.test-dns-nginx.custom. 15/0/1 nginx.env-branch.test-dns-nginx.custom. [5s] AAAA ::ffff:10.86.255.175, nginx.env-branch.test-dns-nginx.custom. [5s] AAAA ::ffff:10.86.255.194, nginx.env-branch.test-dns-nginx.custom. [5s] AAAA ::ffff:10.86.250.154, nginx.env-branch.test-dns-nginx.custom. [5s] AAAA ::ffff:10.86.251.204, nginx.env-branch.test-dns-nginx.custom. [5s] AAAA ::ffff:10.86.249.139, nginx.env-branch.test-dns-nginx.custom. [5s] AAAA ::ffff:10.86.249.128, nginx.env-branch.test-dns-nginx.custom. [5s] AAAA ::ffff:10.86.248.97, nginx.env-branch.test-dns-nginx.custom. [5s] AAAA ::ffff:10.86.251.202, nginx.env-branch.test-dns-nginx.custom. [5s] AAAA ::ffff:10.86.254.158, nginx.env-branch.test-dns-nginx.custom. [5s] AAAA ::ffff:10.86.249.163, nginx.env-branch.test-dns-nginx.custom. [5s] AAAA ::ffff:10.86.253.192, nginx.env-branch.test-dns-nginx.custom. [5s] AAAA ::ffff:10.86.253.240, nginx.env-branch.test-dns-nginx.custom. [5s] AAAA ::ffff:10.86.255.29, nginx.env-branch.test-dns-nginx.custom. [5s] AAAA ::ffff:10.86.249.101, nginx.env-branch.test-dns-nginx.custom. [5s] AAAA ::ffff:10.86.253.108 ar: . OPT UDPsize=65535 (1313)
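To make the capture above easier to read, the following Python sketch pulls the query id, flags, record counts and size out of a tcpdump DNS response line. It relies on the flag notation described in the tcpdump man page, where `*` after the id marks an authoritative answer and `|` marks a truncated (TC) reply. summarize and the two regular expressions are hypothetical helpers written for this post, and SAMPLE is the A response from the capture shortened to a single record.

```python
import re

# "<id><flags> q: <type>? <name> <answers>/<authority>/<additional>"
RESPONSE_RE = re.compile(
    r"\b(?P<id>\d+)(?P<flags>[*|$-]*) q: (?P<qtype>\w+)\? (?P<name>\S+) "
    r"(?P<an>\d+)/(?P<ns>\d+)/(?P<ar>\d+)"
)
# tcpdump prints the message length in parentheses at the end of the line.
SIZE_RE = re.compile(r"\((\d+)\)\s*$")

# Wire size of one A record whose owner name is a compression pointer:
# name pointer (2) + type (2) + class (2) + TTL (4) + rdlength (2) + IPv4 (4).
A_RECORD_BYTES = 2 + 2 + 2 + 4 + 2 + 4

# The A response from the capture, shortened to one record for readability.
SAMPLE = (
    "10.86.55.50.domain > 10.86.252.179.35869: [udp sum ok] 7674*| "
    "q: A? nginx.env-branch.test-dns-nginx.custom. 27/0/1 "
    "nginx.env-branch.test-dns-nginx.custom. [5s] A 10.86.251.54 "
    "ar: . OPT UDPsize=65535 (515)"
)


def summarize(line: str) -> dict:
    """Summarize one tcpdump DNS response line."""
    match = RESPONSE_RE.search(line)
    if match is None:
        raise ValueError("not a tcpdump DNS response line")
    size = SIZE_RE.search(line)
    flags = match.group("flags")
    return {
        "id": int(match.group("id")),
        "authoritative": "*" in flags,
        "truncated": "|" in flags,
        "qtype": match.group("qtype"),
        "answers": int(match.group("an")),
        "size": int(size.group(1)) if size else None,
    }


print(summarize(SAMPLE))
```

For the A response above this reports 27 answers, the truncated flag set, and 515 bytes. One compressed A record takes 16 bytes, and 515 + 16 = 531, the MSG SIZE that dig reports for 28 answers, so the capture may show a reply that was cut by one record rather than the full set. That reading is an assumption worth confirming with an unfiltered capture.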