The problem:
every time I reload HAProxy, my backend server goes DOWN because of a DNS timeout. The log output is:
[WARNING] 014/190450 (22) : Reexecuting Master process
[WARNING] 014/191701 (22) : parsing [/usr/local/etc/haproxy/haproxy.cfg:51]: 'log-format' overrides previous 'option httplog' in 'defaults' section.
[WARNING] 014/191701 (22) : Setting tune.ssl.default-dh-param to 1024 by default, if your workload permits it you should set it to at least 2048. Please set a value >= 1024 to make this warning disappear.
[WARNING] 014/191701 (22) : [haproxy.main()] Cannot raise FD limit to 2097186, limit is 1048576.
Jan 15 19:17:01 localhost haproxy[22]: Proxy name_resolver_http started.
Jan 15 19:17:01 localhost haproxy[22]: Proxy nginx_nginx-80-servers started.
Jan 15 19:17:01 localhost haproxy[47]: Stopping proxy stats in 0 ms.
Jan 15 19:17:01 localhost haproxy[47]: Stopping frontend name_resolver_http in 0 ms.
Jan 15 19:17:01 localhost haproxy[47]: Stopping backend nginx_nginx-80-servers in 0 ms.
Jan 15 19:17:01 localhost haproxy[47]: Proxy stats stopped (FE: 1676 conns, BE: 10 conns).
Jan 15 19:17:01 localhost haproxy[47]: Proxy name_resolver_http stopped (FE: 1707 conns, BE: 0 conns).
Jan 15 19:17:01 localhost haproxy[47]: Proxy nginx_nginx-80-servers stopped (FE: 0 conns, BE: 3 conns).
[WARNING] 014/191701 (22) : [haproxy.main()] FD limit (1048576) too low for maxconn=1048576/maxsock=2097186. Please raise 'ulimit-n' to 2097186 or more to avoid any trouble.
Jan 15 19:17:01 localhost haproxy[58]: Health check for server nginx_nginx-80-servers/nginx_nginx_9151bcdcd9d17452534689968a4ca067b3da3164a171de7731369a5862c9d646_80 succeeded, reason: Layer4 check passed, check duration: 0ms, status: 3/3 UP.
[WARNING] 014/191701 (22) : Former worker 47 exited with code 0
Jan 15 19:17:12 localhost haproxy[58]: Server nginx_nginx-80-servers/nginx_nginx_9151bcdcd9d17452534689968a4ca067b3da3164a171de7731369a5862c9d646_80 is going DOWN for maintenance (DNS timeout status). 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
Jan 15 19:17:12 localhost haproxy[58]: backend nginx_nginx-80-servers has no server available!
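For context, HAProxy runs in master-worker mode inside a Docker container (127.0.0.11 in the capture below is Docker's embedded DNS), and the reload amounts to signalling the master process so it re-executes itself, starts a new worker (pid 58 above) and lets the former worker (pid 47) exit. A rough sketch of the reload, assuming the pidfile from the config below; my actual reload wrapper may differ:

# signal the master (pid 22 above) to re-read the config and re-exec itself;
# a new worker is started and the former worker exits once it has drained
kill -USR2 $(cat /var/run/haproxy.pid)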
My haproxy.cfg:
global
    maxconn 1048576
    stats socket /var/run/haproxy-admin.sock mode 660 level admin expose-fd listeners
    stats timeout 30s
    pidfile /var/run/haproxy.pid
    log 127.0.0.1 local0
    max-spread-checks 60s
    master-worker no-exit-on-failure
    nbthread 2

resolvers mydns
    nameserver dns1 127.0.0.11:53
    resolve_retries 3
    timeout retry 1s
    hold other 10s
    hold refused 10s
    hold nx 10s
    hold timeout 10s
    hold valid 10s

defaults
    mode http
    maxconn 1048576
    balance roundrobin
    timeout connect 5000ms
    timeout client 65000ms
    timeout server 65000ms
    timeout tunnel 3600s
    timeout check 5s
    option httplog
    option dontlognull
    option http-server-close
    option abortonclose
    option log-health-checks
    log global
    log-format %ci:%cp\ [%t]\ %Tr\ %s\ %ST\ %B\ %hr\ %hs\ %H\ %{+Q}r
    # If sending a request to one server fails, try to send it to another, 3 times
    # before aborting the request
    retries 3
    #http-reuse safe
    option forwardfor
    # Do not enforce session affinity (i.e., an HTTP session can be served by
    # any Mongrel, not just the one that started the session)
    option redispatch
    no option checkcache
    option accept-invalid-http-response
    option accept-invalid-http-request
    default-server init-addr last,libc,none

frontend name_resolver_http
    bind *:80
    errorfile 503 /usr/local/etc/haproxy/errors/503.http
    capture request header Host len 80
    monitor-uri /haproxy-monitor
    acl is_websocket hdr(Upgrade) -i WebSocket
    acl is_appone.example.com hdr_reg(host) -i ^appone.example.com(:[0-9]+)?$
    acl is_appone.example.com_port hdr(host) -i appone.example.com:80
    use_backend nginx_nginx-80-servers if is_appone.example.com or is_appone.example.com_port

backend nginx_nginx-80-servers
    server nginx_nginx_9151bcdcd9d17452534689968a4ca067b3da3164a171de7731369a5862c9d646_80 nginx_nginx.1.uh0hgakyv1pe5pigo2xrligm2:80 cookie 9151bcdcd9d17452534689968a4ca067b3da3164a171de7731369a5862c9d646 weight 100 check resolvers mydns
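For what it's worth, the state of the resolvers section can also be inspected at runtime through the admin socket configured above. A sketch, assuming socat is available inside the container (it is not part of my setup by default):

# dump per-nameserver counters (valid / error / timeout responses) for the
# "mydns" resolvers section over the stats socket
echo "show resolvers mydns" | socat stdio UNIX-CONNECT:/var/run/haproxy-admin.sock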
At the same time, I captured the traffic with tcpdump:
19:17:01.297238 Out 02:42:ac:11:00:04 ethertype IPv4 (0x0800), length 101: (tos 0x0, ttl 64, id 21822, offset 0, flags [DF], proto UDP (17), length 85)
172.17.0.4.56594 > 100.100.2.138.53: [bad udp cksum 0x1356 -> 0x967e!] 44772+ A? nginx_nginx.1.uh0hgakyv1pe5pigo2xrligm2. (57)
19:17:01.307256 In 02:42:39:3e:c5:dd ethertype IPv4 (0x0800), length 176: (tos 0x0, ttl 63, id 23130, offset 0, flags [none], proto UDP (17), length 160)
100.100.2.138.53 > 172.17.0.4.56594: [udp sum ok] 44772 NXDomain q: A? nginx_nginx.1.uh0hgakyv1pe5pigo2xrligm2. 0/1/0 ns: . [3h] SOA a.root-servers.net. nstld.verisign-grs.com. 2018011500 1800 900 604800 86400 (132)
19:17:01.307382 In 00:00:00:00:00:00 ethertype IPv4 (0x0800), length 176: (tos 0x0, ttl 64, id 9331, offset 0, flags [DF], proto UDP (17), length 160)
127.0.0.11.53 > 127.0.0.1.50611: [bad udp cksum 0xfea9 -> 0xc9b2!] 44772 NXDomain q: A? nginx_nginx.1.uh0hgakyv1pe5pigo2xrligm2. 0/1/0 ns: . [3h] SOA a.root-servers.net. nstld.verisign-grs.com. 2018011500 1800 900 604800 86400 (132)
19:17:01.417324 In 00:00:00:00:00:00 ethertype IPv4 (0x0800), length 156: (tos 0x0, ttl 64, id 9348, offset 0, flags [DF], proto UDP (17), length 140)
127.0.0.11.53 > 127.0.0.1.55220: [bad udp cksum 0xfe95 -> 0x49ce!] 51922 q: A? nginx_nginx.1.uh0hgakyv1pe5pigo2xrligm2. 1/0/0 nginx_nginx.1.uh0hgakyv1pe5pigo2xrligm2. [10m] A 10.254.0.5 (112)
19:17:01.417499 In 00:00:00:00:00:00 ethertype IPv4 (0x0800), length 156: (tos 0x0, ttl 64, id 9349, offset 0, flags [DF], proto UDP (17), length 140)
127.0.0.11.53 > 127.0.0.1.53913: [bad udp cksum 0xfe95 -> 0x7275!] 42822 q: A? nginx_nginx.1.uh0hgakyv1pe5pigo2xrligm2. 1/0/0 nginx_nginx.1.uh0hgakyv1pe5pigo2xrligm2. [10m] A 10.254.0.5 (112)
19:17:01.447001 In 00:00:00:00:00:00 ethertype IPv4 (0x0800), length 101: (tos 0x0, ttl 64, id 9353, offset 0, flags [DF], proto UDP (17), length 85)
127.0.0.11.53 > 127.0.0.1.60407: [bad udp cksum 0xfe5e -> 0xf1d5!] 48670 q: AAAA? nginx_nginx.1.uh0hgakyv1pe5pigo2xrligm2. 0/0/0 (57)
We can see a successful DNS response at 19:17:01, yet the haproxy process (the new worker, pid 58) marked my nginx backend server DOWN at 19:17:12.
What happened? This problem never occurs with haproxy 1.8.2.
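For reference, the lookup itself can be reproduced by hand from inside the haproxy container against Docker's embedded DNS at 127.0.0.11, which is what the "mydns" resolver queries over UDP. A sketch, assuming dig is installed in the container:

# query the Swarm task name directly, as haproxy's resolver does;
# the capture above shows this answering 10.254.0.5 with a 10-minute TTL
dig @127.0.0.11 nginx_nginx.1.uh0hgakyv1pe5pigo2xrligm2 A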