Aiming to solve the issue where HAProxy would only resolve DNS at startup instead of at runtime, I created a new Google Cloud VM running HAProxy 2.2.9.
I set up the resolvers block:
resolvers pc-dns
    nameserver pcdns "Google Internal DNS Resolver IP"
    resolve_retries 30
    timeout retry 1s
    hold valid 5s
Then added this at the end of each backend server line:
....... check port 80 inter 1s fall 1 rise 3 resolvers pc-dns
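For context, a minimal sketch of how those pieces fit together (the backend name, server name, and address are placeholders, not the real config; 169.254.169.254 is the standard GCP internal resolver):

resolvers pc-dns
    # an address:port pair is required for nameserver entries
    nameserver pcdns 169.254.169.254:53
    resolve_retries 30
    timeout retry 1s
    hold valid 5s

backend app_backend
    # "resolvers pc-dns" ties runtime DNS re-resolution to the section above
    server app01 app01.example.internal:80 check port 80 inter 1s fall 1 rise 3 resolvers pc-dns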
I have tried both:
resolvers pc-dns
    nameserver dns1 "Google Internal DNS Resolver IP":53
    hold valid 5s
and
resolvers pc-dns
    parse-resolv-conf
    hold valid 10s
But I ended up with the same result: after 40s or so, the backend servers are set to MAINT, even though the servers are up and running.
Hey @lukastribus, sorry for tagging you, but I am hoping you could point me in the right direction.
This is the resolvers section. Google Cloud VMs have an internal nameserver in their resolv.conf, which is why I am not specifying it explicitly here:
resolvers pcdns
    parse-resolv-conf
    resolve_retries 3
    timeout resolve 1s
    timeout retry 1s
    hold other 30s
    hold refused 30s
    hold nx 30s
    hold timeout 30s
    hold valid 10s
    hold obsolete 30s
The backend looks like:
backend....
    mode http
    option httpchk
    http-check send meth GET uri /session/status
    http-check expect string success
    server ............ weight 10 check port 80 inter 1s fall 1 rise 3 resolvers pcdns
After a systemctl reload haproxy, the stats page shows everything up for a few seconds before everything is set to MAINT with "resolution" under the check column.
I am trying to solve this with HAProxy. Google LB sucks big time, and I am kind of the one blocking that move. HAProxy is way better for our env.
Servers will go into MAINT on DNS resolution failures, so yes, this just means you need basic DNS troubleshooting.
To disable initial libc resolution, and therefore get a clearer failure picture in this situation (it won't fail after some time, but immediately), use the init-addr last server option.
The option below gives me the error "no method found to resolve address".
I saw another post saying you can use none instead of last. With that, the error message is gone, but the problem persists.
defaults
    default-server init-addr last
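For reference, init-addr accepts a comma-separated fallback list, so a common pattern (a sketch, not necessarily what this particular setup needs) is:

defaults
    # try the last known DNS answer, then a libc lookup at startup,
    # and finally start with no address at all instead of failing hard
    default-server init-addr last,libc,none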
I have also changed it from/to:
# from
resolvers pcdns
    parse-resolv-conf

# to
resolvers pcdns
    # Added the line below
    nameserver dns1 169.254.169.254:53
@baptiste64 sorry for the noob question, but first, I don't entirely know how to do that DNS capture (Google should help me out).
Also, does that mean HAProxy is receiving on port 53? If that is true, then the problem is the firewall.
The HAProxy VM has only HTTP/HTTPS and port 8161 open to the network.
Today is my deadline; my manager is pushing Google LB, but I am being stubborn and need to find a solution by COB lol
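A capture along these lines will show the DNS exchange (a sketch; the interface and filter may need adjusting for your VM):

# -nn: no name/port resolution, -vvv: print UDP checksums and DNS details
sudo tcpdump -nn -vvv -i any udp port 53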
HAPROXY_VM_HOSTNAME.internal.49972 > 169.254.169.254: [bad udp cksum 0x6879 -> 0x1f18!] ...BACKEND_SERVER... to be continued
169.254.169.254 > HAPROXY_VM_HOSTNAME.internal.54778: [udp sum ok] ...BACKEND_SERVER... ...to be continued...
I don't think those "bad udp cksum" are supposed to be there, especially because it is the VM calling the internal DNS resolver.
I tried that tcpdump on my desktop and I got some bad udp cksums too.
In my case, though, I use Pi-Hole and firewall rules to filter DNS requests, so I cannot use it as an example.
But please share what you think, as I could be 100% wrong.
Ok, so from the outputs you can see that the DNS server responds with NXDomain, meaning that the DNS server did not find any records for the hostname you are querying. (The bad udp cksum on outgoing packets is harmless, by the way: with checksum offloading, tcpdump captures the packet before the NIC fills in the checksum.)
I think the problem here could be the missing search path. It is in resolv.conf, but HAProxy only parses the nameserver entries out of it, not the search path, and its runtime resolvers query DNS directly, bypassing /etc/hosts entirely.
Can you try changing:
session01 to session01.MY_CLOUD_PROJECT_ID_HERE.internal google.internal
and session01-dev to session01-dev.MY_CLOUD_PROJECT_ID_HERE.internal google.internal
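A sketch of what an adjusted server line could look like (the backend name and port are placeholders, and the FQDN placeholder from above is abbreviated):

backend session_backend
    # fully-qualified name, since the resolvers section never sees the
    # search path from resolv.conf
    server session01 session01.MY_CLOUD_PROJECT_ID_HERE.internal:80 weight 10 check port 80 inter 1s fall 1 rise 3 resolvers pcdns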
I made the changes and voilà, it worked.
tcpdump is returning A/AAAA records and no more NXDomain.
HAProxy is no longer setting backends to MAINT.
It doesn't make sense, though: /etc/hosts has both session01 and session01.MY_CLOUD_PROJECT_ID_HERE.internal google.internal laid out on all the VMs. I might need to contact the terrible Google Cloud support again.
I just deleted one of the backends, built a new VM, released the microservice to it (via Jenkins), and voilà: HAProxy started talking to it once the VM passed the health check, without reloading HAProxy.