Backend Server Timeouts

Hello

We use haproxy together with keepalived as a highly available load balancer.

The current versions are:
Linux: Ubuntu 16.04 LTS
haproxy: 1.6.9
keepalived: 1.2.19

We have been using haproxy since last summer to serve an HTTP site to customers. At the end of 2016, problems connecting to the backend application began, and users have been experiencing timeouts since then.

There are 4 backend servers, and haproxy assigns each user to one of them based on a cookie set by the application.

We’re not sure whether the problems come from haproxy or from the backend application servers. Our difficulty is that we can’t reproduce the problem the users are reporting, and there is currently no way to let some users connect directly to a backend server without haproxy (VPN limitation).

So I need to be sure that the problem is coming from the backend application server.

I’ve now enabled more detailed logging in the haproxy config:

  • option httplog
  • option log-separate-errors
  • option dontlog-normal

Now I have a lot of entries like the following in the logs:
May 16 15:31:29 xxxx haproxy[932]: xx.xx.xx.xx:xxxx [16/May/2017:15:29:21.324] server_80 server_80/server4 7/0/128231/1/128239 200 499 - - --VN 1670/906/485/198/1 0/0 “GET /di… HTTP/1.1”

I am not sure, but the times shown look very strange. It seems that the backend web server is not responding within an acceptable amount of time.

Is this true? Can the problem be pinned on the backend web server with these log entries?
Or is there another possible cause for our problem?

According to this log, haproxy takes 128 seconds to actually connect to the backend: the third timer value (Tc, the connect time) is 128231 ms. I suggest you check iptables/conntrack/networking between haproxy and the backend server.
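
To make the timer field easier to read, here is a minimal Python sketch that splits it into its five values. It assumes the "option httplog" layout of haproxy 1.6, where the field is Tq/Tw/Tc/Tr/Tt in milliseconds; the regular expression and the name parse_timers are illustrative only and not part of haproxy. The sample line is the entry quoted above, cut off before the request.

```python
import re

# Assumed layout (option httplog, haproxy 1.6):
#   ... [accept_date] frontend backend/server Tq/Tw/Tc/Tr/Tt status ...
LOG_RE = re.compile(
    r"\] (?P<frontend>\S+) (?P<backend>[^\s/]+)/(?P<server>\S+) "
    r"(?P<Tq>-?\d+)/(?P<Tw>-?\d+)/(?P<Tc>-?\d+)/(?P<Tr>-?\d+)/(?P<Tt>\+?\d+) "
)

# Log entry from the post, truncated before the request string.
SAMPLE = (
    "May 16 15:31:29 xxxx haproxy[932]: xx.xx.xx.xx:xxxx "
    "[16/May/2017:15:29:21.324] server_80 server_80/server4 "
    "7/0/128231/1/128239 200 499 - - --VN 1670/906/485/198/1 0/0"
)


def parse_timers(line):
    """Return (server, timers in ms) for an httplog line, or None if it does not match."""
    match = LOG_RE.search(line)
    if match is None:
        return None
    timers = {
        "Tq": int(match.group("Tq")),  # time to receive the full client request
        "Tw": int(match.group("Tw")),  # time spent waiting in queues
        "Tc": int(match.group("Tc")),  # time to establish the TCP connection to the server
        "Tr": int(match.group("Tr")),  # time until the server's response headers arrived
        "Tt": int(match.group("Tt")),  # total session time
    }
    return match.group("server"), timers


if __name__ == "__main__":
    print(parse_timers(SAMPLE))
```

For the entry above this gives Tc = 128231 ms and Tr = 1 ms, so almost all of the 128239 ms total was spent establishing the TCP connection, not waiting for the application's response.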

Thanks for your reply.

There is a lot of traffic between haproxy and the backend servers, and only some of the requests show those time values. Those entries make up less than 5% of all traffic, but that is still more than 1000 a day.
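
Since only some requests are affected, it may help to see whether the long connect times cluster on one backend server. A rough Python sketch, assuming the httplog layout of the entry quoted earlier; the function name and the 1000 ms threshold are arbitrary choices for illustration.

```python
import re
from collections import Counter

# Assumed layout: ... [accept_date] frontend backend/server Tq/Tw/Tc/Tr/Tt ...
TIMERS_RE = re.compile(
    r"\] \S+ [^\s/]+/(?P<server>\S+) "
    r"-?\d+/-?\d+/(?P<Tc>-?\d+)/-?\d+/\+?\d+ "
)


def slow_connects_by_server(lines, threshold_ms=1000):
    """Count log lines per server whose connect time (Tc) is at least threshold_ms."""
    counts = Counter()
    for line in lines:
        match = TIMERS_RE.search(line)
        if match and int(match.group("Tc")) >= threshold_ms:
            counts[match.group("server")] += 1
    return counts
```

The function accepts any iterable of lines, so an open log file can be passed in directly. If nearly all slow connects land on one server, that points at that machine; an even spread across all four would point more towards the load balancer or the network.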

I’m not sure if this is the problem the users are experiencing or if there is another problem…

But I don’t think the network between haproxy and the backend servers is the problem, because it is very flat (just one 10 Gbit switch, no hardware firewall, …).

I understand that, but there are still a lot of things to check. Most importantly, look at the dmesg output of all boxes for any conntrack-related issues, etc.
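
As a starting point for the conntrack check, the Python sketch below looks for the kernel's "nf_conntrack: table full" message in dmesg text and reads the current and maximum number of tracked connections from /proc/sys/net/netfilter. It assumes the nf_conntrack module is loaded on the box (otherwise those files do not exist); the function names are made up for illustration.

```python
from pathlib import Path


def conntrack_drops(dmesg_text):
    """Return the dmesg lines that report a full connection tracking table."""
    return [
        line
        for line in dmesg_text.splitlines()
        if "nf_conntrack: table full" in line
    ]


def conntrack_usage(proc_dir="/proc/sys/net/netfilter"):
    """Return (tracked connections, table limit), or None if conntrack is not available."""
    base = Path(proc_dir)
    try:
        count = int((base / "nf_conntrack_count").read_text())
        limit = int((base / "nf_conntrack_max").read_text())
    except (OSError, ValueError):
        # Files are missing (module not loaded) or hold unexpected content.
        return None
    return count, limit
```

The text for conntrack_drops can be the captured output of the dmesg command. A count close to the limit, or any "table full, dropping packet" line, would be a reason to look closer at the load balancer.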

The backend web servers are Windows machines, and the web application running on those boxes is a “black box” for me. So I have to prove that the load balancer isn’t the problem.

In the dmesg output on the load balancer I don’t see anything particularly unusual.

conntrack is new to me; I don’t know much about this tool or how to run it to get the information I need. Any suggestions?

Thanks

So no mention of conntrack in the dmesg output?

If you need to prove that this is the application, I guess there is not much left to do other than permanently sniffing the backend traffic on the haproxy box (maybe use dumpcap with circular buffers), and analyzing what happens when the problem recurs.