Hello HAProxy Team!
We already reported this right after the release of 1.9, and my former colleague Ashwin N. had a longer exchange with Willy about it.
We observed significantly increased latency introduced by HAProxy 1.9 once the load rises beyond 40-50 requests per second (RPS). After the release of 2.0 we were hoping that the issue would have been resolved, but unfortunately it is still there.
Here are our setup and observations:
- we upgraded from 1.8.17 to 2.0.2
- we did not change the config file
- we started 2.0.2 in master/worker mode (-Ws and -S command line parameters)
- we typically have between 8 and 10 backends, but some backends can have 1000+ servers
- we use HAProxy both as ingress and as egress for our services, i.e. requests to the service hit HAProxy on an HTTPS endpoint, and HAProxy then connects to our service via localhost:8080. Requests to downstream services are made to service.localhost:80, which HAProxy then maps via a host ACL to the corresponding backend. Connections to backend servers use HTTPS.
- we use the roundrobin load-balancing scheme
In our environment, we started observing significantly increased p99 latency with 1.9 and 2.0.2. Our service “normally” reports a p99 response time of 600 ms for requests to a downstream service. After the upgrade to HAProxy 2.0.2, this p99 latency went up to 1200 ms (yes, 1.2 seconds!!). So it appears that HAProxy 2.0.2 introduces up to 600 ms of extra latency at p99! The “average” processing time is 55 ms with 1.8.17 and went up to 68 ms with HAProxy 2.0.2, an average latency increase of 13 ms.
Also important to mention: we run about 80 instances of our service in parallel. To test HAProxy 2.0.2, we deployed the new version to two instances and ran 2.0.2 and 1.8.17 side by side so we could directly compare the metrics coming from both versions.
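For anyone who wants to reproduce the comparison, here is a minimal Python sketch of how the deltas could be computed from per-request latency samples. The function names (`p99`, `compare`) and the nearest-rank percentile method are assumptions for illustration only; our metrics pipeline may compute percentiles differently. The only real numbers in it are the summary figures quoted above.

```python
from statistics import mean

def p99(samples_ms):
    """Nearest-rank 99th percentile of latencies given in milliseconds."""
    ordered = sorted(samples_ms)
    # ceil(0.99 * n) using integer arithmetic to avoid float rounding
    rank = (99 * len(ordered) + 99) // 100
    return ordered[rank - 1]

def compare(baseline_ms, candidate_ms):
    """Return mean and p99 deltas (candidate minus baseline) in ms."""
    return {
        "mean_delta_ms": mean(candidate_ms) - mean(baseline_ms),
        "p99_delta_ms": p99(candidate_ms) - p99(baseline_ms),
    }

# Summary figures from this report (not raw samples).
reported = {
    "1.8.17": {"mean_ms": 55, "p99_ms": 600},
    "2.0.2": {"mean_ms": 68, "p99_ms": 1200},
}
mean_increase = reported["2.0.2"]["mean_ms"] - reported["1.8.17"]["mean_ms"]
p99_increase = reported["2.0.2"]["p99_ms"] - reported["1.8.17"]["p99_ms"]
print(f"mean +{mean_increase} ms, p99 +{p99_increase} ms")
```

Feeding `compare` the raw samples from a 1.8.17 instance and a 2.0.2 instance taken over the same time window should make it easy to check whether the regression only shows up in the tail or also shifts the mean.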
This latency increase cannot be seen in low-RPS environments (below about 40-50 RPS); there, 1.8 and 1.9/2.0 seem to behave exactly the same.
This is a HUGE blocker for us in moving from our current HAProxy 1.8.x to the latest version, since we need to take advantage of the newly introduced connection pooling to backend servers.
Again, Willy should already have enough details from our previous interaction. He mentioned he wanted to look into that and hopefully find a fix but we never heard back…
Any help would be greatly appreciated!
P.S. I will be out of the country for one week starting tomorrow, without an internet connection, so I will be unable to respond to questions until I am back.