I just started logging all HAProxy web requests to Elasticsearch and have been running some analysis on the long-running requests. I came across a high number of requests that are being retried - meaning they hit the "timeout connect 10s" limit and were then retried successfully.
Our HAProxy box runs on a 1 Gb/s NIC and has at most 30 Mb/s coming out of it. We have about 400 r/s coming into HAProxy, split pretty evenly across our 4 webservers. Looking at the stats, we have a 0.8/s retry rate. That seems awfully high to me… I understand that number will never be zero, but almost 1 request a second is having to retry.
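For context, this is roughly how I'm pulling the retry counter. In production the CSV comes from the stats socket (e.g. `echo "show stat" | socat stdio /var/run/haproxy.sock` - the socket path is whatever your `stats socket` line says); the sample header and row below just stand in for that output so the parsing is self-contained, and the values in the row are illustrative, not real numbers:

```shell
#!/bin/sh
# Sketch: extract the per-server retry counter (the "wretr" field) from
# HAProxy's CSV "show stat" output. The two-line sample below (header +
# one server row, values illustrative) stands in for the live socket.
sample='# pxname,svname,qcur,qmax,scur,smax,slim,stot,bin,bout,dreq,dresp,ereq,econ,eresp,wretr,wredis
web,srv1,0,0,12,40,,100000,1,2,0,0,0,3,0,87,0'
printf '%s\n' "$sample" | awk -F, '
  NR==1 { sub(/^# /, ""); for (i = 1; i <= NF; i++) if ($i == "wretr") c = i; next }
  { print $1, $2, $c }'
```

Locating the column by its header name rather than hard-coding a field index keeps it working across HAProxy versions, which add fields over time.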
I’ve been monitoring/logging tons of networking metrics on the webserver side and they seem fine: no queueing on their side, normal CPU (<30%), and around 30 Mb/s on 1 Gb/s NICs as well.
I’ve set the timeout lower, to "timeout connect 3100" (milliseconds), and the rate stayed the same.
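For reference, the relevant connection-handling directives look something like this - a minimal sketch, not our full config, and the server names/addresses and client/server timeout values are placeholders:

```
# Minimal sketch of the relevant settings; names, addresses, and the
# client/server timeouts are placeholders, not our real config.
defaults
    mode http
    timeout connect 3100   # was 10s; lowering it did not change the retry rate
    timeout client  50s
    timeout server  50s
    retries 3              # HAProxy retries a failed *connection* this many times
    option redispatch      # allow a retried connection to go to a different server

backend webservers
    balance roundrobin
    server web1 10.0.0.11:80 check
    server web2 10.0.0.12:80 check
    server web3 10.0.0.13:80 check
    server web4 10.0.0.14:80 check
```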
My main question is: what are the normal causes of a high retry rate? The NIC being saturated? Our servers sit right next to each other, so latency shouldn’t be an issue.
The only other thing I have thought of is that we use WebSockets through HAProxy and have a high number of connections (about 6000 as of this writing).
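In case it matters: once a connection is upgraded to a WebSocket, HAProxy applies "timeout tunnel" instead of the usual client/server timeouts, so those long-lived connections are governed by something like this (values illustrative, not our actual settings):

```
# After a WebSocket upgrade, "timeout tunnel" takes over from the
# client/server timeouts. Values here are illustrative.
defaults
    timeout client 50s
    timeout server 50s
    timeout tunnel 1h     # governs idle upgraded (WebSocket) connections
```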
Any help or direction in digging into this would be appreciated!