I have a load balancer (actually two, with keepalived and a VIP for failover) running HAProxy 2.0.13 on Ubuntu Server 20.04. It runs in TCP mode and proxies API requests back to a round-robin pool of Windows/IIS ASP.NET servers, which terminate TLS themselves.
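For context, the relevant part of the config looks roughly like this (names, addresses, and the number of back ends are placeholders, not my real values):

```
frontend api_in
    mode tcp
    bind *:443
    default_backend iis_api

backend iis_api
    mode tcp
    balance roundrobin
    server iis1 10.0.0.11:443 check
    server iis2 10.0.0.12:443 check
```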
Anyway, the vendor consuming the API is reporting a large volume of TCP connect timeouts. I realize the problem could be on the Windows/IIS end, but I was hoping to use HAProxy's metrics to narrow it down. A few things I've noticed:
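For what it's worth, I've been pulling the raw counters from the admin socket rather than relying only on the HTML stats page (this assumes socat is installed and the Ubuntu package's default `stats socket /run/haproxy/admin.sock` line is present; adjust the path if yours differs):

```
# per-frontend/backend/server counters; the first line is the CSV header
# (scur, rate, eresp, cli_abrt, srv_abrt, etc.)
echo "show stat" | socat stdio /run/haproxy/admin.sock | column -s, -t | less -S

# process-wide counters such as CurrConns, SessRate and Maxconn
echo "show info" | socat stdio /run/haproxy/admin.sock
```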
- The current session count is high relative to the session rate. In any given 10-second reporting window, the new session rate never exceeds about 15 and is usually around 10, yet the current session count usually sits around 200, with spikes over 600 (see the back-of-envelope math after this list). I have stats on the IIS side and 99.99% of requests complete in under 4 s, so I wouldn't expect sessions to build up that high. Could this be down to slow responses from the back-end workers, or am I interpreting this metric wrong?
- The HTML stats page shows 1 error for one backend worker under the Errors -> Resp column. Hovering over it, the "Connection resets during transfers" values are 128k for client and 12 for server, out of only 173k total sessions for the period. Is such a high client reset count normal, or is that a smoking gun? (The logging sketch after this list is how I'm planning to dig into it.)
- Max concurrent sessions never exceeds 1,000. Is that high for a single HAProxy server?
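For reference, my back-of-envelope math on the first point (essentially concurrency ≈ new session rate × average session lifetime): even reading the rate as ~15 new sessions per second, at ~4 s per request I'd only expect on the order of 60 concurrent sessions. Seeing 200-600 would mean the average session lives for tens of seconds, which is part of why I'm wondering whether these are idle/held-open connections from the vendor rather than slow responses.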
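On the resets, my plan is to turn on per-session TCP logging so every connection records a termination state; those two-letter flags distinguish client aborts, server aborts, and client/server-side timeouts, which should tell me which side is actually dropping things. A minimal sketch, reusing the placeholder frontend name from above:

```
global
    log /dev/log local0

frontend api_in
    log global
    option tcplog
    # each TCP log line then carries a termination state such as
    # "CD" (client aborted during transfer), "cD" (client-side timeout during
    # transfer) or "sC" (server-side timeout while waiting for the connection
    # to the back end to establish)
```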
Another thing I've noticed on the backend workers is a repeating pattern: they build up 100+ simultaneous open connections, which all sit there for quite a while before closing one after another. I'm trying to figure out whether that's down to how the API is being called, or whether it's a side effect of using HAProxy in TCP mode.
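In case it helps, this is the sort of thing I've been looking at from the HAProxy side to see those connections and their ages (same admin-socket assumption as above; the IIS back ends listen on 443 in my placeholder config):

```
# list live sessions with their age; lots of old, idle entries against the
# same backend would match the "build up and sit there" pattern
echo "show sess" | socat stdio /run/haproxy/admin.sock

# established connections from HAProxy out to the back ends (remote port 443)
ss -tan state established '( dport = :443 )'
```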