Normal average response time, really high average total_time according to Prometheus metrics

bandan · August 7, 2024, 10:03pm

Hello,
we have HAProxy set up with ~200 backend servers. I noticed that haproxy_server_response_time_average_seconds is acceptable (around 400 ms), but haproxy_server_total_time_average_seconds is around 2 minutes for each server, and I would like to debug the issue. What is the fundamental difference between response_time and total_time? Since I don't know what each of them is made of, I don't understand why the difference is this big. I would also appreciate any ideas for debugging the issue. Thanks

adarragon · August 19, 2024, 8:06am

As per the Prometheus field descriptions, haproxy_server_response_time_average_seconds is the average response time over the last 1024 successful connections (response time only), whereas haproxy_server_total_time_average_seconds is the average total time over the last 1024 successful connections, that is, the total duration of the stream: queue, connect and response time all added together, plus additional time not necessarily tracked by any sub-counter.
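
Put as arithmetic, the description above means that whatever is left of the total after subtracting the three tracked sub-timers is time no sub-counter explains. A minimal sketch of that subtraction follows; the queue and connect metric names are assumed to follow the same naming pattern as the two metrics in the question, so check them against your own /metrics output. It is a difference of averages, not a per-stream measurement, so treat the result as a rough indicator.

```python
from typing import Mapping

# Assumed exporter metric names (queue/connect follow the same pattern as
# the response/total metrics); verify against your /metrics endpoint.
QUEUE = "haproxy_server_queue_time_average_seconds"
CONNECT = "haproxy_server_connect_time_average_seconds"
RESPONSE = "haproxy_server_response_time_average_seconds"
TOTAL = "haproxy_server_total_time_average_seconds"


def untracked_time(avgs: Mapping[str, float]) -> float:
    """Return the part of the average total time (seconds) that is not
    covered by the queue, connect and response sub-timers.

    Missing queue/connect values are treated as 0. Because this subtracts
    averages rather than per-stream values, the result is approximate and
    can even be slightly negative if the samples were taken at different
    moments.
    """
    tracked = avgs.get(QUEUE, 0.0) + avgs.get(CONNECT, 0.0) + avgs[RESPONSE]
    return avgs[TOTAL] - tracked
```

With the figures from the question (response around 0.4 s, total around 120 s), almost the whole two minutes would land in the untracked part unless queue or connect time turn out to be large.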

If the total time is higher than expected, it may be worth checking the individual timers that are available (queue time, connect time) to see where the time is spent.
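
To check those timers across ~200 servers at once, one option is to pull them from Prometheus and rank servers by the time no sub-timer accounts for. The sketch below uses the standard Prometheus HTTP API instant-query endpoint (/api/v1/query); the metric names and the "proxy"/"server" labels are assumptions about the HAProxy exporter and should be adjusted to what your setup actually exposes.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed exporter metric names and labels ("proxy", "server").
TIMERS = {
    "queue": "haproxy_server_queue_time_average_seconds",
    "connect": "haproxy_server_connect_time_average_seconds",
    "response": "haproxy_server_response_time_average_seconds",
    "total": "haproxy_server_total_time_average_seconds",
}


def query_url(prometheus_base: str, promql: str) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{prometheus_base.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def fetch(prometheus_base: str, promql: str) -> dict:
    """Run one instant query and return the decoded JSON body."""
    with urlopen(query_url(prometheus_base, promql), timeout=10) as resp:
        return json.load(resp)


def parse_vector(body: dict) -> dict:
    """Map (proxy, server) -> float value from an instant-vector result."""
    if body.get("status") != "success":
        raise ValueError(f"query failed: {body.get('error')}")
    out = {}
    for sample in body["data"]["result"]:
        labels = sample["metric"]
        key = (labels.get("proxy", ""), labels.get("server", ""))
        out[key] = float(sample["value"][1])  # value is [timestamp, "string"]
    return out


def breakdown(bodies: dict) -> list:
    """Combine one query body per timer name into (key, total, untracked)
    rows, sorted by untracked time (total minus queue, connect and
    response), largest first."""
    per_timer = {name: parse_vector(body) for name, body in bodies.items()}
    rows = []
    for key, total in per_timer["total"].items():
        tracked = sum(per_timer[t].get(key, 0.0) for t in ("queue", "connect", "response"))
        rows.append((key, total, total - tracked))
    return sorted(rows, key=lambda row: row[2], reverse=True)
```

Usage: call fetch(base, metric) once per entry in TIMERS (base being your Prometheus address), collect the results in a dict keyed by the same timer names, and pass it to breakdown(). Servers at the top of the list are where the unexplained time concentrates; if queue or connect dominate instead, the rows will show a small untracked value despite a large total.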

Also, the response time doesn't account for the complete duration of the response; it reflects the time spent waiting for the first response byte from the server. So if the server sends a large amount of data, the total time can be expected to be much larger than the response time, because most of the time may actually be spent between the first byte of the response and the end of the response.
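
A simplified model of a single stream makes this concrete. The sketch below assumes hypothetical event timestamps (the field names are illustrative, not HAProxy internals) and splits the stream the way described above: response time stops at the first byte, so the body transfer only shows up in the total.

```python
from dataclasses import dataclass


@dataclass
class StreamTimeline:
    """Hypothetical event times for one stream, in seconds since the
    request was accepted. Field names are illustrative only."""
    dequeued: float    # request left the backend queue
    connected: float   # connection to the server established
    first_byte: float  # first response byte received from the server
    last_byte: float   # end of the response, i.e. end of the stream

    def timers(self) -> dict:
        """Split the stream into sub-timers; "transfer" is the part that
        no sub-timer covers and that only appears in "total"."""
        return {
            "queue": self.dequeued,
            "connect": self.connected - self.dequeued,
            "response": self.first_byte - self.connected,  # waiting for first byte
            "transfer": self.last_byte - self.first_byte,  # streaming the body
            "total": self.last_byte,
        }
```

In this model, a server that starts answering after 0.4 s but keeps streaming its body for two minutes reports a response time of about 0.4 s and a total time of about 120 s, which matches the pattern in the question.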
