Understanding Layer 7 5xx error counter metrics

drodgers · September 7, 2021, 6:51am

Hi, I’m hoping to get some more insight on how retries, error logs and HTTP 5xx error metrics interact.

What I’m seeing is HTTP 5xx responses reported in the haproxy_frontend_http_responses_total and haproxy_backend_http_responses_total metrics. Some of these error responses don’t correspond to any errors reported in the logs, and some seem like they should be basically impossible given the aggressive retry config on the backends involved (e.g. simple GET requests, not huge POST requests which could exceed buffer limits, against a backend which proxies an S3 bucket with multiple retries and retry-on all-retryable-errors).

My main question is: do the frontend (or backend) *_http_responses_total metrics always correspond to the final HTTP responses sent to active clients, or can they also include other kinds of responses, such as:

Error responses from a backend which are then retried before anything is sent to the client
Internal errors generated because a client disconnected part-way through a request (so the response was never fully sent to the client)

I’m really trying to answer the question “Are we serving up 5xx responses to our actual users?”, so if there’s another stat which can answer that more accurately, then I’d be keen to learn about it!
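
A minimal sketch of how the frontend and backend 5xx counters could be put side by side from the exporter's text output. It assumes the exporter exposes the response class as a code="5xx" label on each sample; the label names, the frontend/backend names and the sample values below are hypothetical, and sum_5xx is a helper invented for illustration. It only reads the counters and says nothing about what HAProxy counts in them.

```python
import re
from collections import defaultdict

# Hypothetical exporter output: label names and values are assumptions
# made for illustration, not captured from a real HAProxy instance.
SAMPLE = '''\
# TYPE haproxy_frontend_http_responses_total counter
haproxy_frontend_http_responses_total{code="2xx",frontend="www"} 1200
haproxy_frontend_http_responses_total{code="5xx",frontend="www"} 7
# TYPE haproxy_backend_http_responses_total counter
haproxy_backend_http_responses_total{backend="s3",code="2xx"} 1195
haproxy_backend_http_responses_total{backend="s3",code="5xx"} 4
'''

# One sample line in the Prometheus text format: name{labels} value
LINE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def sum_5xx(text):
    """Sum the 5xx samples of every *_http_responses_total metric, keyed by metric name."""
    totals = defaultdict(float)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('#'):
            continue  # skip blank lines and HELP/TYPE comments
        match = LINE_RE.match(line)
        if not match:
            continue  # skip samples without labels or with unexpected syntax
        name = match.group('name')
        if not name.endswith('_http_responses_total'):
            continue
        labels = dict(LABEL_RE.findall(match.group('labels')))
        if labels.get('code') == '5xx':
            totals[name] += float(match.group('value'))
    return dict(totals)


if __name__ == '__main__':
    for metric, total in sorted(sum_5xx(SAMPLE).items()):
        print(f'{metric}: {total:g}')
```

Running the same function against two scrapes taken some time apart and subtracting the totals would give the number of 5xx responses counted on each side during that window, which can then be checked against the log lines for the same period.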

Important context:
Running HAProxy 2.4.3
Metrics extracted with haproxy-exporter 0.7.1 (not with the new built-in Prometheus endpoint - apologies if the metric names don’t exactly line up)
We log via syslog at the default level with: option dontlog-normal

Incidental question:
When we implemented a more aggressive retry config in HAProxy, our haproxy_backend_connection_errors_total metric dropped from a few per hour to (near) zero. I would naively have expected to still see errors here, even if the request was eventually retried successfully, but I’m guessing that this metric only records connection errors which affected the final response: is that right?

Thanks for any insights you can provide!
