About metrics resolution

Heyo,

Based on https://www.haproxy.com/fr/blog/haproxy-exposes-a-prometheus-metrics-endpoint/, I added a stats frontend. It appears there is no documentation about this endpoint here.
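
For reference, the frontend I added looks roughly like this (the bind address is mine, adapted from the blog post):

    frontend stats
        mode http
        bind *:8404
        # serve the built-in Prometheus exporter on /metrics
        http-request use-service prometheus-exporter if { path /metrics }
        stats enable
        stats uri /stats
        stats refresh 10s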

So, my question is: since the “old” metrics endpoint reports time metrics in milliseconds, why does this endpoint have “seconds” as its highest resolution?

e.g.

haproxy_backend_check_last_change_seconds
haproxy_backend_downtime_seconds_total
haproxy_backend_http_connect_time_average_seconds
haproxy_backend_http_queue_time_average_seconds
haproxy_backend_http_response_time_average_seconds
haproxy_backend_http_total_time_average_seconds
haproxy_backend_last_session_seconds
haproxy_process_start_time_seconds 
haproxy_server_check_last_change_seconds
haproxy_server_downtime_seconds_total
haproxy_server_http_connect_time_average_seconds
haproxy_server_http_queue_time_average_seconds
haproxy_server_http_response_time_average_seconds
haproxy_server_http_total_time_average_seconds
haproxy_server_last_session_seconds

Documentation of the HAProxy exporter is here.

You mentioned an “old endpoint”? I guess you are referring to this exporter. AFAIK, the resolution is the same for both. In fact, these exporters rely on the HAProxy stats, so most of the time they use the same units as the raw stats.

Thanks for the ref!

Nah, I also ditched this exporter, in favor of directly scraping (via a wrapper) the old stats socket, which is bound to each process of my instance. When you do a “show stat” on that socket, it returns values in milliseconds.
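
For example, something along these lines (with my socket path):

    # the time averages (qtime/ctime/rtime/ttime) in the CSV output are in milliseconds
    echo "show stat" | nc -U /var/run/haproxy_stat_socket.sock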

OK, I checked, and for the HTTP average times it is a naming problem. The names of the old exporter (the one provided by Prometheus) were reused, but the resolution is actually the millisecond. So, bad naming here.

For the others (check_last_change, downtime, start_time, last_session), the resolution is the second, on purpose.

Thanks for checking, I’ll get this sucker running more widely then.

I found another naming fail in the metrics: the frontend bytes sent/received are displayed as “bytes”, but the values don’t look like bytes :thinking:

*_bytes_in_total and *_bytes_out_total are in bytes. Why do you think the unit is wrong?

A counter is a cumulative metric that represents a single monotonically increasing counter whose value can only increase or be reset to zero on restart. For example, you can use a counter to represent the number of requests served, tasks completed, or errors.

If I just render what is scraped:

[graph of the raw scraped values]

it doesn’t look cumulative.

I don’t know how it is rendered in Prometheus, but these values are cumulative. Note that the stats are not preserved across a reload. Maybe that could explain what you observe.
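
Also, in Prometheus a counter is usually graphed as a per-second rate rather than as its raw value, and rate() additionally absorbs counter resets; something like:

    rate(haproxy_frontend_bytes_in_total[5m])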

I just ran the query haproxy_frontend_bytes_in_total; there was no reload at all during this period. I have a custom scraper that does echo show stat | nc -U /var/run/haproxy_stat_socket.sock and parses the output. That socket shows me a 0 value when I effectively reload my instance. There was no reload during the observed time period (which is 1 hr). Considering the graph’s pattern, it seems that the counter reaches a max value of some sort and is reset to 0.

bytes_in and bytes_out are 64-bit unsigned integers, so an overflow is possible but unlikely in your case (a 64-bit counter only wraps after 2^64 bytes, i.e. about 16 EiB). I don’t know if there is a way to set a max value in Prometheus. But, on the HAProxy side, if you have no reload/restart at all, you can also reset all counters by sending the command clear counters all on the stats socket. Any chance you have such resets?
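
That is, something like this, reusing your socket path:

    echo "clear counters all" | nc -U /var/run/haproxy_stat_socket.sock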

BTW, I suggest you test it by hand, requesting the stats several times on the stats socket or the Prometheus exporter, to remove all intermediaries.

Here is the output:

    while : ; do curl -sSL myserver:8404/metrics | rg 'frontend_bytes_in_total\{proxy="myfrontend"\}' ; sleep 1 ; done
    haproxy_frontend_bytes_in_total{proxy="myfrontend"} 46828349695
    haproxy_frontend_bytes_in_total{proxy="myfrontend"} 14284870545
    haproxy_frontend_bytes_in_total{proxy="myfrontend"} 46836447611
    haproxy_frontend_bytes_in_total{proxy="myfrontend"} 34925352235

All of this within 5 seconds.

I run in multiprocess mode, but given the dispersion of the values, it doesn’t even make sense for a multiprocess HAProxy environment.

[EDIT]

Sorry, I forgot to reply about clear counters all: nope, I never use it. As I said, I already run a home-made scraper that returns coherent values. Those values are reset when I reload my HAProxy configuration; otherwise they are not.

If you run HAProxy in multiprocess mode, your HTTP requests are probably load-balanced across the different processes. The stats are not shared between processes, so the value will differ from one process to another. Try to read the value of haproxy_process_relative_process_id to know which process handled a given request.
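
For example, a variation of your loop (same endpoint and label) should show the counter jumping around together with the process id:

    while : ; do
        curl -sSL myserver:8404/metrics \
            | rg 'haproxy_process_relative_process_id|frontend_bytes_in_total\{proxy="myfrontend"\}'
        sleep 1
    done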

With nbproc, you need to get the stats from each process individually and aggregate the results. Prometheus can handle the aggregation, but you need to update your HAProxy configuration to have a listener bound to each process. Another solution is to use threads instead of processes. Note that the same is true for the stats socket.
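
A rough sketch of what that could look like with two processes (ports, paths and names are made up):

    global
        nbproc 2
        # one stats socket per process
        stats socket /var/run/haproxy-1.sock process 1
        stats socket /var/run/haproxy-2.sock process 2

    frontend prometheus
        mode http
        # one metrics port per process, so Prometheus scrapes each one
        bind *:8404 process 1
        bind *:8405 process 2
        http-request use-service prometheus-exporter if { path /metrics }

On the Prometheus side you can then aggregate across the scrape targets, e.g. with sum by (proxy) (haproxy_frontend_bytes_in_total).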

Thanks for the hint, I’ll try this out. I figured it would already be aggregated; I had to do that in my custom-made scraper to expose already-aggregated metrics.