Looking for stats data of "old" Haproxy process after reload

Hello,

I am running HAProxy version 2.0.13-2ubuntu0.1.

When I perform a reload on HAProxy to load a new configuration, I am aware that this spawns a new process and therefore (some) stats are reset.

But the old process keeps running, and in particular keeps holding the already established connections.

This puts me in a situation where the stats page and the stats socket tell me there are e.g. 5 established connections, while in reality there are >100 connections established in the “old” process.

Is there a way to:

  • Make the new process also show the existing connections of the old one, or
  • have the new process take over the already existing connections, or
  • somehow still access the stats socket of the old process?

I would be totally fine gathering my data from multiple stats sockets if, due to a reload, multiple processes are running. But sadly I seem to get only one socket, and only the data from the newest process.

My config:

# Managed by chef

global
  log 127.0.0.1:9001 local0 notice
  log 127.0.0.1:9001 local1 info
  maxconn 100000

  external-check
  spread-checks 5

  stats socket /var/run/haproxy/admin.sock mode 660 expose-fd listeners level admin
  stats socket /var/run/haproxy/haproxy.sock expose-fd listeners mode 666 level user
  stats timeout 30s
  user haproxy
  group haproxy
  daemon

  nbproc 1
  nbthread 4
  cpu-map 1/all 0-1

  tune.ssl.default-dh-param 2048
  tune.bufsize 524288

defaults
  log global
  option  dontlognull
  option  redispatch
  timeout connect 5s
  timeout client  600s
  timeout server  600s
  timeout check   250
  errorfile 400 /etc/haproxy/errors/400.http
  errorfile 403 /etc/haproxy/errors/403.http
  errorfile 408 /etc/haproxy/errors/408.http
  errorfile 500 /etc/haproxy/errors/500.http
  errorfile 502 /etc/haproxy/errors/502.http
  errorfile 503 /etc/haproxy/errors/503.http
  errorfile 504 /etc/haproxy/errors/504.http


listen stats # Define a listen section called "stats"
  bind 127.0.0.1:443  ssl crt /etc/ssl/private/certificate.pem alpn http/1.1
  bind 127.0.0.1:80
  mode http

  http-request redirect scheme https unless { ssl_fc }

  acl AUTH       http_auth(statsuser)
  stats http-request auth unless AUTH

  stats enable  # Enable stats page
  # stats hide-version  # Hide HAProxy version
  stats uri /  # Stats URI
  stats admin if TRUE
  log                  global
  option               httplog
  option               dontlognull
  maxconn              50

listen MyBackend
  mode tcp
  bind 10.0.0.18:188
  option tcplog
  timeout client 8h
  timeout server 8h

  option external-check
  external-check command /var/lib/haproxy/checks/myCheck.sh
  timeout check 500ms
  default-server inter 1s

  server node1.int 10.0.0.25:188 check

Thanks a lot in advance

If you use master/worker mode, you can use the Master CLI to access old processes:

http://cbonte.github.io/haproxy-dconv/2.4/management.html#9.4

show info and show stat, for example, are commands that are also available for older processes.
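Scripted, that interaction might look like the sketch below (Python, standard socket module only). The socket path is an assumption — it is whatever you pass via `haproxy -W -S <path>` — and the small helper that builds the `@!<PID>` prefix is my own:

```python
import os
import socket

# Assumed path: set when starting HAProxy in master-worker mode,
# e.g. "haproxy -W -S /var/run/haproxy-master.sock -f /etc/haproxy/haproxy.cfg"
MASTER_SOCK = "/var/run/haproxy-master.sock"

def master_cmd(pid, command):
    """Build a Master CLI line that routes <command> to the worker with OS PID <pid>."""
    return f"@!{pid} {command}\n"

def query_master(sock_path, line):
    """Send one command line to the Master CLI socket and return the raw response."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
        s.connect(sock_path)
        s.sendall(line.encode())
        s.shutdown(socket.SHUT_WR)
        chunks = []
        while data := s.recv(4096):
            chunks.append(data)
        return b"".join(chunks).decode()

if os.path.exists(MASTER_SOCK):
    print(query_master(MASTER_SOCK, "show proc"))                     # list all workers, old and new
    print(query_master(MASTER_SOCK, master_cmd(10366, "show stat")))  # query one specific worker
```

The same can of course be done interactively with `socat stdio /var/run/haproxy-master.sock`.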

Hey @lukastribus,

Good catch.
I tried that, but without success.

I connect to the master socket and run show proc to get all of the running worker processes.
During a reload, I indeed see two worker processes reported there.

But when I send show stat to both processes (via @!<PID> show stat),
I get an empty response from the “old” process:

Getting stats from 12316
# pxname,svname,qcur,qmax,scur,smax,slim,stot,bin,bout,dreq,dresp,ereq,econ,eresp,wretr,wredis,status,weight,act,bck,chkfail,chkdown,lastchg,downtime,qlimit,pid,iid,sid,throttle,lbtot,tracked,type,rate,rate_lim,rate_max,check_status,check_code,check_duration,hrsp_1xx,hrsp_2xx,hrsp_3xx,hrsp_4xx,hrsp_5xx,hrsp_other,hanafail,req_rate,req_rate_max,req_tot,cli_abrt,srv_abrt,comp_in,comp_out,comp_byp,comp_rsp,lastsess,last_chk,last_agt,qtime,ctime,rtime,ttime,agent_status,agent_code,agent_duration,check_desc,agent_desc,check_rise,check_fall,check_health,agent_rise,agent_fall,agent_health,addr,cookie,mode,algo,conn_rate,conn_rate_max,conn_tot,intercepted,dcon,dses,wrew,connect,reuse,cache_lookups,cache_hits,srv_icur,src_ilim,qtime_max,ctime_max,rtime_max,ttime_max,eint,idle_conn_cur,safe_conn_cur,used_conn_cur,need_conn_est,uweight,-,ssl_sess,ssl_reused_sess,ssl_failed_handshake,h2_headers_rcvd,h2_data_rcvd,h2_settings_rcvd,h2_rst_stream_rcvd,h2_goaway_rcvd,h2_detected_conn_protocol_errors,h2_detected_strm_protocol_errors,h2_rst_stream_resp,h2_goaway_resp,h2_open_connections,h2_backend_open_streams,h2_total_connections,h2_backend_total_streams,
stats,FRONTEND,,,0,0,2005,0,0,0,0,0,0,,,,,OPEN,,,,,,,,,1,2,0,,,,0,0,0,0,,,,0,0,0,0,0,0,,0,0,0,,,0,0,0,0,,,,,,,,,,,,,,,,,,,,,http,,0,0,0,0,0,0,0,,,0,0,,,,,,,0,,,,,,-,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,

Getting stats from 10366
# pxname,svname,qcur,qmax,scur,smax,slim,stot,bin,bout,dreq,dresp,ereq,econ,eresp,wretr,wredis,status,weight,act,bck,chkfail,chkdown,lastchg,downtime,qlimit,pid,iid,sid,throttle,lbtot,tracked,type,rate,rate_lim,rate_max,check_status,check_code,check_duration,hrsp_1xx,hrsp_2xx,hrsp_3xx,hrsp_4xx,hrsp_5xx,hrsp_other,hanafail,req_rate,req_rate_max,req_tot,cli_abrt,srv_abrt,comp_in,comp_out,comp_byp,comp_rsp,lastsess,last_chk,last_agt,qtime,ctime,rtime,ttime,agent_status,agent_code,agent_duration,check_desc,agent_desc,check_rise,check_fall,check_health,agent_rise,agent_fall,agent_health,addr,cookie,mode,algo,conn_rate,conn_rate_max,conn_tot,intercepted,dcon,dses,wrew,connect,reuse,cache_lookups,cache_hits,srv_icur,src_ilim,qtime_max,ctime_max,rtime_max,ttime_max,eint,idle_conn_cur,safe_conn_cur,used_conn_cur,need_conn_est,uweight,-,ssl_sess,ssl_reused_sess,ssl_failed_handshake,h2_headers_rcvd,h2_data_rcvd,h2_settings_rcvd,h2_rst_stream_rcvd,h2_goaway_rcvd,h2_detected_conn_protocol_errors,h2_detected_strm_protocol_errors,h2_rst_stream_resp,h2_goaway_resp,h2_open_connections,h2_backend_open_streams,h2_total_connections,h2_backend_total_streams,

The first process is the new worker, the second one the old worker.
During the reload, I kept a connection to the stats page open (via telnet).
Before the reload, I could perfectly well see that open session in the “stats” frontend section.
But after the reload, the new process reports 0, and on the old process the data is missing completely.
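As an aside, once a worker does return the CSV, extracting the current-session counter (scur) per frontend is straightforward. A small sketch using Python's csv module; the sample below is a shortened version of the header shown above (real output has ~100 columns), and the values are made up:

```python
import csv
import io

# Shortened sample of "show stat" CSV output; note the leading "# " on the
# header line and the trailing comma on every row, both of which the real
# output also has.
SAMPLE = """\
# pxname,svname,scur,smax,slim,stot,
stats,FRONTEND,5,7,2005,123,
MyBackend,FRONTEND,2,4,100000,17,
"""

def current_sessions(raw):
    """Map (pxname, svname) -> scur from raw 'show stat' CSV text."""
    cleaned = raw.lstrip("# ")  # strip the comment marker from the header line
    rows = csv.DictReader(io.StringIO(cleaned))
    return {(r["pxname"], r["svname"]): int(r["scur"]) for r in rows}

sessions = current_sessions(SAMPLE)
print(sessions[("stats", "FRONTEND")])  # → 5
```

Summing scur over the outputs of all workers (old and new) would then give the true total — if the old worker actually returned its stats, which is exactly the problem here.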

Thanks for linking the GitHub issue.

Apparently I created it, after not finding any solution.

For people reading this in the future:

Apparently, this is intended behaviour.
“Old” proxies are explicitly excluded from generating statistics:

“old” proxies internally get the “disabled” flag and are therefore excluded.

As (at least in my opinion) not being able to access the stats of old workers that still hold open sessions is a bad thing, I will, with my limited knowledge, try to adjust the code there :slight_smile: