100+ haproxy processes

Hi

We recently started seeing 100+ haproxy processes running. This is quite a surprise, and we have never seen this behaviour before. What might have caused it?

$ ps -ef | grep haproxy | wc -l
128

root 2888 21695 3 Feb11 ? 01:14:24 haproxy -W -db -f /usr/local/etc/haproxy/haproxy.cfg -sf 355 356 28 49 47 29 116 115 44 97 48 7 26 57 341 342 80 41 337 338 43 19 234 24 70 50 13 37 16 23 225 228 223 152 154 224 151 153 226 318 315 316 149 142 150 312 147 35 36 145 69 141 303 304 31 61 299 14 205 146 295 296 293 294 291 292 289 137 138 288 25 59 283 284 281 282 279 280 134 133 129 132 196 130 191 128 187 127 27 192 265 131 263 264 186 185 183 188 257 184 255 20 8 52 119
root 2889 21695 3 Feb11 ? 01:06:51 haproxy -W -db -f /usr/local/etc/haproxy/haproxy.cfg -sf 355 356 28 49 47 29 116 115 44 97 48 7 26 57 341 342 80 41 337 338 43 19 234 24 70 50 13 37 16 23 225 228 223 152 154 224 151 153 226 318 315 316 149 142 150 312 147 35 36 145 69 141 303 304 31 61 299 14 205 146 295 296 293 294 291 292 289 137 138 288 25 59 283 284 281 282 279 280 134 133 129 132 196 130 191 128 187 127 27 192 265 131 263 264 186 185 183 188 257 184 255 20 8 52 119
root 2929 21695 0 Feb06 ? 00:08:55 haproxy -W -db -f /usr/local/etc/haproxy/haproxy.cfg -sf 143 144 141 142 61 31 137 138 25 59 133 134 131 132 129 130 127 128 27 57 20 8 52 29 119 7 49 47 115 116 80 28 48 26 24 43 41 23 44 50 69 13 16 70 37 19 97 35 36 14
root 2930 21695 0 Feb06 ? 00:09:22 haproxy -W -db -f /usr/local/etc/haproxy/haproxy.cfg -sf 143 144 141 142 61 31 137 138 25 59 133 134 131 132 129 130 127 128 27 57 20 8 52 29 119 7 49 47 115 116 80 28 48 26 24 43 41 23 44 50 69 13 16 70 37 19 97 35 36 14
root 3423 21695 3 Feb02 ? 09:06:39 haproxy -W -db -f /usr/local/etc/haproxy/haproxy.cfg -sf 17 18 15 16 13 14 8 7
root 3424 21695 4 Feb02 ? 10:31:45 haproxy -W -db -f /usr/local/etc/haproxy/haproxy.cfg -sf 17 18 15 16 13 14 8 7
root 3894 21695 0 Feb08 ? 00:18:43 haproxy -W -db -f /usr/local/etc/haproxy/haproxy.cfg -sf 289 290 287 288 25 59 283 284 281 282 279 280 134 133 129 132 196 130 191 128 187 127 27 192 265 131 263 264 186 185 183 188 257 184 255 20 8 52 119 57 7 29 47 115 116 49 44 28 97 48 26 80 41 23 19 37 43 234 24 70 50 13 16 228 225 226 223 224 154 152 153 151 142 149 150 147 35 145 36 69 141 14 31 61 205 146 137 138

We don’t understand what the numbers attached to the processes after -sf are. We have never seen these numbers in our environment before.

global
    daemon
    pidfile /var/run/haproxy-ssl.pid
    log 127.0.0.1 local0 info
    log 127.0.0.1 local1 notice
    maxconn 20000
    stats socket /var/run/haproxy-ssl.stat mode 600 level admin
    stats timeout 2m
    nbproc 2
    ca-base /opt/haproxy/ssl/ca/
    tune.ssl.default-dh-param 2048
    ssl-default-bind-options no-sslv3
    ssl-default-bind-ciphers ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES256-GCM-SHA384:DHE-RSA-AES128-GCM-SHA256:DHE-DSS-AES128-GCM-SHA256:kEDH+AESGCM:ECDHE-RSA-AES128-SHA256:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA:ECDHE-ECDSA-AES128-SHA:ECDHE-RSA-AES256-SHA384:ECDHE-ECDSA-AES256-SHA384:ECDHE-RSA-AES256-SHA:ECDHE-ECDSA-AES256-SHA:DHE-RSA-AES128-SHA256:DHE-RSA-AES128-SHA:DHE-DSS-AES128-SHA256:DHE-RSA-AES256-SHA256:DHE-DSS-AES256-SHA:DHE-RSA-AES256-SHA:AES128-GCM-SHA256:AES256-GCM-SHA384:AES128-SHA256:AES256-SHA256:AES128-SHA:AES256-SHA:AES:CAMELLIA:!DES-CBC3-SHA:!aNULL:!eNULL:!EXPORT:!DES:!RC4:!MD5:!PSK:!aECDH:!EDH-DSS-DES-CBC3-SHA:!EDH-RSA-DES-CBC3-SHA:!KRB5-DES-CBC3-SHA

defaults
    log global
    stats enable
    mode http
    option forwardfor except 127.0.0.0/8
    option dontlognull
    option httplog clf
    balance roundrobin
    retries 3
    maxconn 20000
    timeout http-keep-alive 60s
    timeout client 10m
    timeout server 10m
    timeout queue 1m
    timeout connect 10s
    timeout check 10s
    timeout http-request 10s

    errorfile 503 /opt/haproxy/errors/503.html
    errorfile 400 /opt/haproxy/errors/400.html
    errorfile 403 /opt/haproxy/errors/403.html
    errorfile 500 /opt/haproxy/errors/500.html
    errorfile 502 /opt/haproxy/errors/502.html
    errorfile 504 /opt/haproxy/errors/504.html
    errorfile 408 /dev/null

frontend ssl_server
    mode http
    # Redirect routes from 80 to 443 (in case they try to come in on http://)
    bind *:80
    # Disable TLS v1.0 for CAE-2676
    bind *:443 ssl no-tlsv10 crt /opt/haproxy/ssl/certs alpn h2,http/1.1
    monitor-uri /proxy.html
    log-format [%pid]\ [%Ts.%ms]\ %ac/%fc/%bc/%bq/%sc/%sq/%rc\ %Tq/%Tw/%Tc/%Tr/%Tt\ %tsc\ %ci:%cp\ %fi:%fp\ %si:%sp\ %ft\ %sslc\ %sslv\ %{+Q}r\ %ST\ %b:%s\ "%CC"\ "%hr"\ "%CS"\ "%hs"\ req_size=%U\ resp_size=%B
    unique-id-format %{+X}o\ %ci:%cp_%fi:%fp_%Ts_%rt:%pid
    unique-id-header X-Unique-ID

    # Redirect routes from 80 to 443 (in case they try to come in on http://)
    redirect scheme https code 308 if !{ ssl_fc }

    default_backend apache_rp

backend apache_rp
    default-server maxconn 3200 inter 2s fall 1 rise 3
    server RP_0 server1:443 check ssl verify none
    server RP_1 server2:443 check ssl verify none
    server RP_2 server3:443 check ssl verify none

listen stats
    bind *:8000
    mode http
    stats enable
    stats hide-version

The only changes we made recently were to enable HTTP/2 support and set nbproc to 2.

Out of 16 GB of memory, HAProxy is consuming 12 GB, which is also a surprise.

$ ps -eo rss,command --sort -size | grep haproxy | awk '{ hr=$1/1024 ; sum +=hr} END {print sum}'
15352.9

Any idea what is happening here? We’d really appreciate any pointers.

Have you restarted (manually or via a periodic script) the HAProxy service?

The PIDs you see after -sf are the list of HAProxy process IDs that are “waited for” after restarting HAProxy. These processes still exist because there are still live connections being handled by them.
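For reference, this is roughly what a classic seamless reload looks like when issued by hand (the paths and pid file are illustrative; with -W the master process performs this re-exec for you):

    # start haproxy, recording the process PIDs in a pid file
    haproxy -f /usr/local/etc/haproxy/haproxy.cfg -p /var/run/haproxy.pid

    # reload: the new process takes over the listening sockets and asks the
    # old processes (the PIDs passed after -sf) to finish serving their
    # current connections and then exit
    haproxy -f /usr/local/etc/haproxy/haproxy.cfg -p /var/run/haproxy.pid \
        -sf $(cat /var/run/haproxy.pid)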

What HAProxy version are you using? (I used to encounter a bug with 1.5 that would make these processes never die.)

Thanks @ciprian.craciun

Our version is:

haproxy -V

HA-Proxy version 1.8.8 2018/04/19
Copyright 2000-2018 Willy Tarreau willy@haproxy.org

Yes, we do reload the HAProxy process whenever we push SSL certs. In our environment we have per-app SSL certificates; an automation job pushes the new cert and reloads. This has been working for the last few years and we never saw this behaviour.

We’re wondering whether the http/2 or nbproc 2 settings have any impact on this behaviour? These are the only changes we made in recent days.

Are you sure you are issuing a reload instead of a restart? (I ask because, to my knowledge, “reloading” should happen in the same HAProxy process, whereas “restarting” actually starts new processes.)

Could you check if there are actual pending TCP connections? netstat -tnp | grep -F -e haproxy?

If you indeed see ESTABLISHED connections pertaining to those processes, then this is the cause. If not, it might be another issue that “snags” the HAProxy process.
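To tie sockets to one particular old process, you can also filter by PID (2888 here is just taken from your ps output as an example):

    # ss prints the owning process for each socket; look for one old process
    ss -tnp | grep 'pid=2888'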

ss -or state established sport = :443 | wc -l
193

We restarted, but the number seems to be growing slowly again.

ps -C haproxy -f
UID PID PPID C STIME TTY TIME CMD
root 15929 15908 0 Jan31 ? 00:06:48 haproxy -W -db -f /usr/local/etc/haproxy/haproxy.cfg -sf 410 411 408 409 406 407 262 263 301 375 378 384 374 371
root 16859 15929 0 00:50 ? 00:02:53 haproxy -W -db -f /usr/local/etc/haproxy/haproxy.cfg -sf 372 373 263 371 262 301
root 16860 15929 0 00:50 ? 00:03:24 haproxy -W -db -f /usr/local/etc/haproxy/haproxy.cfg -sf 372 373 263 371 262 301
root 50327 15929 1 20:05 ? 00:00:14 haproxy -W -db -f /usr/local/etc/haproxy/haproxy.cfg -sf 410 411 408 409 406 407 262 263 301 375 378 384 374 371
root 50328 15929 1 20:05 ? 00:00:18 haproxy -W -db -f /usr/local/etc/haproxy/haproxy.cfg -sf 410 411 408 409 406 407 262 263 301 375 378 384 374 371
root 53138 15929 0 Feb10 ? 00:05:01 haproxy -W -db -f /usr/local/etc/haproxy/haproxy.cfg -sf 298 299 262 263
root 54807 15929 0 Feb12 ? 00:01:42 haproxy -W -db -f /usr/local/etc/haproxy/haproxy.cfg -sf 368 369 263 301 262
root 56120 15929 0 Feb08 ? 00:00:42 haproxy -W -db -f /usr/local/etc/haproxy/haproxy.cfg -sf 260 261
root 56121 15929 0 Feb08 ? 00:00:42 haproxy -W -db -f /usr/local/etc/haproxy/haproxy.cfg -sf 260 261
root 56418 15929 0 05:34 ? 00:00:13 haproxy -W -db -f /usr/local/etc/haproxy/haproxy.cfg -sf 376 377 374 375 301 262 263 371
root 63176 15929 0 05:54 ? 00:00:14 haproxy -W -db -f /usr/local/etc/haproxy/haproxy.cfg -sf 382 383 380 381 378 379 263 371 374 375 301 262

sudo netstat -tp | grep haproxy | grep ESTABLISHED | wc -l
223

cat /etc/systemd/system/multi-user.target.wants/haproxy-443.service
[Unit]
Description=HAProxy 443 Container
After=docker.service
Requires=docker.service

[Service]
TimeoutStartSec=0
Restart=always
ExecStartPre=-/usr/bin/docker stop %n
ExecStartPre=-/usr/bin/docker rm %n
ExecStartPre=-/usr/bin/docker pull platform/haproxy:latest
ExecStart=/usr/bin/docker run --rm --name %n --cpus 1.5 --cpu-shares 3072 --network=host -v "/opt/haproxy/haproxy.cfg:/usr/local/etc/haproxy/haproxy.cfg:z" -v "/opt/haproxy/ssl/certs:/opt/haproxy/ssl/certs" platform/haproxy:latest

ExecReload=/usr/bin/docker kill -s HUP %n

[Install]
WantedBy=multi-user.target

The automation uses systemctl reload haproxy-443 after pushing the new cert.

It is not haproxy that you are reloading or restarting, but a Docker container (containing haproxy). The above systemd unit file is not something that ships with haproxy; it is something you got from somewhere else.

I suggest you contact whoever provided this Docker container/systemd unit file.

That said:

  • If you have obsolete processes around, you need to kill them. You are not going to fix anything by reloading/restarting; whether it’s nginx/apache/php or haproxy, you will have to kill the old processes.
  • When using haproxy itself (as opposed to this case with Docker containers), the sketch after this list illustrates the difference:
    • a reload starts new process(es), and the old processes keep running to handle the existing connections (quitting once all those connections are closed)
    • a restart starts new process(es), and the old processes are killed, interrupting existing sessions, which then have to be re-established
  • Haproxy 1.8.8 has, as of today, 185 bugs that are fixed in subsequent releases, so you may want to consider upgrading.
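A rough illustration, assuming a native (non-Docker) install with the systemd unit that ships with haproxy 1.8 in master-worker mode:

    # reload: the master re-executes itself and starts new processes; the old
    # ones linger (this is the -sf list) until their connections drain
    systemctl reload haproxy      # the unit sends SIGUSR2 to the master

    # restart: the old processes are terminated, cutting live sessions
    systemctl restart haproxy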

That number will not start to decrease until either you manually kill the HAProxy processes, or the connections are closed (by HAProxy or the client itself).

Might it be the case that those connections are actually “dead” (i.e. the client is not around anymore, but no FIN/RST packet was sent/received)?

Perhaps configure hard-stop-after, so that lingering old processes are forcefully stopped after a grace period?
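A minimal sketch of that setting (the 30m value is illustrative; pick something longer than your longest expected transfer):

    global
        # forcefully stop leftover old processes this long after a reload
        hard-stop-after 30m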

May I ask what the difference is compared to:

haproxy -v
HA-Proxy version 1.8.8-1ubuntu0.3 2019/01/11
Copyright 2000-2018 Willy Tarreau 

which is in the default Ubuntu repos (18.04)?

The difference is 5 backported security fixes:
http://changelogs.ubuntu.com/changelogs/pool/main/h/haproxy/haproxy_1.8.8-1ubuntu0.3/changelog

That still leaves 180 non-security-related bugs unfixed.


I’ve seen this as well when restarting haproxy with -sf. The strange thing is that sometimes the haproxy processes seem to persist even after all connections have terminated.

hard-stop-after is supposed to take care of all of those cases. However, configuring timeouts correctly (so connections actually close) and making sure to use current stable releases (to avoid bugs fixed a long time ago) are also very important.

Sorry for the late reply…

These are the currently configured timeouts:

    log     global
    stats   enable
    mode    http
    option forwardfor except 127.0.0.0/8
    option  dontlognull
    option  httplog clf
    balance roundrobin
    retries 3
    maxconn 20000
    timeout http-keep-alive 60s
    timeout client 10m
    timeout server 10m
    timeout  queue 1m
    timeout connect 10s
    timeout check 10s
    timeout http-request 10s

The issue started again after last week’s full restart of the haproxy instance.

$ netstat -ant | awk '/ESTABLISHED|LISTEN|CLOSE_WAIT|TIME_WAIT/ {print $6}' | sort | uniq -c | sort -n
33 LISTEN
773 CLOSE_WAIT
7156 TIME_WAIT
7378 ESTABLISHED

So which of the suggestions already given did you follow? Are you on 1.8.19 now? Did you configure hard-stop-after?

Like I said, you are not using the systemd unit file that ships with haproxy; you are using Docker, with a Docker-provided systemd unit file. You are not showing how haproxy is actually reloaded, so I’m not sure how to help you.

@lukastribus: After disabling http/2, the issue went away, so at least we were able to identify the cause. Now the question is how to fix it, since we can’t keep http/2 disabled; we have committed to it.

As you said, we have 2 options: 1) upgrade to the latest 1.8.19, or 2) use hard-stop-after.

We are thinking of going with the first option and upgrading from 1.8.8 to 1.8.19. Hopefully that fixes the issue.

One side question: what are the side effects of hard-stop-after?

To upgrade more easily to the current release, I suggest you use Vincent Bernat’s repository; you can read about it here:

https://haproxy.debian.net/
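For Ubuntu 18.04 the setup is roughly the following (check the site above for the exact, current instructions):

    # add the PPA carrying up-to-date 1.8 builds, then install from it
    apt-get install --no-install-recommends software-properties-common
    add-apt-repository ppa:vbernat/haproxy-1.8
    apt-get update
    apt-get install haproxy=1.8.\*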

If 1.8.19 still has those issues, we need you to report them back here (you can also file a bug on the issue tracker on GitHub).

hard-stop-after will forcefully close active sessions after the timeout you specify. So if your customers have long-running downloads (or uploads), your hard-stop-after value is too short for them to complete the transaction, and you actually reload haproxy while such a long transfer is in progress, then those sessions will be closed by haproxy and the customers will see failed downloads/uploads. The value for hard-stop-after should be large in any case; I’d suggest at least 10 minutes (but it depends on whether your site handles long transfers up or down).