Old haproxy processes accepting new connections after reload


#1

Hey,

I’ve read several threads about old haproxy processes that keep running after a reload, such as:
https://www.mail-archive.com/search?l=haproxy@formilux.org&q=subject:“Re%3A+HAProxy+graceful+restart+old+process+not+going+away”&o=newest&f=1

We are using haproxy version 1.5. According to the documentation, reloading the configuration file with the -sf option sends SIGUSR1 to the old processes. However, the old processes only exit gracefully when I send this signal explicitly.
Is sending this signal explicitly bad practice? Why isn’t it happening automatically?

Furthermore, the main issue here is that the old processes continue to accept new connections.
Apart from sending SIGUSR1 explicitly when the reload script is executed, are there any alternative solutions?
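
For reference, the explicit signal described above can be sketched in Python as follows. This is only an illustration under assumptions: the PID file is taken to hold one PID per line, and the helper names read_pids and soft_stop, as well as the example path /run/haproxy.pid, are hypothetical and not part of haproxy.

```python
import os
import signal


def read_pids(pidfile_text):
    """Parse PID file content (assumed: one PID per line, blanks ignored)."""
    return [int(tok) for tok in pidfile_text.split() if tok.isdigit()]


def soft_stop(pids, kill=os.kill):
    """Send SIGUSR1 to each PID and return the PIDs that no longer exist."""
    missing = []
    for pid in pids:
        try:
            kill(pid, signal.SIGUSR1)
        except ProcessLookupError:
            missing.append(pid)  # process already exited
    return missing


# Hypothetical usage (path is an assumption):
#   with open("/run/haproxy.pid") as f:
#       print(soft_stop(read_pids(f.read())))
```

The kill parameter is only there so the function can be exercised without signalling real processes.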


#2

Which 1.5 release exactly? Does the PID file contain the correct PID, and is the file readable and writable?


#3

Hey @lukastribus, we are using HA-Proxy version 1.5-dev25-a339395 2014/05/10, and yes, the PID file contains the correct PID and the file is readable and writable.


#4

Is this a duplicate thread?

Anyway, the release you are using is not supported; it’s an ancient development build that you should not use in production. Please upgrade to the latest 1.5 or 1.6 stable release.


#5

1.5.18. I haven’t checked the PID file yet; I will get back to you soon with that information.


#6

Haproxy 1.6.8:
[root@drft001 ~]# haproxy -v
HA-Proxy version 1.6.8 2016/08/14
Copyright 2000-2016 Willy Tarreau <willy@haproxy.org>

I have the same issue. After
service haproxy reload
the new processes run alongside the old ones. So I restart the server instead of reloading to avoid this situation.


#7

Check the PID file. Does it contain the correct PID? Is the file writable by haproxy?
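
A rough way to run both checks is sketched below in Python. The helper name check_pidfile is hypothetical, and the file is assumed to hold whitespace-separated PIDs. Note that os.access tests writability for the user running the script, so it should be run as the user haproxy runs as.

```python
import os


def check_pidfile(path):
    """Return (pids, alive, writable) for the PID file at path."""
    with open(path) as f:
        pids = [int(tok) for tok in f.read().split()]
    alive = {}
    for pid in pids:
        try:
            os.kill(pid, 0)  # signal 0 only tests whether the PID exists
            alive[pid] = True
        except ProcessLookupError:
            alive[pid] = False  # stale entry: no such process
        except PermissionError:
            alive[pid] = True  # process exists but belongs to another user
    return pids, alive, os.access(path, os.W_OK)
```

A PID that is listed but not alive, or a file that is not writable, would both be worth reporting in the thread.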


#8

Hi,
I am experiencing the same behaviour.
I’m on 1.6.8 as well, and thought that upgrading from 1.4 or 1.5 could solve the issue. That was not the case.

Here is an example of process tree:

root     24115  0.0  0.0  46340  1824 ?        Ss   14:34   0:00 /usr/sbin/haproxy-systemd-wrapper -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid
haproxy  27403  0.2  0.0  89272 20096 ?        S    14:49   0:00  \_ /usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -Ds -sf 27366
haproxy  27450  1.2  0.0  89272 14380 ?        Rs   14:49   0:00  |   \_ /usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -Ds -sf 27366
haproxy  27410  0.2  0.0  89272 16008 ?        S    14:49   0:00  \_ /usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -Ds -sf 27366
haproxy  27458  1.2  0.0  89272 14392 ?        Ss   14:49   0:00  |   \_ /usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -Ds -sf 27366
haproxy  27626  0.3  0.0  89272 16008 ?        S    14:49   0:00  \_ /usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -Ds -sf 27623
haproxy  27674  1.1  0.0  89272 14380 ?        Ss   14:49   0:00  |   \_ /usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -Ds -sf 27623
haproxy  27722  0.2  0.0  89272 16008 ?        S    14:49   0:00  \_ /usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -Ds -sf 27716
haproxy  27762  1.0  0.0  89272 14368 ?        Ss   14:49   0:00  |   \_ /usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -Ds -sf 27716

The problem is easily reproducible: just loop over reload, for example 50 times without sleeping.

It happens when two reloads are performed within a short time. As a result, the ‘-sf’ of one haproxy instance holds no ‘back-reference’ to the previous one, and that previous instance becomes “disconnected” from the others. This is also visible in the journalctl output (two haproxy instances have the same PID reference in ‘-sf’, so one instance is lost).

I had a look at haproxy-systemd-wrapper.c and guess that the PID file is only read and never written by this component.
To me it seems that a race condition happens and that several instances do not reference the previous one.
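
To spot this condition automatically, something like the following Python sketch could be used. It assumes `ps auxf`-style lines like the ones in the process tree above (PID in the second column, command starting at the eleventh, and a bare `\_` marking a direct child of the wrapper). The function names sf_targets and shared_targets are hypothetical.

```python
from collections import defaultdict


def sf_targets(ps_lines):
    """Map each PID given to -sf to the top-level haproxy PIDs referencing it."""
    refs = defaultdict(list)
    for line in ps_lines:
        fields = line.split()
        # Keep only direct children of the wrapper: field 10 is a bare "\_".
        if len(fields) < 12 or fields[10] != "\\_" or "-sf" not in fields:
            continue
        start = fields.index("-sf") + 1
        for token in fields[start:]:
            if not token.isdigit():
                break  # end of the PID list that follows -sf
            refs[int(token)].append(int(fields[1]))
    return dict(refs)


def shared_targets(refs):
    """Keep only the -sf targets referenced by more than one instance."""
    return {old: new for old, new in refs.items() if len(new) > 1}
```

Fed the process tree above, shared_targets reports that 27403 and 27410 both reference 27366, which matches the duplicated reference described here.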

As you may know, restarting the server has a significant impact, which is why approaches like the one at Yelp (https://engineeringblog.yelp.com/2015/04/true-zero-downtime-haproxy-reloads.html) exist: they let the client do SYN retries, or buffer the SYNs, while doing a full restart.
This becomes impossible in a PaaS-like setup where many events occur and may trigger reloads every second.
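
One thing a controller in such a setup might do is coalesce bursts of reload triggers so that reloads are never issued back to back. The Python sketch below is only an assumption about how that could look: the ReloadCoalescer class and its min_interval parameter are hypothetical. It does not fix the race itself, it only makes overlapping reloads less likely.

```python
import time


class ReloadCoalescer:
    """Collapse bursts of reload requests into at most one per interval."""

    def __init__(self, min_interval, clock=time.monotonic):
        self.min_interval = min_interval  # seconds between two reloads
        self.clock = clock
        self.last_reload = None
        self.pending = False

    def request(self):
        """Record that the configuration changed and a reload is wanted."""
        self.pending = True

    def poll(self):
        """Return True when the caller should perform a reload now."""
        if not self.pending:
            return False
        now = self.clock()
        if self.last_reload is not None and now - self.last_reload < self.min_interval:
            return False  # too soon; the request stays pending
        self.last_reload = now
        self.pending = False
        return True
```

The caller would invoke request() on every event and run the actual reload only when a periodic poll() returns True.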

Maybe you have some insights to share before I dig into that?


#9

Please report this to the mailing list; they will be able to help you much better than here on Discourse.

edit: Starting with snapshot 20161005 [1] and the unreleased v1.6.10, you will be able to disable SO_REUSEPORT in the configuration (the noreuseport global keyword). That could be something to try …

[1] http://www.haproxy.org/download/1.6/src/snapshot/haproxy-ss-20161005.tar.gz


#10

Thanks Lukas.
I just sent my question to the ML: https://www.mail-archive.com/haproxy@formilux.org/msg23867.html