We are using HAProxy version 1.5. According to the documentation, reloading the configuration file with the -sf option sends SIGUSR1 to the old processes. However, the old processes exit gracefully only when I send this signal explicitly.
Is sending this signal explicitly bad practice? Why isn’t it happening automatically?
Furthermore, the main issue here is that the old processes continue to accept new connections.
Apart from sending SIGUSR1 explicitly when the reload script runs, are there any alternative solutions?
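For context, a reload script usually ends up running something like the command printed by the sketch below. The config and PID file paths are assumed defaults, and `build_reload_cmd` is a hypothetical helper name. It only prints the command so that the PID list passed to `-sf` can be inspected before anything is executed.

```shell
#!/bin/sh
# Sketch: print the haproxy command a reload script would typically run.
# Both default paths are assumptions; adjust them for your system.
build_reload_cmd() {
    cfg="${1:-/etc/haproxy/haproxy.cfg}"
    pidfile="${2:-/var/run/haproxy.pid}"
    old_pids=""
    if [ -r "$pidfile" ]; then
        # The PID file may hold several PIDs, one per line; join them with spaces.
        old_pids=$(tr '\n' ' ' < "$pidfile" | sed 's/ *$//')
    fi
    if [ -n "$old_pids" ]; then
        # -sf asks the listed old processes to finish gracefully.
        echo "haproxy -f $cfg -p $pidfile -sf $old_pids"
    else
        # No old PIDs known: this is a plain start, not a reload.
        echo "haproxy -f $cfg -p $pidfile"
    fi
}
```

If the printed `-sf` list is empty or stale, the old processes never receive the signal, which would match the behaviour described above.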
Hey @lukastribus, we are using HA-Proxy version 1.5-dev25-a339395 2014/05/10, and yes, the PID file contains the correct PID and the file is readable and writable.
Anyway, the release you are using is not supported; it's an ancient development build that you should not use in production. Please upgrade to the latest 1.5 or 1.6 stable release.
I have the same issue, after
service haproxy reload
there are new processes alongside the old ones. So I restart the server instead of reloading to avoid this situation.
The problem is easily reproducible: just loop over reload, for example 50 times without sleeping.
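A minimal sketch of such a loop follows. `reload_loop` is a hypothetical helper name, and the `RELOAD_CMD` variable is an assumed override hook that defaults to the reload command used above.

```shell
#!/bin/sh
# Sketch: fire N reloads back to back to try to trigger the problem.
# RELOAD_CMD is an assumed override; by default the service reload is used.
reload_loop() {
    count="${1:-50}"
    cmd="${RELOAD_CMD:-service haproxy reload}"
    i=0
    while [ "$i" -lt "$count" ]; do
        # Deliberately no sleep between iterations.
        $cmd
        i=$((i + 1))
    done
}
```

Calling `reload_loop 50` and then listing the haproxy processes (for example with `ps -ef | grep haproxy`) should show whether old instances are left behind.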
It happens when two reloads are performed within a short time. As a result, one haproxy instance has no ‘back-reference’ in its ‘-sf’ argument to the previous one, so that instance becomes “disconnected” from the others. This is also visible in the journalctl output (two haproxy instances have the same PID reference in ‘-sf’, so one is lost).
I had a look at haproxy-systemd-wrapper.c and guess that the PID file is only read and never written by this component.
To me it seems that a race condition happens and that several instances do not reference the previous one.
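One rough way to spot such "disconnected" instances is to compare the running haproxy PIDs with the contents of the PID file. The sketch below does only that comparison. `find_untracked_pids` is a hypothetical helper name and the PID file path in the usage line is an assumption.

```shell
#!/bin/sh
# Sketch: list PIDs that are running but absent from the PID file.
# Arguments: $1 = PID file path, remaining arguments = PIDs currently running.
find_untracked_pids() {
    pidfile="$1"
    shift
    for pid in "$@"; do
        # -x matches whole lines only, -q suppresses grep's output.
        if ! grep -qx "$pid" "$pidfile" 2>/dev/null; then
            echo "$pid"
        fi
    done
}
```

It could be called as `find_untracked_pids /var/run/haproxy.pid $(pgrep -x haproxy)`. Old processes that are still draining connections right after a reload will also show up, so only PIDs that persist across several checks are suspicious.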
Restarting the server has a significant impact, as you may know, which is why approaches like the one at Yelp (https://engineeringblog.yelp.com/2015/04/true-zero-downtime-haproxy-reloads.html) exist: they let the client do SYN retries, or buffer the SYNs, while doing a full restart.
This becomes impossible in a PaaS-like approach, where many events occur and may trigger reloads every second.
Maybe you have some insights to share before I dig into that?
Please report this to the mailing list; they will be able to help you much better than here on Discourse.
edit: Starting with snapshot 20161005 [1] and the unreleased v1.6.10, you will be able to disable SO_REUSEPORT in the configuration (noreuseport). That could be something to try …