Seamless Reloads don't work with systemd


#1

I’m trying to make HAProxy 1.8.3 work on CentOS 7 and I want to enable seamless reloads. I’ve managed to make seamless reloads work if I run HAProxy in the shell, but it doesn’t want to work with systemd.

Here is the HAProxy version:

# haproxy -v
HA-Proxy version 1.8.3-205f675 2017/12/30
Copyright 2000-2017 Willy Tarreau <willy@haproxy.org>

Systemd unit file /usr/lib/systemd/system/haproxy.service:

[Unit]
Description=HAProxy Load Balancer
After=syslog.target network.target

[Service]
EnvironmentFile=/etc/sysconfig/haproxy
Environment="CONFIG=/etc/haproxy/haproxy.cfg" "PIDFILE=/run/haproxy.pid"
ExecStartPre=/usr/sbin/haproxy -f $CONFIG -c -q
ExecStart=/usr/sbin/haproxy -Ws -f $CONFIG -p $PIDFILE $OPTIONS
ExecReload=/usr/sbin/haproxy -f $CONFIG -c -q
ExecReload=/bin/kill -USR2 $MAINPID
KillMode=mixed
Restart=always
Type=notify

[Install]
WantedBy=multi-user.target

and the global section of the /etc/haproxy/haproxy.cfg looks like this:

global
    log         127.0.0.1 local2
    chroot      /var/lib/haproxy
    pidfile     /var/run/haproxy.pid
    maxconn     4000
    user        haproxy
    group       haproxy
    daemon
    stats socket /var/lib/haproxy/stats expose-fd listeners

When I start haproxy everything looks fine:

[root@hap18 ~]# systemctl start haproxy.service 
[root@hap18 ~]# systemctl status haproxy.service 
● haproxy.service - HAProxy Load Balancer
   Loaded: loaded (/usr/lib/systemd/system/haproxy.service; disabled; vendor preset: disabled)
   Active: active (running) since Fri 2018-01-05 14:53:12 CET; 2s ago
  Process: 4004 ExecReload=/bin/kill -USR2 $MAINPID (code=exited, status=0/SUCCESS)
  Process: 4003 ExecReload=/usr/sbin/haproxy -f $CONFIG -c -q (code=exited, status=0/SUCCESS)
  Process: 4038 ExecStartPre=/usr/sbin/haproxy -f $CONFIG -c -q (code=exited, status=0/SUCCESS)
 Main PID: 4039 (haproxy)
   CGroup: /system.slice/haproxy.service
           ├─4039 /usr/sbin/haproxy -Ws -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid
           └─4041 /usr/sbin/haproxy -Ws -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid

Jan 05 14:53:12 hap18 systemd[1]: Starting HAProxy Load Balancer...
Jan 05 14:53:12 hap18 systemd[1]: Started HAProxy Load Balancer.

[root@hap18 ~]# ps auxf | grep haprox[y]
root      4039  0.0  0.2  76296  4344 ?        Ss   14:53   0:00 /usr/sbin/haproxy -Ws -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid
haproxy   4041  8.5  0.1 297808  2280 ?        Ssl  14:53   0:00  \_ /usr/sbin/haproxy -Ws -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid

then when I do a reload, this happens:

[root@hap18 ~]# systemctl reload haproxy.service 
[root@hap18 ~]# systemctl status haproxy.service 
● haproxy.service - HAProxy Load Balancer
   Loaded: loaded (/usr/lib/systemd/system/haproxy.service; disabled; vendor preset: disabled)
   Active: active (running) since Fri 2018-01-05 14:53:12 CET; 46s ago
  Process: 4058 ExecReload=/bin/kill -USR2 $MAINPID (code=exited, status=0/SUCCESS)
  Process: 4057 ExecReload=/usr/sbin/haproxy -f $CONFIG -c -q (code=exited, status=0/SUCCESS)
  Process: 4038 ExecStartPre=/usr/sbin/haproxy -f $CONFIG -c -q (code=exited, status=0/SUCCESS)
 Main PID: 4039 (haproxy)
   CGroup: /system.slice/haproxy.service
           ├─4039 /usr/sbin/haproxy -Ws -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -sf 4041 -x /var/lib/haproxy/stats
           └─4060 /usr/sbin/haproxy -Ws -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -sf 4041 -x /var/lib/haproxy/stats

Jan 05 14:53:12 hap18 systemd[1]: Starting HAProxy Load Balancer...
Jan 05 14:53:12 hap18 systemd[1]: Started HAProxy Load Balancer.
Jan 05 14:53:56 hap18 systemd[1]: Reloaded HAProxy Load Balancer.
Jan 05 14:53:56 hap18 haproxy[4039]: [WARNING] 004/145312 (4039) : Reexecuting Master process
Jan 05 14:53:56 hap18 haproxy[4039]: [WARNING] 004/145356 (4039) : Failed to connect to the old process socket '/var/lib/haproxy/stats'
Jan 05 14:53:56 hap18 haproxy[4039]: [ALERT] 004/145356 (4039) : Failed to get the sockets from the old process!
Jan 05 14:53:56 hap18 haproxy[4039]: [WARNING] 004/145356 (4039) : Former worker 4041 exited with code 0

[root@hap18 ~]# ps auxf | grep haprox[y]
root      4039  0.0  0.2  76296  4352 ?        Ss   14:53   0:00 /usr/sbin/haproxy -Ws -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -sf 4041 -x /var/lib/haproxy/stats
haproxy   4060 19.7  0.1 297808  2280 ?        Ssl  14:53   0:00  \_ /usr/sbin/haproxy -Ws -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -sf 4041 -x /var/lib/haproxy/stats

as you can see HAProxy logs that it cannot connect to the socket:

Jan 05 14:53:56 hap18 haproxy[4039]: [WARNING] 004/145312 (4039) : Reexecuting Master process
Jan 05 14:53:56 hap18 haproxy[4039]: [WARNING] 004/145356 (4039) : Failed to connect to the old process socket '/var/lib/haproxy/stats'
Jan 05 14:53:56 hap18 haproxy[4039]: [ALERT] 004/145356 (4039) : Failed to get the sockets from the old process!
Jan 05 14:53:56 hap18 haproxy[4039]: [WARNING] 004/145356 (4039) : Former worker 4041 exited with code 0

but everything else seems to work, since we can see that it spawned a new child process, adding -sf 4041 -x /var/lib/haproxy/stats to the command line, where 4041 is the PID of the old worker.

I tried many things:

  • removed chroot
  • set user/group to root
  • changed socket path
  • changed socket permissions

but nothing helped.

The interesting thing is that if I run HAProxy from the shell, the reload works. I just had to disable daemon mode in the config so that HAProxy logs everything to the console (and of course I also tried disabling daemon mode under systemd). Here are the steps to run it in the shell.

Run haproxy in tmux/screen:

[root@hap18 ~]# haproxy -Ws -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid

Then, in a separate session, we list the processes:

[root@hap18 ~]# ps auxf | grep haprox[y]
root      4185  0.0  0.2  76296  4348 pts/1    S+   15:06   0:00      \_ haproxy -Ws -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid
haproxy   4186  0.0  0.1  76612  2024 ?        Ss   15:06   0:00          \_ haproxy -Ws -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid

Now we send that process the USR2 signal so that it starts the reload:

kill -USR2 $(pgrep -U 0 haproxy)

pgrep -U 0 haproxy finds the master HAProxy process.
Then if we go back to the tmux/screen session we can see this:

[root@hap18 ~]# haproxy -Ws -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid
[WARNING] 004/150638 (4185) : Reexecuting Master process
[WARNING] 004/150741 (4185) : Former worker 4186 exited with code 0

If we list the haproxy processes again:

[root@hap18 ~]# ps auxf | grep haprox[y]
root      4185  0.0  0.2  76296  4348 pts/1    S+   15:06   0:00      \_ haproxy -Ws -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -sf 4186 -x /var/lib/haproxy/stats
haproxy   4190  0.0  0.1  76612  2028 ?        Ss   15:07   0:00          \_ haproxy -Ws -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -sf 4186 -x /var/lib/haproxy/stats

We can see that the reload was done and there weren’t any errors about not being able to read the socket.
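
The manual reload above boils down to finding the root-owned haproxy process and sending it SIGUSR2. A minimal Python sketch of the same steps follows; the function names find_master_pids and send_reload are made up for illustration, and the sketch assumes pgrep is installed and that the only root-owned haproxy process is the master, as in the ps output above.

```python
import os
import signal
import subprocess


def find_master_pids():
    """Return PIDs of root-owned haproxy processes, like `pgrep -U 0 haproxy`."""
    result = subprocess.run(
        ["pgrep", "-U", "0", "haproxy"], capture_output=True, text=True
    )
    return [int(pid) for pid in result.stdout.split()]


def send_reload(pid):
    """Send SIGUSR2, the signal used above to make the master start a reload."""
    os.kill(pid, signal.SIGUSR2)


# Usage (as root), equivalent to: kill -USR2 $(pgrep -U 0 haproxy)
#   for pid in find_master_pids():
#       send_reload(pid)
```

This only replaces the kill command; it does not validate the configuration first, which the unit file does with its first ExecReload line.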

Any ideas what could be the reason behind this? It has to be something related to systemd, but I’m not sure what it could be.


#2

Yes, we already have a report about this on the mailing list and are currently troubleshooting:
https://www.mail-archive.com/haproxy@formilux.org/msg28585.html


#3

Are you using nbthread? Can you confirm that it works without nbthread?


#4

I was using nbthread, but I tried before (and now again) without nbthread and still the same issue.


#5

Have you managed to find out what the problem is? I noticed that the mail thread concluded that the problem is with the nbthread parameter, but that is not the case for me. Can I be of any help in tracking the problem down?


#6

Not yet, William is analyzing this.

For the record: when you removed the nbthread parameter, you did stop haproxy completely, and didn’t just reload from a config with nbthread to a config without it, right?

Also can you share the output of haproxy -vv please.


#7

Yeah, of course, I did a full systemctl stop haproxy so that I had a clean state.

Here is the output of haproxy -vv:

# haproxy -vv
HA-Proxy version 1.8.3-205f675 2017/12/30
Copyright 2000-2017 Willy Tarreau <willy@haproxy.org>

Build options :
  TARGET  = linux2628
  CPU     = generic
  CC      = gcc
  CFLAGS  = -O2 -g -fno-strict-aliasing -Wdeclaration-after-statement -fwrapv -Wno-unused-label
  OPTIONS = USE_LINUX_TPROXY=1 USE_ZLIB=1 USE_REGPARM=1 USE_OPENSSL=1 USE_LUA=1 USE_SYSTEMD=1 USE_PCRE=1

Default settings :
  maxconn = 2000, bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Built with OpenSSL version : OpenSSL 1.0.2k-fips  26 Jan 2017
Running on OpenSSL version : OpenSSL 1.0.2k-fips  26 Jan 2017
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports : SSLv3 TLSv1.0 TLSv1.1 TLSv1.2
Built with Lua version : Lua 5.3.4
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND
Encrypted password support via crypt(3): yes
Built with multi-threading support.
Built with PCRE version : 8.32 2012-11-30
Running on PCRE version : 8.32 2012-11-30
PCRE library supports JIT : no (USE_PCRE_JIT not set)
Built with zlib version : 1.2.7
Running on zlib version : 1.2.7
Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with network namespace support.

Available polling systems :
      epoll : pref=300,  test result OK
       poll : pref=200,  test result OK
     select : pref=150,  test result OK
Total: 3 (3 usable), will use epoll.

Available filters :
        [SPOE] spoe
        [COMP] compression
        [TRACE] trace

Here is the whole haproxy.cfg which I’m using for testing:

global
    log         127.0.0.1 local2
    chroot      /var/lib/haproxy
    pidfile     /var/run/haproxy.pid
    maxconn     4000
    user        haproxy
    group       haproxy
    daemon
    stats socket /var/lib/haproxy/stats expose-fd listeners

defaults
    mode                    http
    log                     global
    option                  httplog
    option                  dontlognull
    option http-server-close
    option forwardfor       except 127.0.0.0/8
    option                  redispatch
    retries                 3
    timeout http-request    10s
    timeout queue           1m
    timeout connect         10s
    timeout client          1m
    timeout server          1m
    timeout http-keep-alive 10s
    timeout check           10s
    maxconn                 3000

frontend main
    bind *:5000
    acl url_static       path_beg       -i /static /images /javascript /stylesheets
    acl url_static       path_end       -i .jpg .gif .png .css .js

    use_backend static          if url_static
    default_backend             app

backend static
    balance     roundrobin
    server      static 127.0.0.1:4331 check

backend app
    balance     roundrobin
    server  app1 127.0.0.1:5001 check
    server  app2 127.0.0.1:5002 check
    server  app3 127.0.0.1:5003 check
    server  app4 127.0.0.1:5004 check

#8

Not sure how I can continue the discussion on the mailing list if I’m not on the mailing list, so I’m posting here.

I built a package from yesterday’s nightly:

[root@c7 ~]# haproxy -v
HA-Proxy version 1.8.3-bbedf00 2018/01/24
Copyright 2000-2017 Willy Tarreau <willy@haproxy.org>

and it’s still the same issue:

[root@c7 ~]# systemctl stop haproxy.service 
[root@c7 ~]# systemctl start haproxy.service 
[root@c7 ~]# systemctl reload haproxy.service 
[root@c7 ~]# systemctl status haproxy.service 
● haproxy.service - HAProxy Load Balancer
   Loaded: loaded (/usr/lib/systemd/system/haproxy.service; disabled; vendor preset: disabled)
   Active: active (running) since Thu 2018-01-25 11:13:50 CET; 8s ago
  Process: 20890 ExecReload=/bin/kill -USR2 $MAINPID (code=exited, status=0/SUCCESS)
  Process: 20889 ExecReload=/usr/sbin/haproxy -f $CONFIG -c -q -x /var/lib/haproxy/stats (code=exited, status=0/SUCCESS)
  Process: 20877 ExecStartPre=/usr/sbin/haproxy -f $CONFIG -c -q (code=exited, status=0/SUCCESS)
 Main PID: 20878 (haproxy)
   CGroup: /system.slice/haproxy.service
           ├─20878 /usr/sbin/haproxy -Ws -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -sf 20880 -x /var/lib/haproxy/stats
           └─20892 /usr/sbin/haproxy -Ws -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -sf 20880 -x /var/lib/haproxy/stats

Jan 25 11:13:50 c7 systemd[1]: Starting HAProxy Load Balancer...
Jan 25 11:13:50 c7 systemd[1]: Started HAProxy Load Balancer.
Jan 25 11:13:53 c7 systemd[1]: Reloaded HAProxy Load Balancer.
Jan 25 11:13:53 c7 haproxy[20878]: [WARNING] 024/111350 (20878) : Reexecuting Master process
Jan 25 11:13:53 c7 haproxy[20878]: [WARNING] 024/111353 (20878) : Failed to connect to the old process socket '/var/lib/haproxy/stats'
Jan 25 11:13:53 c7 haproxy[20878]: [ALERT] 024/111353 (20878) : Failed to get the sockets from the old process!
Jan 25 11:13:53 c7 haproxy[20878]: [WARNING] 024/111353 (20878) : Former worker 20880 exited with code 0

And it’s the same with or without nbthread in the config file.


#9

Can you remove $OPTIONS from your init file? It is not needed, as haproxy will detect this from the configuration, and I’m unsure what happens when we hit this code path twice.


#10

$OPTIONS was set to an empty string, so it shouldn’t make a difference, but I removed it anyway and the issue is exactly the same.


#11

Because of this topic I’ve run a lot of tests with 1.8.3 + systemd, checking not just the logs but real usage with Apache Benchmark running in the background.
No problems, except a very specific one with slow SSL.

Logs for same stop-start-reload sequence:

Jan 25 16:51:39 TEST systemd[1]: Started HAProxy Load Balancer.
Jan 25 16:51:45 TEST systemd[1]: Reloading HAProxy Load Balancer.
Jan 25 16:51:46 TEST haproxy[11099]: [WARNING] 024/165139 (11099) : Reexecuting Master process
Jan 25 16:51:46 TEST systemd[1]: Reloaded HAProxy Load Balancer.
Jan 25 16:51:48 TEST haproxy[11099]: [WARNING] 024/165147 (11099) : Former worker 11103 exited with code 0

It must be something specific to your build or OS configuration. I use slightly different libs and systemd 215-17+deb8u7.


#12

@happoy right, your issue is different from this one.

@kustodian a few pointers:

  • check you are using the latest systemd package (provide the output of yum info systemd)
  • provide the ls output of the stats socket (ls -l /var/lib/haproxy/stats)
  • install strace and attach it to the main PID (from your systemctl status output) before the reload and provide the output: strace -tt -p<PID>

#13

As far as I can see it is the latest version:

[root@c7 ~]# yum info systemd
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
 * base: mirror.t-home.mk
 * epel: mirror.de.leaseweb.net
 * extras: mirror.t-home.mk
 * ius: mirror.ehv.weppel.nl
 * updates: mirror.t-home.mk
Installed Packages
Name        : systemd
Arch        : x86_64
Version     : 219
Release     : 42.el7_4.4
Size        : 21 M
Repo        : installed
From repo   : updates
Summary     : A System and Service Manager
URL         : http://www.freedesktop.org/wiki/Software/systemd
License     : LGPLv2+ and MIT and GPLv2+
Description : systemd is a system and service manager for Linux, compatible with
            : SysV and LSB init scripts. systemd provides aggressive parallelization
            : capabilities, uses socket and D-Bus activation for starting services,
            : offers on-demand starting of daemons, keeps track of processes using
            : Linux cgroups, supports snapshotting and restoring of the system
            : state, maintains mount and automount points and implements an
            : elaborate transactional dependency-based service control logic. It can
            : work as a drop-in replacement for sysvinit.

[root@c7 ~]# ls -l /var/lib/haproxy/stats
srwxr-xr-x. 1 root root 0 Jan 25 12:02 /var/lib/haproxy/stats

Here is the strace output: https://pastebin.com/JdLW145i. The line that looks like the issue is:

23:03:41.164966 connect(4, {sa_family=AF_LOCAL, sun_path="/var/lib/haproxy/stats"}, 110) = -1 EACCES (Permission denied)

But I’m not sure how that is possible. Judging by the permissions, the socket file is accessible to the root user, and even if the permissions were wrong, root can normally read any file. I thought that it might be the chroot option, but it’s the same when chroot is commented out.
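
One way to narrow this down would be to repeat the same connect() call outside HAProxy and compare the result from a root shell with the result under systemd. A minimal Python sketch, not part of HAProxy (the function name probe_unix_socket is made up for illustration, and the path is the stats socket from the config above):

```python
import errno
import socket


def probe_unix_socket(path):
    """Try to connect to a Unix stream socket.

    Returns None on success, or the errno name (e.g. "EACCES") on failure.
    """
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        sock.connect(path)
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc.errno))
    finally:
        sock.close()
    return None


if __name__ == "__main__":
    # Assumption: the socket path matches the "stats socket" line in haproxy.cfg.
    result = probe_unix_socket("/var/lib/haproxy/stats")
    print("connected" if result is None else f"connect failed: {result}")
```

If this connects from a root shell but the master process still gets EACCES under systemd, the difference would lie in the environment systemd gives the service rather than in the file permissions shown by ls.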


#14

Is this fixed in 1.8.8? I just tried it on 1.8.8 and when I do a reload the message in systemd status looks like this:

Apr 20 23:25:17 hap18 systemd[1]: Reloaded HAProxy Load Balancer.
Apr 20 23:25:17 hap18 haproxy[550]: [WARNING] 109/232516 (550) : Former worker 551 exited with code 0

The message Failed to get the sockets from the old process no longer appears, but even if I delete expose-fd listeners from the stats socket and do a restart followed by a reload, the output is exactly the same. I’m not sure whether seamless reloads now work even without expose-fd listeners, or whether the message was just removed.


#15

I still don’t know what the root cause of your issue was, but the error message has not been removed, and expose-fd listeners is still necessary for seamless reloads.

Your system did not let you access the socket previously, and I have no idea why that failed. Maybe you were affected by some threading-related bug that has been fixed in the meantime.


#16

My reload is not working either, take a look:

  • version
[root@santorini1 haproxy]# haproxy -vv
HA-Proxy version 1.8.8 2018/04/19
Copyright 2000-2018 Willy Tarreau <willy@haproxy.org>

Build options :
  TARGET  = linux2628
  CPU     = native
  CC      = gcc
  CFLAGS  = -O2 -march=native -g -fno-strict-aliasing -Wdeclaration-after-statement -fwrapv -fno-strict-overflow -Wno-unused-label
  OPTIONS = USE_LINUX_SPLICE=1 USE_LINUX_TPROXY=1 USE_LIBCRYPT=1 USE_THREAD=1 USE_OPENSSL=1

Default settings :
  maxconn = 2000, bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Built with OpenSSL version : OpenSSL 1.0.2k-fips  26 Jan 2017
Running on OpenSSL version : OpenSSL 1.0.2k-fips  26 Jan 2017
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports : SSLv3 TLSv1.0 TLSv1.1 TLSv1.2
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND
Encrypted password support via crypt(3): yes
Built with multi-threading support.
Built without PCRE or PCRE2 support (using libc's regex instead)
Built without compression support (neither USE_ZLIB nor USE_SLZ are set).
Compression algorithms supported : identity("identity")
Built with network namespace support.

Available polling systems :
      epoll : pref=300,  test result OK
       poll : pref=200,  test result OK
     select : pref=150,  test result OK
Total: 3 (3 usable), will use epoll.

Available filters :
	[SPOE] spoe
	[COMP] compression
	[TRACE] trace
  • socket:
[root@santorini1 haproxy]# grep stats /etc/haproxy/haproxy.cfg
    stats socket /var/run/haproxy.sock mode 777 level admin expose-fd listeners
    stats bind-process 1
    stats timeout 2m
  • reload
[root@santorini1 haproxy]# /usr/local/sbin/haproxy -c -f /etc/haproxy/haproxy.cfg -x /var/run/haproxy.sock 
Configuration file is valid

[root@santorini1 haproxy]# service haproxy reload
Redirecting to /bin/systemctl reload haproxy.service
  • systemd
[root@santorini1 haproxy]# cat /etc/systemd/system/haproxy.service
[Unit]
Description=HAProxy Load Balancer
# allows us to do millisecond level restarts without triggering alert in Systemd
StartLimitInterval=0
StartLimitBurst=0
After=network.target

[Service]
Environment="CONFIG=/etc/haproxy/haproxy.cfg" "PIDFILE=/run/haproxy.pid" "SOCKET=/var/run/haproxy.sock"
ExecStartPre=/usr/local/sbin/haproxy -f $CONFIG -c -q
ExecStart=/usr/local/sbin/haproxy -W -f $CONFIG -p $PIDFILE -D

# Zero downtime reloads using socket
ExecReload=/usr/local/sbin/haproxy -f $CONFIG -c -q -x $SOCKET
#ExecReload=/bin/kill -USR2 $MAINPID

KillMode=mixed
Restart=always
Type=forking

[Install]
WantedBy=multi-user.target

#17

@tiagocruz

You are not using the recommended systemd unit file and haproxy options for seamless reloads.

Use the unit file shipped with haproxy contrib/systemd/haproxy.service.in. Don’t change anything. This requires haproxy to be compiled with the USE_SYSTEMD make argument.


#18

awesome man! now it works like a charm \o/


#19

Thanks, I was wondering why seamless reload wasn’t working on my test server. I’m using CentOS 7 with HAProxy 1.8 from the Software Collections repo, and this version doesn’t seem to be built with the USE_SYSTEMD argument either.


#20

In that case, instead of -Ws just use -W and instead of Type=notify use Type=forking in the systemd unit file. Seamless reload should work even without systemd notify support. However it is important that the rest of the systemd file is correct.
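
To check which of the two cases applies, the OPTIONS line of haproxy -vv can be inspected for USE_SYSTEMD=1, as in the outputs posted earlier in this thread. A minimal Python sketch of that check; the function names has_systemd_support and suggested_settings are made up for illustration, and the returned flag/Type pairs are simply the ones suggested in this thread:

```python
import shutil
import subprocess


def has_systemd_support(vv_output):
    """Return True if the OPTIONS line of `haproxy -vv` lists USE_SYSTEMD=1."""
    for line in vv_output.splitlines():
        if line.strip().startswith("OPTIONS"):
            return "USE_SYSTEMD=1" in line.split()
    return False


def suggested_settings(vv_output):
    """Map the build options to the flag/Type pair suggested in this thread."""
    if has_systemd_support(vv_output):
        return {"flag": "-Ws", "Type": "notify"}
    return {"flag": "-W", "Type": "forking"}


if __name__ == "__main__":
    # Only query the binary if haproxy is actually on the PATH.
    if shutil.which("haproxy"):
        out = subprocess.run(
            ["haproxy", "-vv"], capture_output=True, text=True
        ).stdout
        print(suggested_settings(out))
```

The sketch only reads the build options; the rest of the unit file still has to be correct for seamless reloads to work.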