A few weeks ago we started noticing some very bad performance on HAProxy restart, this is most likely due to a configuration issue on our side. I am not sure what it is. Essentially we notices that CPU goes to 100% for almost a minute on reload and stays that way for about a minute during which most sites are not responsive.
We are currently using HAProxy 1.5.15 we are hosting about ~50 SSL sites
Our basic stats are:
pid = 27365 (process #1, nbproc = 1)
uptime = 0d 1h39m22s
system limits: memmax = unlimited; ulimit-n = 100196
maxsock = 100196; maxconn = 50000; maxpipes = 0
current conns = 14149; current pipes = 0/0; conn rate = 352/sec
Running tasks: 3/14430; idle = 19 %
I tried to dig in with strace and noticed:
/etc/haproxy/ssl# strace -c -p 27365
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
16.38 0.039534 2 23576 1002 sendto
12.39 0.029900 1 49156 13160 read
12.33 0.029753 2 12506 47 write
11.85 0.028598 2 14443 close
11.29 0.027246 3 9123 9113 connect
8.09 0.019521 1 16834 epoll_wait
6.68 0.016115 1 25885 10521 recvfrom
4.26 0.010281 1 18925 setsockopt
4.26 0.010270 1 16549 epoll_ctl
3.66 0.008839 2 5094 3 shutdown
3.62 0.008737 1 9112 socket
3.44 0.008313 1 9091 4219 accept4
1.62 0.003922 0 9112 fcntl
0.12 0.000285 0 655 getsockopt
0.02 0.000044 1 47 brk
------ ----------- ----------- --------- --------- ----------------
100.00 0.241358 220108 38065 total
This seems normalish … but this feels a bit fishy
#strace -p 23815 2> trace.log
# cat trace.log | wc
38761 298182 3304312
# cat trace.log | grep EAGAIN | wc
4075 41099 366647
So we are seeing a nice 10% EAGAIN going on, which feels a bit on the high side:
read(7097, 0x19e0a063, 5) = -1 EAGAIN (Resource temporarily unavailable)
read(4697, 0xb48cc53, 5) = -1 EAGAIN (Resource temporarily unavailable)
read(1755, 0x995eac3, 5) = -1 EAGAIN (Resource temporarily unavailable)
read(7797, 0x11fc7693, 5) = -1 EAGAIN (Resource temporarily unavailable)
read(7485, 0x16f89ea3, 5) = -1 EAGAIN (Resource temporarily unavailable)
read(3150, 0x875e473, 5) = -1 EAGAIN (Resource temporarily unavailable)
read(8061, 0x16f89ea3, 5) = -1 EAGAIN (Resource temporarily unavailable)
read(4383, 0xae8b9e3, 5) = -1 EAGAIN (Resource temporarily unavailable)
recvfrom(13587, 0x157239a4, 8192, 0, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
read(4930, 0x9043523, 5) = -1 EAGAIN (Resource temporarily unavailable)
read(8938, 0x19fd0943, 5) = -1 EAGAIN (Resource temporarily unavailable)
read(8718, 0xae8b9e3, 5) = -1 EAGAIN (Resource temporarily unavailable)
recvfrom(13380, 0x1c8184c4, 8192, 0, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
recvfrom(10747, 0x21d08de4, 8192, 0, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
recvfrom(15751, 0xa9dc2c4, 8192, 0, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
recvfrom(7504, 0x25c75b4, 16384, 0, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
recvfrom(509, 0x253db34, 16384, 0, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
recvfrom(10997, 0x2579f74, 16384, 0, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
recvfrom(5962, 0x259f2d4, 16384, 0, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
recvfrom(8504, 0x25b3444, 16384, 0, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
accept4(46, 0x7ffeef362560, [128], SOCK_NONBLOCK) = -1 EAGAIN (Resource temporarily unavailable)
read(1831, 0x39a6f83, 5) = -1 EAGAIN (Resource temporarily unavailable)
read(1416, 0x994a8e3, 5) = -1 EAGAIN (Resource temporarily unavailable)
read(1127, 0x3104fe60, 11) = -1 EAGAIN (Resource temporarily unavailable)
accept4(44, 0x7ffeef362560, [128], SOCK_NONBLOCK) = -1 EAGAIN (Resource temporarily unavailable)
read(15685, 0x3042793, 5) = -1 EAGAIN (Resource temporarily unavailable)
read(13465, 0xc18bf6b, 1177) = -1 EAGAIN (Resource temporarily unavailable)
accept4(45, 0x7ffeef362560, [128], SOCK_NONBLOCK) = -1 EAGAIN (Resource temporarily unavailable)
read(13541, 0x11fc7693, 5) = -1 EAGAIN (Resource temporarily unavailable)
read(5578, 0x994a8e3, 5) = -1 EAGAIN (Resource temporarily unavailable)
read(4417, 0x994a8e3, 5) = -1 EAGAIN (Resource temporarily unavailable)
read(15750, 0x26cc5a53, 5) = -1 EAGAIN (Resource temporarily unavailable)
read(601, 0x11fc7693, 5) = -1 EAGAIN (Resource temporarily unavailable)
read(9746, 0xae8b9e3, 5) = -1 EAGAIN (Resource temporarily unavailable)
read(5593, 0x16f9e173, 5) = -1 EAGAIN (Resource temporarily unavailable)
read(1525, 0x11fc7693, 5) = -1 EAGAIN (Resource temporarily unavailable)
read(2083, 0x994a8e3, 5) = -1 EAGAIN (Resource temporarily unavailable)
recvfrom(11021, 0x13b55954, 8192, 0, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
read(2659, 0x23108db3, 5) = -1 EAGAIN (Resource temporarily unavailable)
read(1806, 0x994a8e3, 5) = -1 EAGAIN (Resource temporarily unavailable)
read(5717, 0xb48cc53, 5) = -1 EAGAIN (Resource temporarily unavailable)
read(9391, 0x11fc7693, 5) = -1 EAGAIN (Resource temporarily unavailable)
read(6639, 0xae8b9e3, 5) = -1 EAGAIN (Resource temporarily unavailable)
recvfrom(14023, 0x112bdb64, 8192, 0, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
recvfrom(15683, 0x7b533c2, 4834, 0, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
read(2535, 0xb48cc53, 5) = -1 EAGAIN (Resource temporarily unavailable)
read(15519, 0x37aaa43, 5) = -1 EAGAIN (Resource temporarily unavailable)
recvfrom(15753, 0x87cc0a4, 8192, 0, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
recvfrom(15721, 0xb4fd684, 8192, 0, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
read(1620, 0x875e473, 5) = -1 EAGAIN (Resource temporarily unavailable)
recvfrom(15672, 0x295329ee, 7974, 0, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
recvfrom(15675, 0x8dc2ed4, 8192, 0, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
accept4(46, 0x7ffeef362560, [128], SOCK_NONBLOCK) = -1 EAGAIN (Resource temporarily unavailable)
read(3561, 0x1420d253, 5) = -1 EAGAIN (Resource temporarily unavailable)
read(1353, 0x11fc7693, 5) = -1 EAGAIN (Resource temporarily unavailable)
read(1015, 0x23108db0, 11) = -1 EAGAIN (Resource temporarily unavailable)
accept4(44, 0x7ffeef362560, [128], SOCK_NONBLOCK) = -1 EAGAIN (Resource temporarily unavailable)
read(10787, 0x9043523, 5) = -1 EAGAIN (Resource temporarily unavailable)
accept4(19, 0x7ffeef362560, [128], SOCK_NONBLOCK) = -1 EAGAIN (Resource temporarily unavailable)
read(7142, 0x9043523, 5) = -1 EAGAIN (Resource temporarily unavailable)
Our CPU is running at 40-70% constantly on reload it runs at 100% for almost a minute.
We run a Xeon E5606 @ 2.13GHz
How should we go about cutting down on the CPU?
Upgrade to 1.6.2?
Offload SSL?
Split up our HAProxy instance?