So I just noticed the issue happening again, this time on the proxy instance used for MySQL/Galera rather than the Redis one. As before, connections to the unix sockets (both of them, since the MySQL proxy has two frontends) are refused, while connections to the TCP ports are accepted and work fine. Here is what I’ve gathered; I’ll keep the instance running for now in case you can think of something else to look for:
Screenshot of stats page: https://www.screencast.com/t/oXhh22uc
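For reference, the connection tests are just plain socat connect attempts, roughly like the following (the TCP port shown is a placeholder rather than the real frontend port):
# socat - UNIX-CONNECT:/run/magento/galera-read.sock
# socat - TCP:127.0.0.1:3306
On the broken instance the first is refused immediately while the second connects and works fine.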
On the container:
# ls -al /run/magento/galera-*.sock
srwxrwxrwx 1 root root 0 Apr 5 12:32 galera-read.sock
srwxrwxrwx 1 root root 0 Apr 5 12:32 galera-write.sock
# stat /run/magento/galera-read.sock
File: /run/magento/galera-read.sock
Size: 0 Blocks: 0 IO Block: 4096 socket
Device: 13h/19d Inode: 2730 Links: 1
Access: (0777/srwxrwxrwx) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2017-04-05 12:32:51.344687784 +0000
Modify: 2017-04-05 12:32:51.344687784 +0000
Change: 2017-04-05 12:32:51.344687784 +0000
Birth: -
ps aux on the container:
PID USER TIME COMMAND
6740 root 0:00 /usr/local/sbin/haproxy-master
6741 root 2:27 /usr/local/sbin/haproxy -p /run/haproxy.pid -f /etc/haproxy.cfg -Ds -sf 6663
lsof on the container:
6740 /usr/local/sbin/haproxy pipe:[1878095]
6740 /usr/local/sbin/haproxy pipe:[1878096]
6740 /usr/local/sbin/haproxy pipe:[1878097]
6740 /usr/local/sbin/haproxy socket:[1872584]
6740 /usr/local/sbin/haproxy socket:[1872643]
6740 /usr/local/sbin/haproxy anon_inode:[eventpoll]
6740 /usr/local/sbin/haproxy socket:[4861265]
6741 /usr/local/sbin/haproxy anon_inode:[eventpoll]
6741 /usr/local/sbin/haproxy socket:[1872584]
6741 /usr/local/sbin/haproxy socket:[1872643]
6741 /usr/local/sbin/haproxy socket:[4861264]
6741 /usr/local/sbin/haproxy socket:[4861265]
6741 /usr/local/sbin/haproxy socket:[4861266]
6741 /usr/local/sbin/haproxy socket:[4861267]
6741 /usr/local/sbin/haproxy socket:[4861268]
6741 /usr/local/sbin/haproxy socket:[4861269]
The container is running Alpine, so that lsof is the BusyBox version, which doesn’t tell much; I therefore also ran lsof on the host (the PIDs differ because of Docker’s PID namespace mapping):
COMMAND PID TID USER FD TYPE DEVICE SIZE/OFF NODE NAME
haproxy 26294 root cwd DIR 0,83 4096 2 /
haproxy 26294 root rtd DIR 0,83 4096 2 /
haproxy 26294 root txt REG 0,83 7322248 83 /usr/local/sbin/haproxy
haproxy 26294 root mem REG 0,83 89 /usr/lib/libpcre.so.1.2.7 (stat: No such file or directory)
haproxy 26294 root mem REG 0,83 87 /lib/libcrypto.so.1.0.0 (stat: No such file or directory)
haproxy 26294 root mem REG 0,83 84 /lib/libssl.so.1.0.0 (stat: No such file or directory)
haproxy 26294 root mem REG 0,83 73 /lib/libz.so.1.2.8 (stat: No such file or directory)
haproxy 26294 root mem REG 0,83 29 /lib/ld-musl-x86_64.so.1 (stat: No such file or directory)
haproxy 26294 root mem REG 0,83 93 /etc/localtime (path dev=8,3, inode=15349327)
haproxy 26294 root 0r FIFO 0,10 0t0 1878095 pipe
haproxy 26294 root 1w FIFO 0,10 0t0 1878096 pipe
haproxy 26294 root 2w FIFO 0,10 0t0 1878097 pipe
haproxy 26294 root 3u sock 0,8 0t0 1872584 protocol: NETLINK
haproxy 26294 root 4u sock 0,8 0t0 1872643 protocol: NETLINK
haproxy 26294 root 5u a_inode 0,11 0 8076 [eventpoll]
haproxy 26294 root 8u sock 0,8 0t0 4861265 protocol: UDP
haproxy 26295 root cwd DIR 0,83 4096 2 /
haproxy 26295 root rtd DIR 0,83 4096 2 /
haproxy 26295 root txt REG 0,83 7322248 83 /usr/local/sbin/haproxy
haproxy 26295 root mem REG 0,83 89 /usr/lib/libpcre.so.1.2.7 (stat: No such file or directory)
haproxy 26295 root mem REG 0,83 87 /lib/libcrypto.so.1.0.0 (stat: No such file or directory)
haproxy 26295 root mem REG 0,83 84 /lib/libssl.so.1.0.0 (stat: No such file or directory)
haproxy 26295 root mem REG 0,83 73 /lib/libz.so.1.2.8 (stat: No such file or directory)
haproxy 26295 root mem REG 0,83 29 /lib/ld-musl-x86_64.so.1 (stat: No such file or directory)
haproxy 26295 root mem REG 0,83 93 /etc/localtime (path dev=8,3, inode=15349327)
haproxy 26295 root 0u a_inode 0,11 0 8076 [eventpoll]
haproxy 26295 root 3u sock 0,8 0t0 1872584 protocol: NETLINK
haproxy 26295 root 4u sock 0,8 0t0 1872643 protocol: NETLINK
haproxy 26295 root 7u sock 0,8 0t0 4861264 protocol: TCP
haproxy 26295 root 8u sock 0,8 0t0 4861265 protocol: UDP
haproxy 26295 root 9u sock 0,8 0t0 4861266 protocol: UNIX
haproxy 26295 root 10u sock 0,8 0t0 4861267 protocol: TCP
haproxy 26295 root 11u sock 0,8 0t0 4861268 protocol: UNIX
haproxy 26295 root 12u sock 0,8 0t0 4861269 protocol: TCP
So I am guessing that sockets 4861266 and 4861268 are the galera-read.sock and galera-write.sock sockets, in which case they do appear to be open…
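One way to confirm that guess (just a sketch; it assumes ss from iproute2 is available in the container, or is run from the host inside the container’s namespaces): ss can list listening unix sockets together with their socket inode and bound path, e.g.
# ss -xlp | grep galera
If the galera-*.sock paths show up there against those same socket inode numbers, the worker really is still listening on them.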
Here are the socket files and their inode numbers:
# ls -ial /run/magento/
total 4
2488 drwxr-xr-x 2 root root 140 Apr 5 19:32 .
20 drwxr-xr-x 3 root root 4096 Apr 3 20:20 ..
2730 srwxrwxrwx 1 root root 0 Apr 5 12:32 galera-read.sock
2729 srwxrwxrwx 1 root root 0 Apr 5 12:32 galera-write.sock
Running this command from the host on the same directory shows the same inode numbers.
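(To be clear about what running it “from the host” means: one way to look at the same directory is through the container’s mount namespace, for example something like
# nsenter -t 26295 -m ls -ial /run/magento/
using the host-side PID; however it’s done, the inode numbers come out the same.)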
I can’t use strace from within the container without restarting it to add --cap-add SYS_PTRACE (I didn’t know this was needed until I tried it and got an ‘Operation not permitted’ error). However, I can still run it from the host. The master process looks like this:
# strace -p26294
strace: Process 26294 attached
wait4(-1,
And nothing additional shows up there when I try to connect. The child process looks like this, repeated over and over (there are three nodes to run health checks against):
# strace -p26295
strace: Process 26295 attached
epoll_pwait(0, [], 200, 2, NULL, 8) = 0
epoll_pwait(0, [], 200, 2, NULL, 8) = 0
epoll_pwait(0, [], 200, 0, NULL, 8) = 0
socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 1
fcntl(1, F_SETFL, O_RDONLY|O_NONBLOCK) = 0
setsockopt(1, SOL_TCP, TCP_NODELAY, [1], 4) = 0
setsockopt(1, SOL_SOCKET, SO_KEEPALIVE, [1], 4) = 0
connect(1, {sa_family=AF_INET, sin_port=htons(8080), sin_addr=inet_addr("10.81.128.49")}, 16) = -1 EINPROGRESS (Operation now in progress)
epoll_pwait(0, [], 200, 0, NULL, 8) = 0
recvfrom(1, 0x55b2d6f6d074, 16384, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
getsockopt(1, SOL_SOCKET, SO_ERROR, [0], [4]) = 0
sendto(1, "GET / HTTP/1.0\r\n\r\n", 18, MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = 18
epoll_ctl(0, EPOLL_CTL_ADD, 1, {EPOLLIN|EPOLLRDHUP, {u32=1, u64=1}}) = 0
epoll_pwait(0, [{EPOLLIN, {u32=1, u64=1}}], 200, 270, NULL, 8) = 1
recvfrom(1, "HTTP/1.0 200 OK\r\nDate: Wed, 05 A"..., 16384, 0, NULL, NULL) = 151
setsockopt(1, SOL_SOCKET, SO_LINGER, {onoff=1, linger=0}, 8) = 0
close(1) = 0
epoll_pwait(0, [], 200, 267, NULL, 8) = 0
epoll_pwait(0, [], 200, 3, NULL, 8) = 0
epoll_pwait(0, [], 200, 0, NULL, 8) = 0
socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 1
fcntl(1, F_SETFL, O_RDONLY|O_NONBLOCK) = 0
setsockopt(1, SOL_TCP, TCP_NODELAY, [1], 4) = 0
setsockopt(1, SOL_SOCKET, SO_KEEPALIVE, [1], 4) = 0
connect(1, {sa_family=AF_INET, sin_port=htons(8080), sin_addr=inet_addr("10.81.128.79")}, 16) = -1 EINPROGRESS (Operation now in progress)
epoll_pwait(0, [], 200, 0, NULL, 8) = 0
recvfrom(1, 0x55b2d6f75cb4, 16384, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
getsockopt(1, SOL_SOCKET, SO_ERROR, [0], [4]) = 0
sendto(1, "GET / HTTP/1.0\r\n\r\n", 18, MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = 18
epoll_ctl(0, EPOLL_CTL_ADD, 1, {EPOLLIN|EPOLLRDHUP, {u32=1, u64=1}}) = 0
epoll_pwait(0, [{EPOLLIN, {u32=1, u64=1}}], 200, 33, NULL, 8) = 1
recvfrom(1, "HTTP/1.0 200 OK\r\nDate: Wed, 05 A"..., 16384, 0, NULL, NULL) = 151
setsockopt(1, SOL_SOCKET, SO_LINGER, {onoff=1, linger=0}, 8) = 0
close(1) = 0
epoll_pwait(0, [], 200, 30, NULL, 8) = 0
epoll_pwait(0, [], 200, 3, NULL, 8) = 0
epoll_pwait(0, [], 200, 0, NULL, 8) = 0
socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 1
fcntl(1, F_SETFL, O_RDONLY|O_NONBLOCK) = 0
setsockopt(1, SOL_TCP, TCP_NODELAY, [1], 4) = 0
setsockopt(1, SOL_SOCKET, SO_KEEPALIVE, [1], 4) = 0
connect(1, {sa_family=AF_INET, sin_port=htons(8080), sin_addr=inet_addr("10.81.128.107")}, 16) = -1 EINPROGRESS (Operation now in progress)
epoll_pwait(0, [], 200, 0, NULL, 8) = 0
recvfrom(1, 0x55b2d6f64434, 16384, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
getsockopt(1, SOL_SOCKET, SO_ERROR, [0], [4]) = 0
sendto(1, "GET / HTTP/1.0\r\n\r\n", 18, MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = 18
epoll_ctl(0, EPOLL_CTL_ADD, 1, {EPOLLIN|EPOLLRDHUP, {u32=1, u64=1}}) = 0
epoll_pwait(0, [{EPOLLIN, {u32=1, u64=1}}], 200, 690, NULL, 8) = 1
recvfrom(1, "HTTP/1.0 200 OK\r\nDate: Wed, 05 A"..., 16384, 0, NULL, NULL) = 151
setsockopt(1, SOL_SOCKET, SO_LINGER, {onoff=1, linger=0}, 8) = 0
close(1) = 0
There is a lot of this output, so instead of running the socat command just once I ran it about 20 times within a couple of seconds, and doing so didn’t appear to add any output to the strace stream. I also searched the stream for the socket filename and did not find it. In contrast, when I ran strace on a working HAProxy instance while using socat to test the connection (successfully), here is the result:
epoll_pwait(0, [{EPOLLIN, {u32=9, u64=9}}], 200, 1000, NULL, 8) = 1
accept4(9, {sa_family=AF_LOCAL, NULL}, [2], SOCK_NONBLOCK) = 1
getsockname(1, {sa_family=AF_LOCAL, sun_path="/run/magento/redis-session.sock.59.tmp"}, [41]) = 0
getpid() = 63
accept4(9, 0x7ffcc950a700, 0x7ffcc950a6fc, SOCK_NONBLOCK) = -1 EAGAIN (Resource temporarily unavailable)
recvfrom(1, "", 15360, 0, NULL, NULL) = 0
shutdown(1, SHUT_WR) = 0
close(1) = 0
I noticed that the broken instance had the ‘-sf’ flag, which is expected since the config is reloaded dynamically as instances are added/removed, but when I intentionally triggered a reload on the working instances they continued to work, so I still don’t know how to reproduce the issue.
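(For completeness, “intentionally triggered” just means forcing a reload, i.e. getting a SIGHUP to the wrapper, roughly along the lines of
# docker kill --signal=HUP <container-name>
with the container name as a placeholder; the wrapper then re-executes the haproxy binary with -sf, as the log below shows.)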
Here is what the HAProxy logs look like:
docker-entrypoint.sh: Reloading config...
<5>haproxy-systemd-wrapper: re-executing on SIGHUP.
<7>haproxy-systemd-wrapper: executing /usr/local/sbin/haproxy -p /run/haproxy.pid -f /etc/haproxy.cfg -Ds -sf 24885
Proxy stats started.
Proxy galera-write-unix started.
Proxy galera-write-tcp started.
Proxy galera-read-unix started.
Proxy galera-read-tcp started.
Proxy galera-write started.
Proxy galera-read started.
Stopping proxy stats in 0 ms.
Stopping frontend galera-write-unix in 0 ms.
Stopping frontend galera-write-tcp in 0 ms.
Stopping frontend galera-read-unix in 0 ms.
Stopping frontend galera-read-tcp in 0 ms.
Stopping backend galera-write in 0 ms.
Stopping backend galera-read in 0 ms.
Proxy stats stopped (FE: 0 conns, BE: 0 conns).
Proxy galera-write-unix stopped (FE: 0 conns, BE: 0 conns).
Proxy galera-write-tcp stopped (FE: 59 conns, BE: 0 conns).
Proxy galera-read-unix stopped (FE: 0 conns, BE: 0 conns).
Proxy galera-read-tcp stopped (FE: 0 conns, BE: 0 conns).
Proxy galera-write stopped (FE: 0 conns, BE: 59 conns).
Proxy galera-read stopped (FE: 0 conns, BE: 0 conns).
Server galera-write/node14 is DOWN, reason: Layer4 connection problem, info: "Connection refused", check duration: 0ms. 2 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
Server galera-read/node14 is DOWN via galera-write/node14. 2 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
Server galera-write/node22 is DOWN, reason: Layer4 connection problem, info: "Connection refused", check duration: 0ms. 1 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
Server galera-read/node22 is DOWN via galera-write/node22. 1 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
Server galera-write/node60 is DOWN, reason: Layer7 wrong status, code: 503, info: "Service Unavailable", check duration: 2ms. 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
backend galera-write has no server available!
Server galera-read/node60 is DOWN via galera-write/node60. 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
backend galera-read has no server available!
Server galera-write/node60 is UP, reason: Layer7 check passed, code: 200, info: "OK", check duration: 3ms. 1 active and 0 backup servers online. 0 sessions requeued, 0 total in queue.
Server galera-read/node60 is UP via galera-write/node60. 1 active and 0 backup servers online. 0 sessions requeued, 0 total in queue.
So in summary:
- Proved using socat that the connection-refused issue seen from the host is also an issue from within the container, so I don’t see any reason to believe this is a Docker filesystem issue. The inode numbers also match between the container and the host.
- HAProxy still hasn’t logged any errors, and the stats page shows that HAProxy thinks everything is working just fine.
- Still no idea how to reproduce the issue other than waiting a long time… Interestingly, the other three HAProxy instances with the exact same health checks are still running without the unix socket issue, and on the instance that does have it, both frontends are affected rather than just one of the two.
That’s as much information as I know how to give, but let me know if you can think of anything else to check, and thanks again for the help.
Colin