503 with NOSRV but no tcp reset found


#1

haproxy: 1.7.6

I see occasional 503 errors in the haproxy log. I ran tcpdump on the haproxy node to capture the traffic,
but I did not find any TCP reset from the server.

The 503 log entries look like this:
marathon_http_in marathon_http_in/<NOSRV> -1/-1/-1/-1/0 503 212 - - SC-- 22/22/0/0/0 0/0 "GET /api/item/


#2

If you did not find any TCP reset, what DID you find in the tcpdump?

SC is the network stack telling haproxy that the destination is unreachable. Could be due to “no route to host”, or unresolved ARP entries, etc.


#3

In fact, I got a whole sequence of NOSRV 503 errors. Here are just a few of them:
Jun 28 11:09:19 marathon_http_in/<NOSRV> -1/-1/-1/-1/0 503 212 - - SC-- 35/35/0/0/0 0/0 "GET /api/user/"
Jun 28 11:09:19 marathon_http_in marathon_http_in/<NOSRV> -1/-1/-1/-1/0 503 212 - - SC-- 35/35/0/0/0 0/0 "GET /api/user"

A static route is used, so routing and ARP can be ignored.

The only TCP resets I found originated from haproxy itself, as part of its health checks: SYN, SYN/ACK, RST/ACK.
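
For readers following along, the fields of the log lines quoted above can be split out with a short script. The sketch below assumes HAProxy's default HTTP log format and only looks at the part that starts at the frontend name. The regular expression, the `parse_http_log` name and the two small lookup tables are illustrative and not something HAProxy provides. The server field is written as `<NOSRV>`, which is how HAProxy marks a request that was never assigned to a server.

```python
import re

# Matches the part of an HAProxy HTTP log line that starts at the frontend name.
# Assumption: the default "option httplog" field order.
LOG_RE = re.compile(
    r'(?P<frontend>\S+) (?P<backend>[^/\s]+)/(?P<server>\S*) '
    r'(?P<timers>[-+\d]+(?:/[-+\d]+){4}) '
    r'(?P<status>\d{3}) (?P<bytes>\+?\d+) \S+ \S+ '
    r'(?P<term>\S{4}) '
    r'(?P<conns>\d+(?:/\+?\d+){4}) (?P<queues>\d+/\d+)'
)

# Only the two flags seen in this thread, paraphrased from the HAProxy docs.
FIRST_FLAG = {"S": "session aborted or connection refused on the server side"}
SECOND_FLAG = {"C": "proxy was waiting for the connection to the server to establish"}


def parse_http_log(line):
    """Return the interesting fields of one log line, or None if it does not match."""
    m = LOG_RE.search(line)
    if m is None:
        return None
    term = m.group("term")
    return {
        "frontend": m.group("frontend"),
        "backend": m.group("backend"),
        "server": m.group("server"),
        # A timer of -1 means that phase was never completed.
        "timers": [int(t) for t in m.group("timers").split("/")],
        "status": int(m.group("status")),
        "termination": term,
        "meaning": (FIRST_FLAG.get(term[0], "?"), SECOND_FLAG.get(term[1], "?")),
    }


sample = ('Jun 28 11:09:19 marathon_http_in marathon_http_in/<NOSRV> '
          '-1/-1/-1/-1/0 503 212 - - SC-- 35/35/0/0/0 0/0 "GET /api/user"')
parsed = parse_http_log(sample)
print(parsed)
```

In this sample the backend name equals the frontend name and every timer except the total is -1, so the request never reached the point where a connection to a server was established.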


#4

That’s not how it works. If you have a static route, you still need to ARP the gateway. And you can always get a “no route to host” from an intermediate router or the end host.

You need to tcpdump everything, including ARP and ICMP from all IPs, and then check it. Alternatively, run haproxy through strace -tt and point to the specific socket that had this problem.

But there will be no solution within haproxy. This is your kernel that is telling haproxy that the destination is unreachable.


#5

Sorry, I oversimplified.
By saying that routing and ARP can be ignored, I meant that I have verified that the route and the ARP entry for the gateway are fine.

I could not find in the source code how SC is set, so I am wondering whether other conditions inside haproxy can also result in SC.


#6

strace -tt it, and you will see how your kernel tells haproxy that the destination is unreachable.
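
A full `strace -tt` capture of a busy haproxy process is long, so it can help to filter it down to the syscalls that actually failed. The sketch below is one way to do that in Python. It assumes one syscall per line in the `strace -tt` layout seen in this thread (no PID prefix, no unfinished or resumed lines), and the `failed_calls` helper and the `EXPECTED` set are made up for illustration. The sample lines are copied from the trace posted further down.

```python
import re

# timestamp, syscall name, numeric first argument (the fd), return value, errno name
CALL_RE = re.compile(
    r'^(?P<ts>\d{2}:\d{2}:\d{2}\.\d+) (?P<call>\w+)\((?P<fd>\d+)[,)]'
    r'.* = (?P<ret>-?\d+)(?: (?P<errno>E[A-Z0-9]+))?'
)

# Errors that are routine on non-blocking sockets and not a sign of trouble.
EXPECTED = {"EAGAIN", "EINPROGRESS", "EALREADY"}


def failed_calls(trace):
    """Yield (timestamp, syscall, fd, errno) for calls that failed unexpectedly."""
    for line in trace.splitlines():
        m = CALL_RE.match(line.strip())
        if m is None or m.group("ret") != "-1":
            continue
        if m.group("errno") in EXPECTED:
            continue
        yield m.group("ts"), m.group("call"), int(m.group("fd")), m.group("errno")


sample = """\
11:22:36.914763 connect(68, {sa_family=AF_INET, sin_port=htons(8080), sin_addr=inet_addr("9.0.5.142")}, 16) = -1 EINPROGRESS (Operation now in progress)
11:22:36.915198 connect(68, {sa_family=AF_INET, sin_port=htons(8080), sin_addr=inet_addr("9.0.5.142")}, 16) = 0
11:22:36.915289 recvfrom(68, NULL, 2147483647, MSG_TRUNC|MSG_DONTWAIT|MSG_NOSIGNAL, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
11:22:36.938225 read(68, 0x55690f500b83, 5) = -1 ECONNRESET (Connection reset by peer)
"""
failures = list(failed_calls(sample))
for f in failures:
    print(f)
```

On a live capture, the same filter would be expected to surface a connect() ending in an error such as ECONNREFUSED or EHOSTUNREACH, which is the kind of kernel-level signal discussed above. If no such line appears around the time of a 503, the error probably did not come from the network.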


#7

Thank you very much. I’ll try strace and report back if I make progress.


#8

I used strace and found that haproxy first closes the socket and then reads from it, as shown below. The fd is 68.
11:22:36.914496 socket(AF_INET, SOCK_STREAM, IPPROTO_TCP) = 68
11:22:36.914566 fcntl(68, F_SETFL, O_RDONLY|O_NONBLOCK) = 0
11:22:36.914631 setsockopt(68, SOL_TCP, TCP_NODELAY, [1], 4) = 0
11:22:36.914699 setsockopt(68, SOL_TCP, TCP_QUICKACK, [0], 4) = 0
11:22:36.914763 connect(68, {sa_family=AF_INET, sin_port=htons(8080), sin_addr=inet_addr("9.0.5.142")}, 16) = -1 EINPROGRESS (Operation now in progress)
11:22:36.914897 epoll_wait(0, [], 200, 0) = 0
11:22:36.914962 connect(68, {sa_family=AF_INET, sin_port=htons(8080), sin_addr=inet_addr("9.0.5.142")}, 16) = -1 EALREADY (Operation already in progress)
11:22:36.915056 epoll_ctl(0, EPOLL_CTL_ADD, 68, {EPOLLOUT, {u32=68, u64=68}}) = 0
11:22:36.915125 epoll_wait(0, [{EPOLLOUT, {u32=68, u64=68}}], 200, 35) = 1
11:22:36.915198 connect(68, {sa_family=AF_INET, sin_port=htons(8080), sin_addr=inet_addr("9.0.5.142")}, 16) = 0
11:22:36.915289 recvfrom(68, NULL, 2147483647, MSG_TRUNC|MSG_DONTWAIT|MSG_NOSIGNAL, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
11:22:36.915356 setsockopt(68, SOL_SOCKET, SO_LINGER, {onoff=1, linger=0}, 8) = 0
11:22:36.915421 close(68) = 0
11:22:36.915515 epoll_wait(0, [{EPOLLIN, {u32=10, u64=10}}], 200, 34) = 1
11:22:36.937865 accept4(10, {sa_family=AF_INET, sin_port=htons(32807), sin_addr=inet_addr("100.109.215.128")}, [16], SOCK_NONBLOCK) = 68
11:22:36.938009 setsockopt(68, SOL_TCP, TCP_NODELAY, [1], 4) = 0
11:22:36.938104 accept4(10, 0x7ffed5fc07a0, 0x7ffed5fc079c, SOCK_NONBLOCK) = -1 EAGAIN (Resource temporarily unavailable)
11:22:36.938225 read(68, 0x55690f500b83, 5) = -1 ECONNRESET (Connection reset by peer)
11:22:36.938305 close(68) = 0


#9

Please ignore my last reply: there is no sendto in that trace, so it is more likely a health check.
In the trace below, haproxy accepts a request as fd 2 and sends the 503 with sendto, without any TCP-reset-like event beforehand.
It seems that haproxy replies with the 503 from its own internal logic, without proxying the request to a backend server.

11:22:36.765962 accept4(8, {sa_family=AF_INET, sin_port=htons(31516), sin_addr=inet_addr("42.120.75.138")}, [16], SOCK_NONBLOCK) = 2
11:22:36.766073 setsockopt(2, SOL_TCP, TCP_NODELAY, [1], 4) = 0
11:22:36.766140 accept4(8, 0x7ffed5fc07a0, 0x7ffed5fc079c, SOCK_NONBLOCK) = -1 EAGAIN (Resource temporarily unavailable)
11:22:36.766227 recvfrom(2, 0x55690f3efe04, 15360, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
11:22:36.766295 epoll_ctl(0, EPOLL_CTL_ADD, 2, {EPOLLIN|EPOLLRDHUP, {u32=2, u64=2}}) = 0
11:22:36.766369 epoll_wait(0, [{EPOLLIN, {u32=76, u64=76}}], 200, 62) = 1
11:22:36.766460 recvfrom(76, "GET /agent/0d908a29-bf91-4aa7-97"..., 15360, 0, NULL, NULL) = 842
11:22:36.766606 getpid() = 22479
11:22:36.766663 sendmsg(5, {msg_name(110)={sa_family=AF_LOCAL, sun_path="/dev/log"}, msg_iov(8)=[{"<135>Jun 29 03:22:36 ", 21}, {"haproxy", 7}, {"[", 1}, {"22479", 5}, {"]: ", 3}, {"", 0}, {"add log-id header:01153024255676"..., 56}, {"\n", 1}], msg_controllen=0, msg_flags=0}, MSG_DONTWAIT|MSG_NOSIGNAL) = 94
11:22:36.766806 socket(AF_INET, SOCK_STREAM, IPPROTO_TCP) = 68
11:22:36.766884 fcntl(68, F_SETFL, O_RDONLY|O_NONBLOCK) = 0
11:22:36.766946 setsockopt(68, SOL_TCP, TCP_NODELAY, [1], 4) = 0
11:22:36.767005 connect(68, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("9.0.5.130")}, 16) = -1 EINPROGRESS (Operation now in progress)
11:22:36.767125 epoll_wait(0, [{EPOLLIN, {u32=2, u64=2}}], 200, 0) = 1
11:22:36.767249 sendto(68, "GET /agent/0d908a29-bf91-4aa7-97"..., 951, MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = 951
11:22:36.767339 recvfrom(2, "GET / HTTP/1.1\r\nHost: yanan-stor"..., 15360, 0, NULL, NULL) = 464
11:22:36.767490 getpid() = 22479
11:22:36.767549 sendmsg(5, {msg_name(110)={sa_family=AF_LOCAL, sun_path="/dev/log"}, msg_iov(8)=[{"<135>Jun 29 03:22:36 ", 21}, {"haproxy", 7}, {"[", 1}, {"22479", 5}, {"]: ", 3}, {"", 0}, {"add log-id header:01153024255676"..., 56}, {"\n", 1}], msg_controllen=0, msg_flags=0}, MSG_DONTWAIT|MSG_NOSIGNAL) = 94
11:22:36.767698 epoll_ctl(0, EPOLL_CTL_DEL, 2, 0x55690ed5f430) = 0
11:22:36.767738 epoll_wait(0, [], 200, 0) = 0
11:22:36.767764 recvfrom(68, 0x55690f3ebde4, 15360, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
11:22:36.767805 sendto(2, "HTTP/1.0 503 Service Unavailable"..., 212, MSG_DONTWAIT|MSG_NOSIGNAL|MSG_MORE, NULL, 0) = 212
11:22:36.767836 shutdown(2, SHUT_WR) = 0


#10

This problem is solved. Besides a reset from the backend server, a 503 with NOSRV and SC can also occur when requests are dispatched based on the Host header of the HTTP request and no match is found.
Thanks for your help, @lukastribus.
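
The behaviour described in this last post can be pictured with a toy model. This is not HAProxy code: `route_by_host`, the `HOST_MAP` table and its entries are hypothetical, and the sketch only mirrors the outcome reported in this thread, namely that a request whose Host header matches nothing is answered with a 503 by the proxy itself and logged with `<NOSRV>`.

```python
# Hypothetical Host -> backend table; the names are invented for this example.
HOST_MAP = {
    "app.example.com": "backend_app",
    "api.example.com": "backend_api",
}


def route_by_host(headers, host_map=HOST_MAP, default_backend=None):
    """Pick a backend from the Host header, or answer 503 locally if none matches."""
    host = headers.get("Host", "").split(":")[0].lower()  # drop port, ignore case
    backend = host_map.get(host, default_backend)
    if backend is None:
        # No backend selected: nothing is proxied, the proxy replies itself.
        return {"status": 503, "backend": None, "server": "<NOSRV>"}
    # A backend was found; picking a server and the final status happen later.
    return {"status": None, "backend": backend, "server": "to be chosen by the backend"}


print(route_by_host({"Host": "app.example.com"}))
print(route_by_host({"Host": "unknown.example.com"}))
```

In this model, passing a `default_backend` is what keeps unmatched hosts from ending in a locally generated 503.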