Duplicated TCP resets

Hi,

While researching an issue, I noticed duplicated TCP resets:

# tcpdump -l -nn "host 172.30.2.194 and port 3443" | grep -F "[R]"
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
16:53:27.454475 IP 172.30.2.194.3443 > 172.30.1.30.10434: Flags [R], seq 4150895860, win 0, length 0
16:53:27.454628 IP 172.30.2.194.3443 > 172.30.1.30.10434: Flags [R], seq 4150895860, win 0, length 0
16:53:28.540711 IP 172.30.2.194.3443 > 172.30.1.30.10578: Flags [R], seq 4211144735, win 0, length 0
16:53:28.540730 IP 172.30.2.194.3443 > 172.30.1.30.10578: Flags [R], seq 4211144735, win 0, length 0
16:53:30.741290 IP 172.30.2.194.3443 > 172.30.2.55.34986: Flags [R], seq 4066360692, win 0, length 0
16:53:30.741376 IP 172.30.2.194.3443 > 172.30.2.55.34986: Flags [R], seq 4066360692, win 0, length 0
16:53:36.051007 IP 172.30.2.194.3443 > 172.30.1.30.8370: Flags [R], seq 3608015152, win 0, length 0
16:53:36.051020 IP 172.30.2.194.3443 > 172.30.1.30.8370: Flags [R], seq 3608015152, win 0, length 0
16:53:45.898741 IP 172.30.2.194.3443 > 172.30.2.55.35228: Flags [R], seq 3409875850, win 0, length 0
16:53:45.898780 IP 172.30.2.194.3443 > 172.30.2.55.35228: Flags [R], seq 3409875850, win 0, length 0
16:53:46.603130 IP 172.30.2.194.3443 > 172.30.2.55.34398: Flags [R], seq 1417708259, win 0, length 0
16:53:46.603242 IP 172.30.2.194.3443 > 172.30.2.55.34398: Flags [R], seq 1417708259, win 0, length 0

Can anyone explain them?

PS HAProxy v1.9.5 running under Amazon Linux on an AWS EC2 instance.
PPS There are two EC2 instances, and both of them send the TCP reset twice.
PPPS HAProxy serves on 172.30.2.194:3443.

Alright… Here is another test:

frontend lb-useast
  mode http
  bind *:4080 name lb-useast_frontend_http_new
  bind *:4443 name lb-useast_frontend_new ssl crt ....
  monitor-uri /haproxy

Curling the monitoring URI over HTTPS:
curl -v https://proxy:4443/haproxy

16:44:36.777010 IP 172.30.2.194.55212 > 172.30.1.195.4443: Flags [S], seq 3893092203, win 26883, options [mss 8961,sackOK,TS val 3016125645 ecr 0,nop,wscale 9], length 0
16:44:36.777030 IP 172.30.1.195.4443 > 172.30.2.194.55212: Flags [S.], seq 785191015, ack 3893092204, win 26847, options [mss 8961,sackOK,TS val 3988408654 ecr 3016125645,nop,wscale 9], length 0
16:44:36.777536 IP 172.30.2.194.55212 > 172.30.1.195.4443: Flags [.], ack 1, win 53, options [nop,nop,TS val 3016125645 ecr 3988408654], length 0
16:44:36.786258 IP 172.30.2.194.55212 > 172.30.1.195.4443: Flags [P.], seq 1:518, ack 1, win 53, options [nop,nop,TS val 3016125654 ecr 3988408654], length 517
16:44:36.787649 IP 172.30.1.195.4443 > 172.30.2.194.55212: Flags [P.], seq 1:4114, ack 518, win 55, options [nop,nop,TS val 3988408665 ecr 3016125654], length 4113
16:44:36.788232 IP 172.30.2.194.55212 > 172.30.1.195.4443: Flags [.], ack 4114, win 69, options [nop,nop,TS val 3016125656 ecr 3988408665], length 0
16:44:36.789932 IP 172.30.2.194.55212 > 172.30.1.195.4443: Flags [P.], seq 518:644, ack 4114, win 69, options [nop,nop,TS val 3016125658 ecr 3988408665], length 126
16:44:36.790206 IP 172.30.1.195.4443 > 172.30.2.194.55212: Flags [P.], seq 4114:4165, ack 644, win 55, options [nop,nop,TS val 3988408667 ecr 3016125658], length 51
16:44:36.791471 IP 172.30.2.194.55212 > 172.30.1.195.4443: Flags [P.], seq 644:785, ack 4165, win 69, options [nop,nop,TS val 3016125659 ecr 3988408667], length 141
16:44:36.791872 IP 172.30.1.195.4443 > 172.30.2.194.55212: Flags [P.], seq 4165:4340, ack 785, win 57, options [nop,nop,TS val 3988408669 ecr 3016125659], length 175
16:44:36.791960 IP 172.30.1.195.4443 > 172.30.2.194.55212: Flags [P.], seq 4340:4371, ack 785, win 57, options [nop,nop,TS val 3988408669 ecr 3016125659], length 31
16:44:36.792059 IP 172.30.1.195.4443 > 172.30.2.194.55212: Flags [F.], seq 4371, ack 785, win 57, options [nop,nop,TS val 3988408669 ecr 3016125659], length 0
16:44:36.792912 IP 172.30.2.194.55212 > 172.30.1.195.4443: Flags [.], ack 4372, win 85, options [nop,nop,TS val 3016125661 ecr 3988408669], length 0
16:44:36.793124 IP 172.30.2.194.55212 > 172.30.1.195.4443: Flags [P.], seq 785:816, ack 4372, win 85, options [nop,nop,TS val 3016125661 ecr 3988408669], length 31
16:44:36.793131 IP 172.30.1.195.4443 > 172.30.2.194.55212: Flags [R], seq 785195387, win 0, length 0

Notice the last TCP RST.

Now, curling the same URI, but over plain HTTP:
curl -v http://proxy:4080/haproxy

16:46:43.407168 IP 172.30.2.194.44032 > 172.30.1.195.4080: Flags [S], seq 2523157193, win 26883, options [mss 8961,sackOK,TS val 3016252276 ecr 0,nop,wscale 9], length 0
16:46:43.407187 IP 172.30.1.195.4080 > 172.30.2.194.44032: Flags [S.], seq 1529397891, ack 2523157194, win 26847, options [mss 8961,sackOK,TS val 3988535286 ecr 3016252276,nop,wscale 9], length 0
16:46:43.407669 IP 172.30.2.194.44032 > 172.30.1.195.4080: Flags [.], ack 1, win 53, options [nop,nop,TS val 3016252276 ecr 3988535286], length 0
16:46:43.408830 IP 172.30.2.194.44032 > 172.30.1.195.4080: Flags [P.], seq 1:113, ack 1, win 53, options [nop,nop,TS val 3016252278 ecr 3988535286], length 112
16:46:43.408941 IP 172.30.1.195.4080 > 172.30.2.194.44032: Flags [F.], seq 1:147, ack 113, win 53, options [nop,nop,TS val 3988535288 ecr 3016252278], length 146
16:46:43.410026 IP 172.30.2.194.44032 > 172.30.1.195.4080: Flags [F.], seq 113, ack 148, win 55, options [nop,nop,TS val 3016252279 ecr 3988535288], length 0
16:46:43.410033 IP 172.30.1.195.4080 > 172.30.2.194.44032: Flags [.], ack 114, win 53, options [nop,nop,TS val 3988535289 ecr 3016252279], length 0

No RST at the end. Why does the TCP session termination differ? Is that OK?

Teardown of HTTPS is completely different due to TLS shutdown (sending close_notify), etc.
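
For illustration, the sending side of an orderly TLS teardown looks roughly like this (a minimal sketch assuming OpenSSL and a blocking socket; haproxy's real code path is more involved):

#include <unistd.h>
#include <openssl/ssl.h>

/* Orderly TLS teardown: send our close_notify alert, optionally wait
 * for the peer's, then close the underlying TCP socket. */
static void tls_teardown(SSL *ssl, int fd)
{
    int ret = SSL_shutdown(ssl);   /* sends our close_notify */
    if (ret == 0)
        SSL_shutdown(ssl);         /* 0 = ours sent, peer's not seen yet;
                                    * a second call waits for it */
    SSL_free(ssl);
    close(fd);                     /* only now does the TCP-level teardown happen */
}

In the plain-HTTP case there is no alert exchange at all, so you only see the kernel's FIN/ACK sequence.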

Is there a real, actual problem here?

Well… There is an AWS NLB which distributes traffic between two targets. The NLB has a metric called "Target reset count", which counts TCP resets received from a target. I was going to use this metric for health monitoring: when something goes wrong, targets might refuse requests and I'll get an alert.

So, the NLB's TCP listener on port 80 forwards traffic to one target (port 4080), i.e. to haproxy:4080, and the listener on port 443 forwards to another target (port 4443), i.e. to haproxy:4443.
The health checker of the first target queries HTTP port 4080 and retrieves the /haproxy URI; this one has no issues.

The other health checker was set up to query HTTPS port 4443, and here the problem comes up: the target reset count becomes non-zero. I'm sure it's a health-checker issue, because no customer traffic is routed to the NLB yet and tcpdump shows multiple TCP resets in response to the NLB's queries.

It's worth mentioning that if I use an HTTP health checker instead of HTTPS, the issue goes away and the TCP reset count stays at zero.

Haproxy will always do what is most efficient; in many cases that means closing the TCP connection with a reset rather than with a FIN close-down.
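
For reference, this is the standard socket-API way to request an abortive close; a sketch of the mechanism only, not a claim about haproxy's actual code:

#include <string.h>
#include <unistd.h>
#include <sys/socket.h>

/* SO_LINGER with l_onoff=1 and l_linger=0 makes close() discard any
 * queued data and emit an RST instead of a FIN. As a bonus the socket
 * skips TIME_WAIT, which is why it is "most efficient" for a proxy. */
static void abortive_close(int fd)
{
    struct linger lg;

    memset(&lg, 0, sizeof(lg));
    lg.l_onoff  = 1;   /* enable lingering... */
    lg.l_linger = 0;   /* ...with a zero timeout: reset on close */

    setsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, sizeof(lg));
    close(fd);         /* kernel sends RST */
}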

Monitoring for TCP resets and considering them a problem is wrong, in my opinion.


I'm still wondering why haproxy sends TCP resets twice:

[root@use1-proxy-02 haproxy]# tcpdump -nn 'tcp[tcpflags] & (tcp-rst) !=0' and '(port 4080 or port 4443)' and src host 172.30.2.194
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
16:52:37.122241 IP 172.30.2.194.4443 > 54.158.x.y.53258: Flags [R], seq 1111895933, win 0, length 0
16:52:37.122255 IP 172.30.2.194.4443 > 54.158.x.y.53258: Flags [R], seq 1111895933, win 0, length 0
16:54:37.127493 IP 172.30.2.194.4443 > 54.158.x.y.53480: Flags [R], seq 1651591007, win 0, length 0
16:54:37.127505 IP 172.30.2.194.4443 > 54.158.x.y.53480: Flags [R], seq 1651591007, win 0, length 0
16:54:41.502721 IP 172.30.2.194.4443 > 34.203.a.b.30440: Flags [R], seq 1750702288, win 0, length 0
16:54:41.502739 IP 172.30.2.194.4443 > 34.203.a.b.30440: Flags [R], seq 1750702288, win 0, length 0
16:54:41.502937 IP 172.30.2.194.4443 > 34.203.a.b.30438: Flags [R], seq 3148330375, win 0, length 0
16:54:41.502960 IP 172.30.2.194.4443 > 34.203.a.b.30438: Flags [R], seq 3148330375, win 0, length 0
16:54:41.503387 IP 172.30.2.194.4443 > 34.203.a.b.30432: Flags [R], seq 1466597445, win 0, length 0
16:54:41.503407 IP 172.30.2.194.4443 > 34.203.a.b.30432: Flags [R], seq 1466597445, win 0, length 0

Please help me figure this out.

PS

[root@use1-proxy-02 haproxy]# haproxy -vv
HA-Proxy version 1.9.5 2019/03/19 - https://haproxy.org/
Build options :
  TARGET  = linux2628
  CPU     = generic
  CC      = gcc
  CFLAGS  = -O2 -g -fno-strict-aliasing -Wdeclaration-after-statement -fwrapv -Wno-unused-label -Wno-sign-compare -Wno-unused-parameter -Wno-old-style-declaration -Wno-ignored-qualifiers -Wno-clobbered -Wno-missing-field-initializers -Wtype-limits -DTCP_USER_TIMEOUT=18
  OPTIONS = USE_LINUX_TPROXY=1 USE_ZLIB=1 USE_REGPARM=1 USE_OPENSSL=1 USE_PCRE=1

Default settings :
  maxconn = 2000, bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Built with OpenSSL version : OpenSSL 1.0.1e-fips 11 Feb 2013
Running on OpenSSL version : OpenSSL 1.0.1e-fips 11 Feb 2013
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports : SSLv3 TLSv1.0 TLSv1.1 TLSv1.2
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND
Built with zlib version : 1.2.3
Running on zlib version : 1.2.8
Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with PCRE version : 7.8 2008-09-05
Running on PCRE version : 8.21 2011-12-12
PCRE library supports JIT : no (USE_PCRE_JIT not set)
Encrypted password support via crypt(3): yes
Built with multi-threading support.

Available polling systems :
      epoll : pref=300,  test result OK
       poll : pref=200,  test result OK
     select : pref=150,  test result OK
Total: 3 (3 usable), will use epoll.

Available multiplexer protocols :
(protocols marked as <default> cannot be specified using 'proto' keyword)
              h2 : mode=HTX        side=FE|BE
              h2 : mode=HTTP       side=FE
       <default> : mode=HTX        side=FE|BE
       <default> : mode=TCP|HTTP   side=FE|BE

Available filters :
        [SPOE] spoe
        [COMP] compression
        [CACHE] cache
        [TRACE] trace

[root@use1-proxy-02 haproxy]# netstat -lnp|grep 4443
tcp        0      0 0.0.0.0:4443                0.0.0.0:*                   LISTEN      14485/haproxy

Haproxy doesn't send TCP resets; the TCP stack of your kernel does. Haproxy just makes calls to the socket API.
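
To illustrate the point that the stack, not the application, chooses RST: under a Linux TCP stack, closing a socket while received data is still sitting unread in its queue is answered with an RST (see RFC 2525, section 2.17). A minimal sketch:

#include <unistd.h>
#include <sys/socket.h>

/* Accept a connection, deliberately leave the peer's bytes unread, and
 * close. tcpdump shows an RST rather than a FIN, even though the code
 * only calls close() -- the kernel cannot pretend the data was consumed. */
static void reset_by_unread_data(int listen_fd)
{
    int conn = accept(listen_fd, NULL, NULL);
    if (conn < 0)
        return;

    sleep(1);      /* give the peer time to send something we never read() */
    close(conn);   /* pending receive data => the kernel answers with RST */
}

A late-arriving TLS close_notify is a classic trigger: if the application has already closed the socket, the alert lands on a closed connection and is answered with an RST.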

If you want to know more about TCP resets or their handling in the Linux network stack, I'm afraid you will have to look elsewhere, as I don't know the low-level details and I'm unable to research this for you.

Oh… I thought haproxy did those low-level things itself. Never mind then.

I'm sorry for bumping this up, but this is really strange. I took a completely different host (an LXC container, actually) running CentOS 6 and started the latest haproxy:

# ./haproxy -vv
HA-Proxy version 2.0.3 2019/07/23 - https://haproxy.org/
Build options :
  TARGET  = linux-glibc
  CPU     = generic
  CC      = gcc
  CFLAGS  = -O0 -g -fno-strict-aliasing
  OPTIONS = USE_PCRE=1 USE_THREAD=1 USE_REGPARM=1 USE_LINUX_TPROXY=1 USE_OPENSSL=1 USE_ZLIB=1 USE_TFO=1 USE_NS=

Feature list : +EPOLL -KQUEUE -MY_EPOLL -MY_SPLICE +NETFILTER +PCRE -PCRE_JIT -PCRE2 -PCRE2_JIT +POLL -PRIVATE_CACHE +THREAD -PTHREAD_PSHARED +REGPARM -STATIC_PCRE -STATIC_PCRE2 +TPROXY +LINUX_TPROXY +LINUX_SPLICE +LIBCRYPT +CRYPT_H -VSYSCALL +GETADDRINFO +OPENSSL -LUA +FUTEX +ACCEPT4 -MY_ACCEPT4 +ZLIB -SLZ +CPU_AFFINITY +TFO -NS +DL +RT -DEVICEATLAS -51DEGREES -WURFL -SYSTEMD -OBSOLETE_LINKER +PRCTL +THREAD_DUMP -EVPORTS

Default settings :
  bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Built with multi-threading support (MAX_THREADS=64, default=1).
Built with OpenSSL version : OpenSSL 1.0.1e-fips 11 Feb 2013
Running on OpenSSL version : OpenSSL 1.0.1e-fips 11 Feb 2013
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports : SSLv3 TLSv1.0 TLSv1.1 TLSv1.2
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND
Built with zlib version : 1.2.3
Running on zlib version : 1.2.3
Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with PCRE version : 7.8 2008-09-05
Running on PCRE version : 7.8 2008-09-05
PCRE library supports JIT : no (USE_PCRE_JIT not set)
Encrypted password support via crypt(3): yes

Available polling systems :
      epoll : pref=300,  test result OK
       poll : pref=200,  test result OK
     select : pref=150,  test result OK
Total: 3 (3 usable), will use epoll.

Available multiplexer protocols :
(protocols marked as <default> cannot be specified using 'proto' keyword)
              h2 : mode=HTX        side=FE|BE     mux=H2
              h2 : mode=HTTP       side=FE        mux=H2
       <default> : mode=HTX        side=FE|BE     mux=H1
       <default> : mode=TCP|HTTP   side=FE|BE     mux=PASS

Available services : none

Available filters :
        [SPOE] spoe
        [COMP] compression
        [CACHE] cache
        [TRACE] trace

# ./haproxy -d -f haproxy.cfg -p ./haproxy.pid
[WARNING] 217/110641 (5885) : parsing [haproxy.cfg:45]: 'log-format' overrides previous 'option httplog' in 'defaults' section.
[WARNING] 217/110641 (5885) : parsing [haproxy.cfg:77] : The 'reqdel' directive is deprecated in favor of 'http-request del-header' and will be removed in next version.
Available polling systems :
      epoll : pref=300,  test result OK
       poll : pref=200,  test result OK
     select : pref=150,  test result FAILED
Total: 3 (2 usable), will use epoll.

Available filters :
        [SPOE] spoe
        [COMP] compression
        [CACHE] cache
        [TRACE] trace
Using epoll() as the polling mechanism.
[WARNING] 217/110641 (5885) : [./haproxy.main()] Cannot raise FD limit to 62527, limit is 30000.
<133>Aug  6 11:06:41 haproxy[5885]: Proxy lb-useast started.
<133>Aug  6 11:06:41 haproxy[5885]: Proxy ags_backend started.
[WARNING] 217/110641 (5885) : [./haproxy.main()] FD limit (30000) too low for maxconn=25000/maxsock=62527. Please raise 'ulimit-n' to 62527 or more to avoid any trouble.
[NOTICE] 217/110641 (5885) : New worker #1 (5886) forked

The config is:

# cat haproxy.cfg
global
    log stdout len 2048 local0
    pidfile     /var/run/haproxy.pid
    maxconn     25
    user        nobody
    group       nobody
    daemon
    master-worker

defaults
  mode http
  log global
  option dontlognull            # don't log connections where no data was exchanged
  option splice-auto            # accelerate performance with kernel tcp splicing
  option httplog                # enable logging of HTTP request, session state and timers
  option http-server-close      # close the server side after each request, keep client-side keep-alive
  option redispatch             # allow switching to another backend server when the one in the cookie goes down
  option contstats              # enable continuous traffic statistics updates
  no option http-use-htx        # freeswitch wss does not work when this option is enabled (default)
  retries 3
  backlog 25
  timeout client          60s   # was 120
  timeout client-fin      15s   # was 25
  timeout connect          5s
  timeout server           1h   # was 60s
  timeout tunnel           1h
  timeout http-keep-alive 10s   # was 1
  timeout http-request     5s   # was 15
  timeout queue           30s
  timeout tarpit          60s
  timeout check            5s
  default-server inter 6s rise 1 fall 3

frontend lb-useast
  mode http
  maxconn 30
  bind *:4080 name lb-useast_frontend_http_new
  default_backend ags_backend

backend ags_backend
  option httpchk GET /switchboard/WICModule.nocache.js
  http-check expect string WICModule
  balance roundrobin
  cookie agscookie insert nocache indirect httponly secure
  server sp-useast-001 sp-useast-001.dom.com:80 cookie sp-useast-001 weight 10 check port 80

And I sniffed the http-check traffic (https://www.dropbox.com/s/84p4m6mbehrblqx/dump.zip?dl=0). For the first several iterations haproxy retrieves the httpchk page without issues, but then it starts sending RSTs in the middle of the transaction.
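
My working theory was that the checker reads at most one buffer of the response and then closes; here is a purely hypothetical sketch of that pattern (the function name and the buffer size are my assumptions, not haproxy internals):

#include <unistd.h>

#define CHECK_BUFSIZE 16384   /* assumed default, in the spirit of tune.chksize */

/* Read one buffer's worth of the health-check response, then close. If
 * the page is larger than the buffer, the unread remainder would make
 * the kernel turn the close() into an RST -- i.e. an RST in the middle
 * of the transaction. */
static void run_check(int fd)
{
    char buf[CHECK_BUFSIZE];
    ssize_t n = read(fd, buf, sizeof(buf));

    (void)n;   /* ...inspect the status line / expect string here... */
    close(fd); /* unread response bytes => RST instead of FIN */
}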


I tried raising the tune.chksize buffer, but that didn't help.

Do you have an explanation of that?

Timing differences, I’d assume.

Like I said, I don't know the details of low-level TCP behavior.