Keep Alive Behaviour Change Since 1.7.6


#1

Hi,

We’ve noticed a strange change in keep alive behaviour since version 1.7.6 - Connections are being held open until timeout client is met.

The request is served to the client in the normal time, but nothing is logged until the timeout is met. The logs then report the total session duration time as just over the timeout client value and if another request is made by the browser within the timeout the connection is reused.

We’d normally expect the request to be logged straight after completion, the timings reflect the request duration seen from the browser and for the connection only to be reusable within our timeout http-keep-alive which is set much lower than our timeout client.

This is not happening on our backends with option forceclose (we need to do that for certain user agents).

Confirmed it is timeout client driving this behaviour by setting to an usual value e.g. -

timeout client 43s in the config
Request is served to the browser in 615ms
Request timings in log - "Th":155,"Ti":-1,"TR":149,"tq":303,"Tw":0,"Tc":1,"Tr":34,"Ta":43339,"Tt":43339 (We JSON format them)

Has anyone else noticed similar? When we tried to go live on 1.7.6 and later 1.7.7 we saw a massive performance hit because we exhausted our connections (maxconn).

Snippets from our config -

defaults
log global
mode http
balance roundrobin
option abortonclose
option http-server-close
timeout connect 9s
timeout client 60s
timeout server 60s
timeout check 5s
timeout http-keep-alive 5s

backend appserver-backend
http-check send-state
balance roundrobin
cookie JSESSIONID prefix
option httpchk HEAD /monitors/check
option persist
option redispatch
option http-pretend-keepalive


#2

What was the release that worked previously?


#3

We’re running 1.7.5 in production right now without issue.


#4

I’m unable to reproduce this in both 1.7.6 and 1.7.7.

Syslog is send right away and the connection is closed after 5 seconds - not 43 seconds.

Can you share more details:

  • output of haproxy -vv (of the affected binary)
  • a more complete configuration
  • the full syslog message
  • if possible a tcpdump

#5

Hi Lukas - thanks for your efforts so far.

While trimming our config to send the bare minimum I’ve been able to determine that the behaviour is only exhibited when the compression algo parameter is set.

I’ve therefore been able to create a very basic config to reproduce this issue.

Here’s the details you requested -

root@vagrant-ubuntu-trusty-64:~# haproxy -vv
HA-Proxy version 1.7.7-1ppa1~trusty 2017/06/27
Copyright 2000-2017 Willy Tarreau willy@haproxy.org

Build options :
TARGET = linux2628
CPU = generic
CC = gcc
CFLAGS = -g -O2 -fPIE -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2
OPTIONS = USE_GETADDRINFO=1 USE_ZLIB=1 USE_REGPARM=1 USE_OPENSSL=1 USE_LUA=1 USE_PCRE=1 USE_NS=1

Default settings :
maxconn = 2000, bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Encrypted password support via crypt(3): yes
Built with zlib version : 1.2.8
Running on zlib version : 1.2.8
Compression algorithms supported : identity(“identity”), deflate(“deflate”), raw-deflate(“deflate”), gzip(“gzip”)
Built with OpenSSL version : OpenSSL 1.0.1f 6 Jan 2014
Running on OpenSSL version : OpenSSL 1.0.1f 6 Jan 2014
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports prefer-server-ciphers : yes
Built with PCRE version : 8.31 2012-07-06
Running on PCRE version : 8.31 2012-07-06
PCRE library supports JIT : no (USE_PCRE_JIT not set)
Built with Lua version : Lua 5.3.1
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND
Built with network namespace support

Available polling systems :
epoll : pref=300, test result OK
poll : pref=200, test result OK
select : pref=150, test result OK
Total: 3 (3 usable), will use epoll.

Available filters :
[COMP] compression
[TRACE] trace
[SPOE] spoe

Configuration based on default provided by PPA package -

global
log /dev/log local0
log /dev/log local1 notice
chroot /var/lib/haproxy
stats socket /run/haproxy/admin.sock mode 660 level admin
stats timeout 30s
user haproxy
group haproxy
daemon

Default SSL material locations

ca-base /etc/ssl/certs
crt-base /etc/ssl/private

Default ciphers to use on SSL-enabled listening sockets.

For more information, see ciphers(1SSL). This list is from:

https://hynek.me/articles/hardening-your-web-servers-ssl-ciphers/

An alternative list with additional directives can be obtained from

https://mozilla.github.io/server-side-tls/ssl-config-generator/?server=haproxy

ssl-default-bind-ciphers ECDH+AESGCM:DH+AESGCM:ECDH+AES256:DH+AES256:ECDH+AES128:DH+AES:RSA+AESGCM:RSA+AES:!aNULL:!MD5:!DSS
ssl-default-bind-options no-sslv3

defaults
log global
mode http
option httplog
option dontlognull
option http-server-close # added
timeout connect 5000
timeout client 50000
timeout server 50000
errorfile 400 /etc/haproxy/errors/400.http
errorfile 403 /etc/haproxy/errors/403.http
errorfile 408 /etc/haproxy/errors/408.http
errorfile 500 /etc/haproxy/errors/500.http
errorfile 502 /etc/haproxy/errors/502.http
errorfile 503 /etc/haproxy/errors/503.http
errorfile 504 /etc/haproxy/errors/504.http

frontend test-frontend
bind *:9090
default_backend test-backend

backend test-backend
compression algo gzip
server server1 127.0.0.1:80

Syslog example -

Jun 30 11:10:49 vagrant-ubuntu-trusty-64 haproxy[2930]: 10.0.2.2:65529 [30/Jun/2017:11:09:59.041] test-frontend test-backend/server1 0/0/0/0/50003 304 143 - - ---- 1/1/0/0/0 0/0 “GET /icons/ubuntu-logo.png HTTP/1.1”

Additional notes -

When option http-server-close is set, the connection is held until the timeout client is met and the log timings reflect that.
When not set, the connection is still held until what I assume is the backend server keep alive timeout and log timings again reflect that. E.g. from a default apache backend I get this -

Jun 30 11:17:03 vagrant-ubuntu-trusty-64 haproxy[3089]: 10.0.2.2:50442 [30/Jun/2017:11:16:58.086] test-frontend test-backend/server1 0/0/0/0/4949 304 124 - - ---- 1/1/0/0/0 0/0 “GET /icons/ubuntu-logo.png HTTP/1.1”

Another interesting thing to note is that the log events are delayed only when in http format (or custom with relevant timings) - I guess that is expected behaviour.

I hope all that makes sense and helps. Give me a shout if you need any additional details.

Thanks.


#6

Ok, I can reproduce this now, thanks.

I will trace this to the offending change and report it to the mailing list and the author of the offending change; usually we also add the real name of the original reporter to the bug report (and later to the fix), especially in cases like this were a lot of details where provided. Tell me your real name if you would like that, otherwise we will just have to call you mattmonkey :slight_smile:


#7

Thanks Lukas. Much appreciated and I’ve added my name to my profile :sunglasses:


#8

It was more complicated, actually 1.7.5 is buggy is well, it just doesn’t hurt your configuration that much. 1.7.6 then makes it worse.

I reported it on the mailing list, you can follow the thread on the archive if you are not subscribed (but once there is a fix I will update this thread as well):
https://www.mail-archive.com/haproxy@formilux.org/msg26621.html


#9

Fantastic analysis Lukas - I hadn’t even noticed an issue with 1.7.5 but I’ve checked our historical metrics and can see a massive difference in feconn when we switched from 1.5 to 1.7.

Really appreciate all your help :thumbsup:


#10

This issue is fixed by 3 patches that have been committed to 1.8, they will soon be backported to the 1.7:
https://www.mail-archive.com/haproxy@formilux.org/msg26694.html

edit:
Haproxy 1.7.8 was just released with the fix for this issue:
https://www.mail-archive.com/haproxy@formilux.org/msg26701.html