HAProxy Current Sessions Climbing

I have a site which I recently switched to SSL. It has run for years without a problem. But when I separated the http and https frontends and configured the http frontend to only redirect to https, the current session count on the http frontend started climbing endlessly until the connection limit is hit. At that point the https frontend is still fine, but http is unreachable.

I’ve tried tweaking timeouts to see if I can get the sessions to terminate, but nothing seems to help. I’m currently running HAProxy version 1.7.9 on CentOS Linux 7.1.

Here’s my configuration:

global
    log localhost local4
    log-send-hostname
    log-tag haproxy

    chroot      /var/lib/haproxy
    pidfile     /var/run/haproxy/www-site.pid
    maxconn     6000
    user        haproxy
    group       haproxy
    tune.ssl.default-dh-param 2048
    daemon

    stats socket /var/lib/haproxy/www-site.stats

defaults
    mode http
    log global
    option httplog
    option dontlognull
    option redispatch
    retries 3
    timeout http-request 3s
    timeout queue 1m
    timeout connect 10s
    timeout client 1m
    timeout server 1m
    timeout check 10s
    timeout http-keep-alive 3s
    maxconn 5000

frontend stats
    bind 10.1.1.135:1936
    stats enable
    stats uri /
    stats auth operations:sdfasdfasdf
    stats admin if TRUE

frontend site-www-http
    redirect scheme https
    bind 1.2.3.4:80

frontend site-www
    bind 1.2.3.4:443 ssl crt /etc/pki/tls/private/www.site.com.pem ciphers …
    bind 10.1.1.135:80

    default_backend www

backend www
    mode http
    balance roundrobin
    option httpchk GET / HTTP/1.1\r\nHost:\ www.site.com
    server www01 www-01.prod.app:80 check
    server www02 www-02.prod.app:80 check
Sounds like a bug; it may already be fixed by patches committed to the 1.7 tree after 1.7.9, such as adc3cf3 (“BUG/MEDIUM: http: Close streams for connections closed before a redirect”).

Can you try the 1.7 snapshot from December 4th:
http://www.haproxy.org/download/1.7/src/snapshot/haproxy-ss-20171204.tar.gz

Thank you for that information.

I’m a little concerned about pushing a development snapshot into a production environment, and it might be hard to reproduce this load in a QA environment. Is there any chance this is fixed in 1.8?

I should have checked first. It looks like it was fixed in 1.8.1. I will test that out.

Thank You

Running 1.8.1 stable is actually more risky than running that 1.7 snapshot.

The 1.7 snapshot is based on the stable 1.7 tree with bugfixes that will be in the upcoming 1.7.10 release. There is nothing wrong with it from a risk perspective; it’s just not officially tagged as a release.

1.8 on the other hand is a new major release that will certainly introduce new bugs, and that includes 1.8.1.

If you want lower risk, use the snapshot.

That makes sense. I’ll give the snapshot a try.

That snapshot fixed the problem. It has been running for days now under heavy traffic, with open sessions stable below 20.
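
For anyone who wants to watch the per-frontend session counts without the stats page, here is a minimal Python sketch. It assumes you have already captured the CSV that the stats socket prints for "show stat" (for example with `echo "show stat" | socat stdio /var/lib/haproxy/www-site.stats`). The function name `current_sessions` and the sample numbers are made up for illustration; only the column names (pxname, svname, scur) come from HAProxy's CSV output.

```python
import csv
import io


def current_sessions(show_stat_output):
    """Map each frontend's proxy name to its current session count (scur).

    Hypothetical helper: expects the raw CSV text returned by the
    "show stat" command on the HAProxy stats socket.
    """
    text = show_stat_output.lstrip()
    # The CSV header line is prefixed with "# "; drop it so the first
    # column is read as "pxname".
    if text.startswith("# "):
        text = text[2:]
    sessions = {}
    for row in csv.DictReader(io.StringIO(text)):
        # Keep only the per-frontend summary rows.
        if row.get("svname") == "FRONTEND" and row.get("scur"):
            sessions[row["pxname"]] = int(row["scur"])
    return sessions


# Made-up sample, trimmed to the first few columns of a real dump.
SAMPLE = """# pxname,svname,qcur,qmax,scur,smax,slim,stot,
site-www-http,FRONTEND,,,4987,5000,5000,120934,
site-www,FRONTEND,,,17,212,5000,88231,
www,BACKEND,0,0,3,40,500,88100,
"""

print(current_sessions(SAMPLE))
# prints {'site-www-http': 4987, 'site-www': 17}
```

Running something like this from cron and logging the result makes it easy to see whether scur on the redirect-only frontend keeps growing or stays flat after an upgrade.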

Thank You!

Great, thanks for reporting back. FYI 1.7.10 was released in the meantime and contains all the fixes your snapshot contains and more, so you may want to upgrade to that.