HAProxy community

Server connection issue after upgrading from 1.9.8 to 2.0.1

I am using HAProxy (Community Edition) in front of Redis to provide high availability for a master-slave setup.
HAProxy runs in a Debian Stretch (9.9) LXC container and is installed from the apt repository haproxy.debian.net.

After upgrading from HAProxy 1.9.8-1~bpo9+1 to 2.0.1-1~bpo9+1, I noticed a server connection issue: the number of server connections on Redis grew from about 50 to 4000, making the Redis server unavailable during peak periods.

After downgrading to 1.9.8-1~bpo9+1, everything was back to its original state (server connections came back to around 50).

Usually, the number of client connections is around 50, but can reach 500 under certain circumstances.

Is there a change between 1.9.x and 2.0.x that could explain this problem?

Here is my HAProxy configuration file (same conf used for v1.9.8 and 2.0.1):

global
        log /dev/log local0
        log /dev/log local1 notice
        chroot /var/lib/haproxy
        stats socket /run/haproxy/admin.sock mode 660 level admin
        stats timeout 30s
        user haproxy
        group haproxy
        daemon
        maxconn 32000

defaults
        log global
        mode http
        option httplog
        option dontlognull
        option dontlog-normal
        timeout connect 5000
        timeout client  50000
        timeout server  50000

frontend redis_master
    bind 10.2.1.100:6379,10.2.1.101:6379
    option tcplog
    mode tcp
    timeout client 10s
    default_backend redis_master

backend redis_master
    mode tcp
    balance first
    timeout connect 4s
    timeout server 10s
    option tcp-check
    default-server rise 1 on-marked-down shutdown-sessions
    server 10.2.1.3 10.2.1.3:6379 check #disabled
    server 10.2.1.4 10.2.1.4:6379 check disabled
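The stats socket declared in the global section can be used to watch the current session count per server while comparing versions. A minimal sketch, assuming `socat` is installed and that the CSV returned by `show stat` keeps `scur` in its fifth column; `show_scur` is a hypothetical helper name, not part of HAProxy:

```shell
# Print "proxy/server scur" for each row of `show stat` CSV read on stdin.
# show_scur is a hypothetical helper, not an HAProxy command.
show_scur() {
    awk -F, '
        /^#/ || NF < 5 { next }      # skip the CSV header and blank lines
        { print $1 "/" $2, $5 }      # pxname/svname followed by scur
    '
}
```

Possible usage on the HAProxy host: `echo "show stat" | socat stdio /run/haproxy/admin.sock | show_scur`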

Do you see a lot of connections in CLOSE_WAIT with 2.0.1 in the netstat/ss output?

I can’t upgrade to 2.0.1 for now as it is a production server, but I will try to run this test tomorrow.

On the current 1.9.8, I see very few CLOSE_WAIT connections in netstat (just 1 or 2). Most connections are in TIME_WAIT (~25k).
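A per-state count makes this comparison easy to repeat on each version. A minimal sketch, assuming the `netstat -tan` column layout (local address in column 4, remote address in column 5, state in column 6); `count_tcp_states` is a hypothetical helper name:

```shell
# Count TCP sockets per state for one port, reading `netstat -tan` style
# output on stdin. count_tcp_states is a hypothetical helper name.
count_tcp_states() {
    awk -v port="$1" '
        # Keep TCP rows whose local ($4) or remote ($5) address ends in :port.
        $1 ~ /^tcp/ && ($4 ~ (":" port "$") || $5 ~ (":" port "$")) { n[$6]++ }
        END { for (s in n) print s, n[s] }
    ' | sort
}
```

Possible usage: `netstat -tan | count_tcp_states 6379`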

I reproduced the problem in a test environment and can confirm that there are many CLOSE_WAIT connections in netstat after switching to HAProxy 2.0.1.

Details of my bench:

server1 (10.0.0.1) Apache 2.4.25 + PHP 7.0.33 + lib PHPRedis 4.0.2
server2 (10.0.0.2) HAProxy 1.9.8 / 2.0.1
server3 (10.0.0.3) Redis 4.0.11

PHP script on server1:

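<?php
// Open a new connection to HAProxy (server2) on every request.
$redis = new Redis();
$redis->connect('10.0.0.2', 6379);
$redis->select(4);
$v = $redis->get('mykey');
var_dump($v);
// Close the connection explicitly at the end of the request.
$redis->close();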

Test with Apache Bench from server1:

ab -c 5 -t 30 http://localhost/redis.php

With HAProxy 1.9.8

  • Requests per second: 2972.58 [#/sec]
  • Number of client connections on Redis server: 8
  • HAProxy errors: 0

With HAProxy 2.0.1

  • Requests per second: 467.95 [#/sec]
  • Number of client connections on Redis server: 4000+
  • HAProxy errors: 8736 (see below)

HAProxy errors during ab test (HAProxy 2.0.1):

Jun 28 12:07:34 server2 haproxy[32008]: 10.0.0.2:39750 [28/Jun/2019:12:07:24.247] redis_test redis_test/10.0.0.3 1/0/10001 23 sD 4/2/1/1/0 0/0
Jun 28 12:07:34 server2 haproxy[32008]: 10.0.0.2:39794 [28/Jun/2019:12:07:24.279] redis_test redis_test/10.0.0.3 1/0/10004 23 sD 3/1/0/0/0 0/0
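For reading these lines: in HAProxy's TCP log format, `1/0/10001` is Tw/Tc/Tt in milliseconds, `23` is the byte count, `sD` is the termination state (a server-side timeout expired during the data phase), and `4/2/1/1/0` is actconn/feconn/beconn/srv_conn/retries. A total time of about 10000 ms is consistent with a 10s `timeout server` like the one in the configuration above, assuming the test setup uses the same value. A minimal sketch that extracts those fields, counting from the end of the line so the syslog prefix does not matter; `decode_tcplog` is a hypothetical helper name:

```shell
# Extract a few fields from HAProxy tcplog lines read on stdin.
# decode_tcplog is a hypothetical helper; it assumes the default tcplog
# layout, where the last five fields are timers, bytes, state, conns, queues.
decode_tcplog() {
    awk 'NF >= 6 {
        split($(NF-5), bs, "/")    # backend/server
        split($(NF-4), t, "/")     # Tw/Tc/Tt in milliseconds
        split($(NF-1), c, "/")     # actconn/feconn/beconn/srv_conn/retries
        printf "server=%s Tt_ms=%s bytes=%s state=%s srv_conn=%s\n", bs[2], t[3], $(NF-3), $(NF-2), c[4]
    }'
}
```

Possible usage: `grep ' sD ' /var/log/haproxy.log | decode_tcplog` (the log path is an assumption and depends on the syslog setup).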

netstat during ab test (HAProxy 2.0.1):

root@server2:~# netstat -laputen | grep 6379 | grep CLOSE_WAIT
tcp        0      0 10.0.0.2:6379       10.0.0.1:34424      CLOSE_WAIT  113        2264079514 32008/haproxy       
tcp        0      0 10.0.0.2:6379       10.0.0.1:33706      CLOSE_WAIT  113        2264071597 32008/haproxy       
...

root@server2:~# netstat -laputen | grep 6379 | grep CLOSE_WAIT | wc -l
6093

Thanks.

Investigation is ongoing in the following GitHub issue:

I’m able to reproduce your issue. The commit fe4abe6 introduced the bug. I will discuss today with @cognet to see how we could fix the issue.

Glad you were able to reproduce it, thank you guys. I am staying on 1.9.8 until it’s fixed.

The bug is now fixed. See commit 6c7e96a3e for details. The commit was backported to 2.0 and 1.9.