Haproxy closing connections unexpectedly

NickH · November 16, 2023, 9:45am

Hi, newbie here.
I have inherited a setup that handles only HTTP/HTTPS and has been working fine for a long time. I have been asked to add load balancing between Nxlog-ce and our Nagios Log Server cluster, which uses TCP rather than HTTP. In the nxlog logs on the server I often see:

2023-11-16 03:22:36 INFO connecting to nlscluster.example.com:2063
2023-11-16 03:22:36 INFO connecting to nlscluster.example.com:2063
2023-11-16 03:22:36 INFO connecting to nlscluster.example.com:2062
2023-11-16 03:42:35 INFO reconnecting in 1 seconds
2023-11-16 03:42:35 ERROR om_tcp detected a connection error;End of file found
2023-11-16 03:42:36 INFO connecting to nlscluster.example.com:2060
2023-11-16 04:22:36 INFO reconnecting in 1 seconds
2023-11-16 04:22:36 INFO reconnecting in 1 seconds
2023-11-16 04:22:36 INFO reconnecting in 1 seconds
2023-11-16 04:22:36 ERROR om_tcp detected a connection error;End of file found
2023-11-16 04:22:36 ERROR om_tcp detected a connection error;End of file found
2023-11-16 04:22:36 ERROR om_tcp detected a connection error;End of file found

So, it looks like Haproxy is unexpectedly closing the connections. The global and tcp configs are:

global
        log /dev/log    local0
        log /dev/log    local1 notice
        chroot /var/lib/haproxy
        stats socket /run/haproxy/admin.sock mode 660 level admin expose-fd listeners
        stats timeout 30s
        user haproxy
        group haproxy
        daemon

        # Default SSL material locations
        ca-base /etc/ssl/certs
        crt-base /etc/ssl/private

        ssl-default-bind-ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:DHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384
        ssl-default-bind-ciphersuites TLS_AES_128_GCM_SHA256:TLS_AES_256_GCM_SHA384:TLS_CHACHA20_POLY1305_SHA256
        ssl-default-bind-options ssl-min-ver TLSv1.2 no-tls-tickets
        tune.ssl.default-dh-param 2048

defaults
        log     global
        mode    http
        option  dontlognull
        timeout connect 5000
        timeout client  50000
        timeout server  50000
        errorfile 400 /etc/haproxy/errors/400.http
        errorfile 403 /etc/haproxy/errors/403.http
        errorfile 408 /etc/haproxy/errors/408.http
        errorfile 500 /etc/haproxy/errors/500.http
        errorfile 502 /etc/haproxy/errors/502.http
        errorfile 503 /etc/haproxy/errors/503.http
        errorfile 504 /etc/haproxy/errors/504.http

listen statspage
    bind *:81
    stats enable
    stats uri /report
    stats refresh 20s
    stats auth my:secret
    stats admin if TRUE

#---------------------------------------------------------------------
# Nagios Log Server frontend which forwards to the NLS Cluster servers
#---------------------------------------------------------------------
frontend Example_NLS
    bind :2056
    bind :2057
    bind :2058
    bind :2059
    bind :2060
    bind :2061
    bind :2062
    bind :2063
    bind :2064
    bind :2065
    bind :2066
    bind :2067
    bind :2068
    bind :2069
    bind :2070
    bind :2071
    bind :3515
    bind :5544
    mode               tcp
    option             tcplog
    option             clitcpka
    timeout client 1h
    # No acl needed as all traffic on these ports is for NLS
    default_backend    NLScluster

backend NLScluster
    description         hpx01 Nagios Log Server Monitoring Backend
    mode                tcp
    option              srvtcpka
    timeout server      1h
    balance             roundrobin
    server    SCNLS1    sc1psnls01.example.com check port 80
    server    SCNLS2    sc1psnls02.example.com check port 80
    server    SCNLS3    sc1psnls03.example.com check port 80

Settings such as the timeouts were put in the defaults section back when we only had HTTP. I don’t know whether they could be interfering, but we do appear to be overriding them in the TCP frontend and backend sections anyway.
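
To spell out what I think ends up in effect for the two TCP proxies, here is an annotated excerpt. It is only my reading of the config above (a proxy section inherits a value from defaults unless it sets the value itself), not output from HAProxy, and the comments are my assumptions about what applies:

```haproxy
# Annotated excerpt of the existing config, not a complete or new config.
frontend Example_NLS
    mode            tcp     # overrides "mode http" from defaults
    timeout client  1h      # overrides "timeout client 50000" (50000 ms = 50 s)

backend NLScluster
    mode            tcp     # overrides "mode http" from defaults
    timeout server  1h      # overrides "timeout server 50000" (50000 ms = 50 s)
    # "timeout connect" is not set here, so the 5000 ms from defaults is inherited
```

If that reading is right, the only timeouts that matter for the nxlog connections are the two 1h values and the inherited 5 second connect timeout.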

The nxlog errors look like it is Haproxy closing the connections.

What can I do to stop Haproxy closing the nxlog connections?

NickH · November 16, 2023, 10:54am

Hmm. It looks like the connections are dropping at roughly hourly intervals, which matches the 1h timeout client and timeout server values. Is it possible, and safe, to set them to unlimited? Or should I just let nxlog reconnect every hour?

If I can set the timeouts to 0/infinite, should I add any other settings to cover the case where the nxlog client disappears?

It looks as if option srvtcpka and option clitcpka are having no effect on keeping the TCP connections open.
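
To make the question concrete, here is a minimal sketch of the change I am considering. It rests on my understanding that TCP keepalive probes are exchanged by the kernel and carry no payload, so they would not reset HAProxy's inactivity timers, which would explain why clitcpka and srvtcpka don't prevent the hourly drops. The 24h value is a placeholder I picked for illustration, not a recommendation, and I have not tested any of this:

```haproxy
# Sketch only: 24h is an assumed placeholder. It should exceed the longest
# period that nxlog stays silent on a connection.
frontend Example_NLS
    # bind lines unchanged
    mode               tcp
    option             tcplog
    option             clitcpka     # kernel-level probes, to notice a vanished client
    timeout client     24h          # was 1h
    default_backend    NLScluster

backend NLScluster
    mode               tcp
    option             srvtcpka     # kernel-level probes toward the NLS servers
    timeout server     24h          # was 1h
    balance            roundrobin
    # server lines unchanged
```

As far as I can tell from the HAProxy documentation, leaving a timeout unset (infinite) is accepted but discouraged and produces a warning at startup, so a long finite value seems the safer variant. How quickly a dead peer is noticed would then depend on the kernel's keepalive settings (on Linux the first probe is only sent after 2 hours of idle by default).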
