Hi, newbie here.
I have inherited a setup using just HTTP/S, which has been working OK for a long time. I have been asked to add load balancing between nxlog-ce and our Nagios Log Server cluster, which uses TCP rather than HTTP. In the nxlog logs on the server I often see:
2023-11-16 03:22:36 INFO connecting to nlscluster.example.com:2063
2023-11-16 03:22:36 INFO connecting to nlscluster.example.com:2063
2023-11-16 03:22:36 INFO connecting to nlscluster.example.com:2062
2023-11-16 03:42:35 INFO reconnecting in 1 seconds
2023-11-16 03:42:35 ERROR om_tcp detected a connection error;End of file found
2023-11-16 03:42:36 INFO connecting to nlscluster.example.com:2060
2023-11-16 04:22:36 INFO reconnecting in 1 seconds
2023-11-16 04:22:36 INFO reconnecting in 1 seconds
2023-11-16 04:22:36 INFO reconnecting in 1 seconds
2023-11-16 04:22:36 ERROR om_tcp detected a connection error;End of file found
2023-11-16 04:22:36 ERROR om_tcp detected a connection error;End of file found
2023-11-16 04:22:36 ERROR om_tcp detected a connection error;End of file found
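To get a feel for how long the connections lasted before dropping, I compared the timestamps with a rough script. This is only a sketch: the error lines do not say which port they belong to, so it just lists the gap from every earlier connect to each error time. The function names `parse` and `gaps_seconds` are my own and not part of nxlog.

```python
from datetime import datetime

# Connect and error lines from the nxlog excerpt above (duplicates trimmed).
LOG = """\
2023-11-16 03:22:36 INFO connecting to nlscluster.example.com:2063
2023-11-16 03:22:36 INFO connecting to nlscluster.example.com:2062
2023-11-16 03:42:35 ERROR om_tcp detected a connection error;End of file found
2023-11-16 03:42:36 INFO connecting to nlscluster.example.com:2060
2023-11-16 04:22:36 ERROR om_tcp detected a connection error;End of file found
"""


def parse(log):
    """Return (connect timestamps, error timestamps) found in the log text."""
    connects, errors = [], []
    for line in log.splitlines():
        stamp = datetime.strptime(line[:19], "%Y-%m-%d %H:%M:%S")
        if "INFO connecting to" in line:
            connects.append(stamp)
        elif "ERROR om_tcp" in line:
            errors.append(stamp)
    return connects, errors


def gaps_seconds(connects, errors):
    """Map each error time to the sorted gaps (in seconds) since earlier connects."""
    return {
        err: sorted({int((err - c).total_seconds()) for c in connects if c <= err})
        for err in sorted(set(errors))
    }


if __name__ == "__main__":
    connects, errors = parse(LOG)
    for err, gaps in gaps_seconds(connects, errors).items():
        print(err, gaps)
```

For this excerpt the 04:22:36 errors come exactly 3600 seconds after the 03:22:36 connects, which happens to equal the 1h timeouts in the config below. I can't tell from the excerpt alone whether that is the cause or a coincidence.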
So it looks like HAProxy is unexpectedly closing the connections. The global and TCP configs are:
global
    log /dev/log local0
    log /dev/log local1 notice
    chroot /var/lib/haproxy
    stats socket /run/haproxy/admin.sock mode 660 level admin expose-fd listeners
    stats timeout 30s
    user haproxy
    group haproxy
    daemon
    # Default SSL material locations
    ca-base /etc/ssl/certs
    crt-base /etc/ssl/private
    ssl-default-bind-ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:DHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384
    ssl-default-bind-ciphersuites TLS_AES_128_GCM_SHA256:TLS_AES_256_GCM_SHA384:TLS_CHACHA20_POLY1305_SHA256
    ssl-default-bind-options ssl-min-ver TLSv1.2 no-tls-tickets
    tune.ssl.default-dh-param 2048

defaults
    log global
    mode http
    option dontlognull
    timeout connect 5000
    timeout client 50000
    timeout server 50000
    errorfile 400 /etc/haproxy/errors/400.http
    errorfile 403 /etc/haproxy/errors/403.http
    errorfile 408 /etc/haproxy/errors/408.http
    errorfile 500 /etc/haproxy/errors/500.http
    errorfile 502 /etc/haproxy/errors/502.http
    errorfile 503 /etc/haproxy/errors/503.http
    errorfile 504 /etc/haproxy/errors/504.http

listen statspage
    bind *:81
    stats enable
    stats uri /report
    stats refresh 20s
    stats auth my:secret
    stats admin if TRUE

#---------------------------------------------------------------------
# Nagios Log Server frontend which forwards to the NLS Cluster servers
#---------------------------------------------------------------------
frontend Example_NLS
    bind :2056
    bind :2057
    bind :2058
    bind :2059
    bind :2060
    bind :2061
    bind :2062
    bind :2063
    bind :2064
    bind :2065
    bind :2066
    bind :2067
    bind :2068
    bind :2069
    bind :2070
    bind :2071
    bind :3515
    bind :5544
    mode tcp
    option tcplog
    option clitcpka
    timeout client 1h
    # No acl needed as all traffic on these ports are for NLS
    default_backend NLScluster

backend NLScluster
    description hpx01 Nagios Log Server Monitoring Backend
    mode tcp
    option srvtcpka
    timeout server 1h
    balance roundrobin
    server SCNLS1 sc1psnls01.example.com check port 80
    server SCNLS2 sc1psnls02.example.com check port 80
    server SCNLS3 sc1psnls03.example.com check port 80
The timeouts in the defaults section were set when we had just HTTP. I don’t know if they could be interfering, but we do appear to be overriding timeout client and timeout server in the TCP frontend and backend anyway.
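To spell out what I think the effective timeouts are on the NLS path, here is a small sketch. It assumes (as I understand the HAProxy docs) that a timeout set inside a frontend or backend replaces the one inherited from defaults, and that a bare number means milliseconds. The helper names `to_ms` and `effective` are mine, and `to_ms` only handles the ms, s, m, h and d suffixes.

```python
# Multipliers for HAProxy-style time suffixes; "ms" is checked before "s".
UNITS_MS = [("ms", 1), ("s", 1_000), ("m", 60_000), ("h", 3_600_000), ("d", 86_400_000)]


def to_ms(value):
    """Convert a time value such as '50000' or '1h' to milliseconds."""
    for suffix, factor in UNITS_MS:
        if value.endswith(suffix):
            return int(value[: -len(suffix)]) * factor
    return int(value)  # no suffix: assumed to already be milliseconds


def effective(defaults, *sections):
    """Overlay section settings on the defaults and return the values in seconds."""
    merged = dict(defaults)
    for section in sections:
        merged.update(section)
    return {name: to_ms(value) // 1000 for name, value in merged.items()}


# Values copied from the config above.
DEFAULTS = {"connect": "5000", "client": "50000", "server": "50000"}
FRONTEND = {"client": "1h"}  # frontend Example_NLS
BACKEND = {"server": "1h"}   # backend NLScluster

if __name__ == "__main__":
    print(effective(DEFAULTS, FRONTEND, BACKEND))
```

If that reading is right, the NLS traffic runs with a 5 second connect timeout and 3600 second client and server timeouts, so the 50 second values from defaults should not apply to it.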
The nxlog errors suggest that HAProxy is the side closing the connections.
What can I do to stop HAProxy closing the nxlog connections?