global
    log /dev/log local0
    log /dev/log local1 notice
    user root
    group root
    daemon
    ca-base /etc/ssl/certs
    crt-base /etc/ssl/private
    ssl-default-bind-ciphers ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES256-GCM-SHA384:DHE-RSA-AES128-GCM-SHA256:DHE-DSS-AES128-GCM-SHA256:kEDH+AESGCM:ECDHE-RSA-AES128-SHA256:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA:ECDHE-ECDSA-AES128-SHA:ECDHE-RSA-AES256-SHA384:ECDHE-ECDSA-AES256-SHA384:ECDHE-RSA-AES256-SHA:ECDHE-ECDSA-AES256-SHA:DHE-RSA-AES128-SHA256:DHE-RSA-AES128-SHA:DHE-DSS-AES128-SHA256:DHE-RSA-AES256-SHA256:DHE-DSS-AES256-SHA:DHE-RSA-AES256-SHA:AES128-GCM-SHA256:AES256-GCM-SHA384:AES128-SHA256:AES256-SHA256:AES128-SHA:AES256-SHA:AES:CAMELLIA:DES-CBC3-SHA:!aNULL:!eNULL:!EXPORT:!DES:!RC4:!MD5:!PSK:!aECDH:!EDH-DSS-DES-CBC3-SHA:!EDH-RSA-DES-CBC3-SHA:!KRB5-DES-CBC3-SHA
    ssl-default-bind-options no-sslv3
    ssl-default-server-ciphers kEECDH+aRSA+AES:kRSA+AES:+AES256:RC4-SHA:!kEDH:!LOW:!EXP:!MD5:!aNULL:!eNULL
    ssl-default-server-options no-sslv3
    stats socket /run/haproxy.sock mode 660 level admin

defaults
    log global
    option dontlognull
    option redispatch
    option tcp-smart-accept
    option tcp-smart-connect
    timeout connect 5s
    timeout client 480m
    timeout server 480m
    timeout http-keep-alive 1s
    timeout http-request 15s
    timeout queue 30s
    timeout tarpit 1m

frontend mysql
    bind <IP>:3306
    mode tcp
    option tcplog
    default_backend mysql_nodes

backend mysql_nodes
    mode tcp
    balance leastconn
    option tcp-check
    option httpchk
    server mysql-1 <IP1>:3306 backup check port 9200 maxconn 1500 inter 1s fall 5 rise 2
    server mysql-2 <IP2>:3306 check port 9200 maxconn 1500 inter 1s fall 5 rise 2
    server mysql-3 <IP3>:3306 check port 9200 maxconn 1500 inter 1s fall 5 rise 2
I have set up the health check on port 9200 with xinetd and a script.
I can see in the log that the Layer 4 checks fail while the Layer 7 checks pass:
Server mysql_nodes/mysql-2 is DOWN, reason: Layer4 connection problem, info: "Connection refused", check duration: 133ms. 1 active and 1 backup servers left. 2 sessions active, 0 requeued, 0 remaining in queue.
Server mysql_nodes/mysql-2 is DOWN, reason: Layer4 connection problem, info: "Connection refused", check duration: 133ms. 1 active and 1 backup servers left. 2 sessions active, 0 requeued, 0 remaining in queue.
Server mysql_nodes/mysql-2 is UP, reason: Layer7 check passed, code: 200, check duration: 457ms. 2 active and 1 backup servers online. 0 sessions requeued, 0 total in queue.
Server mysql_nodes/mysql-2 is UP, reason: Layer7 check passed, code: 200, check duration: 457ms. 2 active and 1 backup servers online. 0 sessions requeued, 0 total in queue.
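Each server line in the backend uses inter 1s fall 5 rise 2. The HAProxy documentation describes fall as the number of consecutive failed checks after which a server is considered dead, and rise as the number of consecutive successful checks after which it is considered operational. The Python sketch below is a simplified model of that rule only (it ignores fastinter/downinter and HAProxy's internal health counter, and the function name is hypothetical). It suggests that each DOWN line above reflects several failed checks in a row, roughly several seconds at inter 1s, rather than one refused connection.

```python
def simulate_checks(results, rise=2, fall=5, start_up=True):
    """Simplified rise/fall model: return (check_index, new_state) transitions.

    results is a sequence of booleans, one per check (True = check passed).
    """
    up = start_up
    streak = 0  # consecutive results that disagree with the current state
    transitions = []
    for i, ok in enumerate(results):
        if ok == up:
            streak = 0  # result agrees with the current state: reset the streak
            continue
        streak += 1
        if up and streak >= fall:
            up, streak = False, 0
            transitions.append((i, "DOWN"))
        elif not up and streak >= rise:
            up, streak = True, 0
            transitions.append((i, "UP"))
    return transitions


# With inter 1s, each index is roughly one second.
# Four failed checks in a row are absorbed; five mark the server DOWN.
print(simulate_checks([True, False, False, False, False, True]))  # []
print(simulate_checks([True] * 3 + [False] * 5 + [True] * 2))   # [(7, 'DOWN'), (9, 'UP')]
```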
This configures only Layer 7 checks, right? Is there a way in the configuration file to disable the Layer 4 checks?
I tried the tcp-check and mysql-check options, but I get the same results.
Thanks for any help!
OK, can you provide a packet capture of the entire health check traffic (tcpdump -i ethX -pns0 -w health-check-traffic.cap host <IP1> and port 9200) as well as the output of haproxy -vv?
I checked the capture: although the script sends “200 OK”, there are a lot of TCP retransmissions and resets in the communication. I am trying to replace the bash script with a Python script and will post an update after making the changes.
I have recreated the situation with your exact release and configuration, and I don’t see any issues.
However, looking at your traces again, it becomes clear that I was too focused on the HTTP transaction. The HAProxy log was correct all along, and the traces confirm it:
Sometimes port 9200 responds (and then the health check succeeds), and sometimes the TCP handshake to port 9200 is flat out rejected (SYN → RST, ACK). So there is no separate Layer 4 check to disable: an HTTP check first has to open a TCP connection to port 9200, and when that connection is refused, HAProxy reports the failure as a Layer 4 connection problem.
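To make the two phases concrete, here is a rough Python sketch of an HTTP health check against the xinetd port: first a TCP connect, which raises ConnectionRefusedError when the SYN is answered with RST, ACK, then a request and the HTTP status line of the reply. The function name http_check and the "tcp"/"http" labels are hypothetical and only loosely correspond to HAProxy's Layer 4 and Layer 7 results. The request line OPTIONS / HTTP/1.0 is what a bare option httpchk sends by default, but this is not HAProxy's implementation.

```python
import socket


def http_check(host, port, timeout=2.0):
    """Rough imitation of an HTTP health check, split into its two phases.

    Returns ("tcp", error_name) when the TCP connection cannot be opened,
    otherwise ("http", status_code) or ("http", error_description).
    """
    # Phase 1: TCP handshake. A SYN answered by RST,ACK raises ConnectionRefusedError.
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError as exc:
        return ("tcp", type(exc).__name__)

    # Phase 2: send a request and read the status line of the reply.
    with sock:
        try:
            sock.sendall(b"OPTIONS / HTTP/1.0\r\n\r\n")
            data = b""
            while b"\n" not in data:
                chunk = sock.recv(1024)
                if not chunk:  # peer closed before sending a full line
                    break
                data += chunk
        except OSError as exc:  # reset or timeout after the handshake
            return ("http", type(exc).__name__)

    parts = data.split(b"\n", 1)[0].split()
    if len(parts) >= 2 and parts[1].isdigit():
        return ("http", int(parts[1]))
    return ("http", "invalid response")
```

Calling it in a loop, for example http_check("<IP2>", 9200) once per second, would show whether failures happen at the "tcp" stage (as in the traces) or at the "http" stage.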
#!/usr/bin/env python3
import os
import subprocess
import sys
import time

# Load the KEY=VALUE config file and remove quotes around values
def load_config(config_file):
    config = {}
    with open(config_file, 'r') as f:
        for line in f:
            line = line.strip()
            # Ignore comments, empty lines and anything that is not KEY=VALUE
            if not line or line.startswith('#') or '=' not in line:
                continue
            key, value = line.split('=', 1)
            # Strip surrounding quotes from the value
            config[key.strip()] = value.strip().strip('"').strip("'")
    return config

# Send the HTTP reply on stdout (xinetd connects stdout to the client socket)
def httpReply(HTTP_STATUS, RESPONSE_CONTENT):
    reasons = {"200": "OK", "503": "Service Unavailable"}
    status_line = f"HTTP/1.1 {HTTP_STATUS} {reasons.get(HTTP_STATUS, '')}".rstrip()
    body = RESPONSE_CONTENT.encode()
    # HTTP requires CRLF line endings, and Content-Length must match the body bytes
    # exactly (print() would append an extra, uncounted newline after the body)
    head = (f"{status_line}\r\n"
            "Content-Type: text/plain\r\n"
            "Connection: close\r\n"
            f"Content-Length: {len(body)}\r\n"
            "\r\n")
    sys.stdout.buffer.write(head.encode() + body)
    sys.stdout.buffer.flush()
    time.sleep(0.1)

# Run a MySQL command and return its output lines, or None on failure
def get_mysql_status(command, timeout):
    try:
        result = subprocess.run(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                                check=True, text=True, timeout=timeout)
    except (subprocess.CalledProcessError, subprocess.TimeoutExpired, OSError):
        # Query failed, mysqld hung past the timeout, or the mysql client is missing
        return None
    # Return the output as a list of lines, stripping surrounding whitespace
    return result.stdout.strip().splitlines()

# Main function
def main():
    # Load configuration
    config_file = '/etc/sysconfig/clustercheck'
    if not os.path.exists(config_file):
        # Reply in HTTP so HAProxy sees a clean 503 rather than a malformed response
        httpReply("503", f"Config file {config_file} does not exist.\r\n")
        sys.exit(1)
    config = load_config(config_file)

    # Get configuration values, using defaults if necessary
    MYSQL_USERNAME = config.get('MYSQL_USERNAME', 'clustercheckuser')
    MYSQL_PASSWORD = config.get('MYSQL_PASSWORD', 'clustercheckpassword!')
    AVAILABLE_WHEN_DONOR = int(config.get('AVAILABLE_WHEN_DONOR', 0))
    ERR_FILE = config.get('ERR_FILE', '/dev/null')
    AVAILABLE_WHEN_READONLY = int(config.get('AVAILABLE_WHEN_READONLY', 1))
    DEFAULTS_EXTRA_FILE = config.get('DEFAULTS_EXTRA_FILE', '/etc/my.cnf')
    # Timeout exists for instances where mysqld may be hung
    TIMEOUT = int(config.get('TIMEOUT', 10))

    EXTRA_ARGS = []
    if MYSQL_USERNAME:
        EXTRA_ARGS.append(f"--user={MYSQL_USERNAME}")
    if MYSQL_PASSWORD:
        EXTRA_ARGS.append(f"--password={MYSQL_PASSWORD}")
    if os.path.isfile(DEFAULTS_EXTRA_FILE):
        # --defaults-extra-file must be the first option on the mysql command line
        MYSQL_CMDLINE = ['mysql', '--defaults-extra-file=' + DEFAULTS_EXTRA_FILE, '-nNE',
                         '--connect-timeout=' + str(TIMEOUT)] + EXTRA_ARGS
    else:
        MYSQL_CMDLINE = ['mysql', '-nNE', '--connect-timeout=' + str(TIMEOUT)] + EXTRA_ARGS

    # Check the wsrep_local_state (4 = Synced, 2 = Donor/Desynced)
    WSREP_STATUS = get_mysql_status(MYSQL_CMDLINE + ["-e", "SHOW STATUS LIKE 'wsrep_local_state';"], TIMEOUT)
    if not WSREP_STATUS:
        httpReply("503", "Received empty reply from Percona XtraDB Cluster Node.\r\nMight be a permission issue, check the credentials used by the script.")
        sys.exit(1)

    # The state value is the last token of the last output line
    wsrep_state = WSREP_STATUS[-1].split()[-1]
    if wsrep_state == "4" or (wsrep_state == "2" and AVAILABLE_WHEN_DONOR == 1):
        if AVAILABLE_WHEN_READONLY == 0:
            # Check if the node is in read-only mode
            READ_ONLY = get_mysql_status(MYSQL_CMDLINE + ["-e", "SHOW GLOBAL VARIABLES LIKE 'read_only';"], TIMEOUT)
            if READ_ONLY and 'ON' in READ_ONLY[-1]:
                httpReply("503", "Percona XtraDB Cluster Node is read-only.\r\n")
                sys.exit(1)
        httpReply("200", "Percona XtraDB Cluster Node is synced.\r\n")
        sys.exit(0)

    # Any other state (joining, joined, donor without AVAILABLE_WHEN_DONOR) is unavailable
    httpReply("503", "Percona XtraDB Cluster Node is not synced.\r\n")
    sys.exit(1)

if __name__ == "__main__":
    main()
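The availability rules in the script are easier to reason about in isolation. The sketch below pulls them into pure functions that can be exercised without a running MySQL server. The function names are hypothetical; the state codes follow Galera's wsrep_local_state numbering (1 Joining, 2 Donor/Desynced, 3 Joined, 4 Synced). The exact layout of the mysql -nNE output is an assumption here; like the script, the parser only relies on the value being the last token of the last line.

```python
# wsrep_local_state values used by the script
SYNCED = "4"
DONOR_DESYNCED = "2"


def parse_last_value(mysql_output):
    """Return the last token of the last non-empty output line, or None."""
    lines = mysql_output.strip().splitlines()
    if not lines or not lines[-1].split():
        return None
    return lines[-1].split()[-1]


def decide(wsrep_state, read_only, available_when_donor=0, available_when_readonly=1):
    """Return the HTTP status code the script would send for these inputs."""
    if wsrep_state == SYNCED or (wsrep_state == DONOR_DESYNCED and available_when_donor == 1):
        # A synced (or allowed donor) node is still refused if it is read-only
        # and read-only nodes are not accepted
        if available_when_readonly == 0 and read_only == "ON":
            return 503
        return 200
    return 503
```

For example, decide("2", "OFF") gives 503, and only becomes 200 when available_when_donor=1, which mirrors AVAILABLE_WHEN_DONOR in /etc/sysconfig/clustercheck.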
For the record: I am now testing the Python version above to see whether it works better than the original bash clustercheck script. Most of the Python code was generated with ChatGPT, by the way, but it seems to work nicely so far.
A quick update: we see fewer of them, but we still get an occasional “socket error”, and the HAProxy status for the affected backend server turns purple. After a few seconds it turns green again, and haproxy.log reports the check as “succeeded”.
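To put a number on “fewer of them”, one option is to tally the server state-change lines in haproxy.log over a fixed period. A minimal sketch, assuming the lines look like the DOWN/UP messages quoted at the top of this thread; the regular expression and function name are hypothetical, and they only count UP/DOWN transitions, not every individual check result.

```python
import re
from collections import Counter

# Matches state-change lines such as
# "Server mysql_nodes/mysql-2 is DOWN, reason: Layer4 connection problem, ..."
STATE_RE = re.compile(
    r"Server (?P<backend>[^/\s]+)/(?P<server>\S+) is (?P<state>UP|DOWN), "
    r"reason: (?P<reason>[^,]+)"
)


def tally_state_changes(lines):
    """Count (server, state, reason) triples found in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        m = STATE_RE.search(line)  # search() also skips any syslog prefix
        if m:
            counts[(m["server"], m["state"], m["reason"])] += 1
    return counts
```

Usage, with a log path that is only an assumption: with open("/var/log/haproxy.log") as f: print(tally_state_changes(f)). Comparing the counts before and after switching scripts gives a rough measure of the improvement.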
If anyone has insights to share, we would appreciate feedback.