Hello everyone,
I have a high-load (~10k reqs/s) HaProxy 2.6.1 instance with HTTP mode backends with L4 health checks. Everything works fine until a minor disturbance causes a backend to become unavailable for a few seconds. Then suddenly CPU usage skyrockets up to the point that not even the health checks themselves are performed and all backends are declared dead worsening the issue.
I found out, that disabling the L4 check by removing the “check” parameter on each server solves the issue.
I created a minimal example:
global
log /dev/log local0
log /dev/log local1 notice
chroot /var/lib/haproxy
user haproxy
group haproxy
daemon
defaults
mode http
frontend some_frontend
bind *:8080
stats enable
stats uri /stats
default_backend some_backend
backend some_backend
server some_server 8.8.8.8:8000 check
Blocking the IP via iptables -A OUTPUT -p tcp --dst 8.8.8.8 --dport 8000 -j DROP
simulates an outage. Bombarding HaProxy with any load testing tool like wrk
reveals high CPU usage. After removing check
from the config, reloading and load-testing with the same workload again the usage is way lower, almost non-existant.
I put a full Vagrant example on GitHub.
Does anybody know if this is a bug or a configuration issue?