We’ve previously had some performance issues with our haproxy instances. We’ve seen latency problems, and the question has been asked: how performant is our load balancer setup? I’m new to haproxy and was wondering what some of the key metrics to watch would be.
We are looking to stress test our environment, and in particular haproxy, to see at what point haproxy starts to struggle. The questions that stand out to me are: how do we define when it is haproxy that is struggling? Is it a particular metric or set of metrics? Are there certain tests that would more accurately exercise haproxy itself, and not just the backend servers?
We currently use the multithreading setup on 1.8.14, but have been asked whether multiprocess is better. Without being able to define what “better” means here, it is hard to give an educated answer.
Any insight would be much appreciated. If there is anything that needs further clarification, and I’m sure there is, I’ll do my best to provide it.
Thanks
Does anyone have any input on this? Of the many metrics haproxy produces, which ones are most telling of issues attributable to haproxy itself?
Which metrics you have to look at changes with the configuration. If you forward TCP traffic with TCP splicing at 40 Gbit/s, you have to look at different metrics than when you are terminating a lot of TLS traffic from browsers.
Share your configuration to give us an idea of what haproxy is actually doing for you.
Generically speaking, you’d look at CPU usage in the kernel and userspace, memory usage, connection numbers in the haproxy stats (compared against your actual configuration), backend server bottlenecks, kernel, iptables and especially conntrack bottlenecks, source port exhaustion and, of course, actual NIC and upstream bandwidth. This is a generic, vague and very likely incomplete list off the top of my head.
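To make the “connection numbers in the haproxy stats” part concrete, here is a minimal sketch of scanning the `show stat` CSV for proxies nearing their session limit or showing connect errors. The sample CSV below is a trimmed, hypothetical excerpt (real output has many more columns and rows); in practice you would read it from the stats socket (e.g. `echo "show stat" | socat stdio /var/lib/haproxy/stats`) instead of a string literal.

```python
# Sketch: flag haproxy proxies nearing their session limit (scur vs slim)
# or accumulating connect errors (econ), from "show stat" CSV output.
import csv
import io

# Trimmed, hypothetical sample of haproxy's "show stat" CSV output.
SAMPLE = """# pxname,svname,qcur,scur,slim,rate,ereq,econ
front_https,FRONTEND,,12045,100000,850,3,
back_app,srv1,0,1502,3000,110,,0
back_app,srv2,4,2990,3000,0,,17
"""

def watch(csv_text):
    """Return warnings for proxies near their session limit or with connect errors."""
    warnings = []
    # The header line starts with "# ", which csv.DictReader doesn't expect.
    for row in csv.DictReader(io.StringIO(csv_text.lstrip("# "))):
        scur, slim = row["scur"], row["slim"]
        if scur and slim and int(scur) >= 0.9 * int(slim):
            warnings.append(f"{row['pxname']}/{row['svname']}: "
                            f"{scur}/{slim} sessions (near limit)")
        if row["econ"] and int(row["econ"]) > 0:
            warnings.append(f"{row['pxname']}/{row['svname']}: "
                            f"{row['econ']} connect errors")
    return warnings

for warning in watch(SAMPLE):
    print(warning)
```

Running this against the sample flags `back_app/srv2` twice: once for sitting at 2990 of 3000 allowed sessions, once for its 17 connect errors.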
“Which is better” is the wrong question to ask. “What configuration works best for the requirements you have, while maintaining adequate performance, security and visibility” is the question I would ask. Answering it requires in-depth knowledge of your setup.
Thank you for indulging my noviceness and helping. We primarily handle TLS termination from browser or web app calls: two frontends with ~8 backends, all set up using the defaults with no backend/frontend-specific settings. All backends use leastconn as their balancing algorithm.
Please let me know if there is any other info that may be helpful.
As requested here is some of the config.
global
log 127.0.0.1 local0 debug
log 127.0.0.1 local1 notice
nbproc 1
nbthread 16
cpu-map auto:1/1-16 1-16 #reserving virtual core 0 for OS
chroot /var/lib/haproxy
pidfile /var/run/haproxy.pid
maxconn 1000000
user haproxy
group haproxy
daemon
ssl-default-bind-options no-sslv3
ssl-default-bind-ciphers <cipher>
ssl-default-server-ciphers <cipher2>
ssl-server-verify required
# turn on stats unix socket
stats socket /var/lib/haproxy/stats
server-state-file global
server-state-base /var/lib/haproxy/state
defaults
mode http
log global
option httplog
option dontlognull
option dontlog-normal
option http-server-close
option redispatch
retries 3
timeout http-request 10s
timeout queue 10m
timeout connect 10s
timeout client 10m
timeout server 10m
timeout http-keep-alive 10s
timeout check 10s
maxconn 100000
load-server-state-from-file global
I know you’ve all had your hands full with 1.9 and the holidays; just wanted to bump this to let you know I’m still here. 1.9 looks great and I’ll be looking at testing it out!
You would stress test haproxy with tools like ab or httpterm. Especially stress test the SSL layer: SSL handshakes are always expensive, so that will certainly be an important topic to look at. Make sure you use ECC certificates as well as RSA ones, as ECC is cheaper for the server to negotiate.
Test throughput, and test transactions per second with and without keep-alive.
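To illustrate why the with/without keep-alive distinction matters, here is a minimal, self-contained sketch that measures requests per second both ways against an in-process test server. The local server is just a stand-in; for a real test you would point the client at your haproxy frontend (and use a dedicated tool like ab or httpterm for serious load).

```python
# Sketch: compare transactions/sec with persistent (keep-alive) connections
# vs. a new TCP connection per request, against a throwaway local server.
import http.client
import threading
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

class Handler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # enables persistent connections

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Length", "2")
        self.end_headers()
        self.wfile.write(b"ok")

    def log_message(self, *args):  # silence per-request logging
        pass

server = ThreadingHTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()
HOST, PORT = server.server_address
N = 200  # requests per run

def run(keep_alive):
    """Issue N GET requests; reuse one connection if keep_alive, else reconnect."""
    start = time.perf_counter()
    if keep_alive:
        conn = http.client.HTTPConnection(HOST, PORT)
        for _ in range(N):
            conn.request("GET", "/")
            conn.getresponse().read()
        conn.close()
    else:
        for _ in range(N):
            conn = http.client.HTTPConnection(HOST, PORT)
            conn.request("GET", "/")
            conn.getresponse().read()
            conn.close()
    return N / (time.perf_counter() - start)

rps_keepalive = run(keep_alive=True)
rps_close = run(keep_alive=False)
print(f"keep-alive: {rps_keepalive:.0f} req/s, close-per-request: {rps_close:.0f} req/s")
server.shutdown()
```

The gap between the two numbers is the per-connection setup cost; with TLS in the path (as in this setup, where haproxy terminates TLS), the no-keep-alive case additionally pays a handshake per request, so the gap widens considerably.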
Once haproxy 1.9 settles in, consider upgrading to it, threading performance will be superior there.
Other than that, the generic advice from above is still true: CPU usage, connection numbers, iptables and conntrack bottlenecks are things to have a look at.
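For the conntrack side of that list, a quick sketch of checking conntrack headroom on the haproxy host follows. The `/proc` paths are Linux-specific and absent when the conntrack module isn’t loaded, so the reader handles missing files gracefully.

```python
# Sketch: check conntrack table usage on a Linux haproxy host. When the
# count approaches the max, new connections get dropped by the kernel
# before haproxy ever sees them.
from pathlib import Path

def read_int(path):
    """Return the integer in a /proc file, or None if the file is absent."""
    p = Path(path)
    return int(p.read_text()) if p.exists() else None

count = read_int("/proc/sys/net/netfilter/nf_conntrack_count")
limit = read_int("/proc/sys/net/netfilter/nf_conntrack_max")
if count is not None and limit is not None:
    print(f"conntrack: {count}/{limit} ({100 * count / limit:.1f}% used)")
else:
    print("conntrack counters not exposed (module not loaded?)")
```

The same pattern works for spotting source port exhaustion: compare your concurrent outbound connections per backend server against the range in `/proc/sys/net/ipv4/ip_local_port_range`.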