Monitoring best practices

We’ve previously had some performance issues with our haproxy instances, mainly latency, and the question has been asked how performant our load balancer setup is. I’m new to haproxy and was wondering what the key metrics to watch would be.

We are looking to try and stress test our environment and in particular haproxy to see at what point haproxy starts to struggle. So the questions that stand out to me are how do we define when it is haproxy that starts to struggle? Is it a particular metric or set of metrics? Are there certain tests that would more accurately test haproxy and not just the backend servers?

We currently use the multithreading setup on 1.8.14 and have been asked whether multiprocess is better. Without a way to define “better”, it is hard to give an educated answer to that.

Any insight would be much appreciated. If there is anything, of which I’m sure there is, that you need further clarification on I’ll do my best to get those.
Thanks

Does anyone have any input on this? Of the many metrics that haproxy produces, which ones typically point to issues that are attributable to haproxy itself?

Which metrics you have to look at depends on the configuration. If you forward TCP traffic with TCP splicing at 40Gbit/s, you have to look at different metrics than when you are terminating a lot of TLS traffic from browsers.

Share your configuration to give us an idea of what haproxy is actually doing for you.

Generically speaking, you’d look at CPU usage in the kernel and userspace, memory usage, connection numbers in haproxy stats (comparing them with the limits in your actual configuration), backend server bottlenecks, kernel, iptables and especially conntrack bottlenecks, source port exhaustion and, of course, actual NIC and upstream bandwidth. This is a very generic, vague and very likely incomplete list off the top of my head.
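To make the "connection numbers in haproxy stats" part a bit more concrete, here is a minimal Python sketch that compares current sessions with the session limit per proxy. It assumes the stats socket path from the config posted in this thread and uses the `show stat` CSV output (`scur` and `slim` columns). The helper names `query_haproxy` and `session_usage` are made up for illustration, and the script needs permission to open the socket.

```python
import csv
import io
import socket

# Assumption: path taken from the "stats socket" line in the posted config.
STATS_SOCKET = "/var/lib/haproxy/stats"


def query_haproxy(command, path=STATS_SOCKET):
    """Send one command to the HAProxy stats socket and return the raw reply."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(path)
        sock.sendall(command.encode() + b"\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # haproxy closes the socket after answering
                break
            chunks.append(data)
    return b"".join(chunks).decode()


def session_usage(show_stat_csv):
    """Return {(proxy, server): scur / slim} for rows that report a session limit."""
    text = show_stat_csv.lstrip("# ")  # the CSV header line starts with "# "
    usage = {}
    for row in csv.DictReader(io.StringIO(text)):
        scur, slim = row.get("scur"), row.get("slim")
        if scur and slim and int(slim) > 0:
            usage[(row["pxname"], row["svname"])] = int(scur) / int(slim)
    return usage
```

On the haproxy host, `session_usage(query_haproxy("show stat"))` would give a ratio per frontend and per server that has a limit set. A value creeping towards 1.0 suggests a maxconn limit is being hit rather than the backend simply being slow.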

Which is better is the wrong question to ask. What configuration works best for the requirements you have, while maintaining adequate performance, security and visibility, would be the question I would ask. Answering it requires in-depth knowledge of your setup.

Thank you for indulging my noviceness and helping. We primarily handle TLS termination for browser and web app calls. There are primarily 2 frontends with ~8 backends. Everything is set up using the defaults, with no backend- or frontend-specific settings. All backends use leastconn as their balancing algorithm.
Please let me know if there is any other info that may be helpful.

As requested here is some of the config.

global

  log         127.0.0.1 local0 debug
  log         127.0.0.1 local1 notice

  nbproc 1
  nbthread 16
  cpu-map auto:1/1-16 1-16 #reserving virtual core 0 for OS

  chroot      /var/lib/haproxy
  pidfile     /var/run/haproxy.pid
  maxconn     1000000
  user        haproxy
  group       haproxy
  daemon

  ssl-default-bind-options no-sslv3

  ssl-default-bind-ciphers <cipher>

  ssl-default-server-ciphers <cipher2>

  ssl-server-verify required

  # turn on stats unix socket
  stats socket /var/lib/haproxy/stats

  server-state-file global
  server-state-base /var/lib/haproxy/state

defaults

  mode            http
  log             global
  option          httplog
  option          dontlognull
  option          dontlog-normal
  option          http-server-close
  option          redispatch

  retries         3
  timeout         http-request        10s
  timeout         queue               10m
  timeout         connect             10s
  timeout         client              10m
  timeout         server              10m
  timeout         http-keep-alive     10s
  timeout         check               10s
  maxconn         100000

  load-server-state-from-file global

I know you all have had your hands busy with 1.9 and the holidays, just wanted to bump this to let you know I’m still here. 1.9 looks great and will be looking at testing that out!

You would stress test haproxy with tools like ab or httpterm. Especially stress test the SSL layer. SSL handshakes are always expensive, so that will certainly be an important topic to look at. Make sure you use ECC certificates as well as RSA ones, as ECC is cheaper for the server to negotiate.

Test throughput, and test transactions per second with and without keep-alive.
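As a rough illustration of the keep-alive comparison, the Python sketch below issues sequential GET requests either over one reused connection or over a fresh connection per request, and reports requests per second. It is single-threaded and plain HTTP, so it is no substitute for a real load generator like ab; the function name `measure_rate` and its parameters are made up for this example, and for a TLS frontend you would swap in `http.client.HTTPSConnection`.

```python
import http.client
import time


def measure_rate(host, port, path="/", requests=200, keep_alive=True):
    """Issue sequential GETs and return completed requests per second."""
    # Without keep-alive, ask the server to close after every response.
    headers = {} if keep_alive else {"Connection": "close"}
    conn = None
    start = time.perf_counter()
    for _ in range(requests):
        if conn is None:
            conn = http.client.HTTPConnection(host, port, timeout=5)
        conn.request("GET", path, headers=headers)
        conn.getresponse().read()  # drain the body so the connection can be reused
        if not keep_alive:
            conn.close()  # force a new TCP connection for the next request
            conn = None
    elapsed = time.perf_counter() - start
    if conn is not None:
        conn.close()
    return requests / elapsed
```

Running it twice against the same frontend, once with `keep_alive=True` and once with `keep_alive=False`, gives a first impression of how much of the cost is connection setup. With TLS in the path, the gap should be noticeably larger because every new connection pays for a handshake.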

Once haproxy 1.9 settles in, consider upgrading to it, as threading performance will be superior there.

Other than that, the generic advice from above is still true: CPU usage, connection numbers, iptables and conntrack bottlenecks are things to have a look at.
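For the conntrack and source port side of that list, a small Python sketch along these lines can show how close the host is to its limits. It assumes a Linux box and reads the standard `/proc/sys` entries; the conntrack files only exist when the nf_conntrack module is loaded. The function names are made up for illustration.

```python
from pathlib import Path

CONNTRACK_COUNT = "/proc/sys/net/netfilter/nf_conntrack_count"
CONNTRACK_MAX = "/proc/sys/net/netfilter/nf_conntrack_max"
PORT_RANGE = "/proc/sys/net/ipv4/ip_local_port_range"


def conntrack_utilization(count_text, max_text):
    """Fraction of the conntrack table in use, from the two /proc values."""
    return int(count_text) / int(max_text)


def ephemeral_ports(range_text):
    """Source ports available per (source IP, destination IP, destination port)."""
    low, high = (int(value) for value in range_text.split())
    return high - low + 1


def read_host_limits():
    """Read the live values from /proc on the haproxy host."""
    used = conntrack_utilization(
        Path(CONNTRACK_COUNT).read_text(), Path(CONNTRACK_MAX).read_text()
    )
    ports = ephemeral_ports(Path(PORT_RANGE).read_text())
    return used, ports
```

If the conntrack fraction gets close to 1.0 under load, new connections can be dropped by the kernel before haproxy ever sees them. The port count is the ceiling on concurrent outgoing connections from one source IP to a single backend address and port, which matters when a few backends carry a lot of connections.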