Haproxy and varnish utilization on NUMA CPUs

Hello

We have some servers with NUMA CPUs that we plan to use as CDN nodes, and we plan to run only haproxy and varnish on them. A sample lscpu -e output is below:

lscpu -e
CPU NODE SOCKET CORE L1d:L1i:L2:L3 ONLINE    MAXMHZ    MINMHZ
  0    0      0    0 0:0:0:0          yes 3200.0000 1000.0000
  1    0      0    1 1:1:1:0          yes 3200.0000 1000.0000
  2    0      0    2 2:2:2:0          yes 3200.0000 1000.0000
  3    0      0    3 3:3:3:0          yes 3200.0000 1000.0000
  4    0      0    4 4:4:4:0          yes 3200.0000 1000.0000
  5    0      0    5 5:5:5:0          yes 3200.0000 1000.0000
  6    0      0    6 6:6:6:0          yes 3200.0000 1000.0000
  7    0      0    7 7:7:7:0          yes 3200.0000 1000.0000
  8    0      0    8 8:8:8:0          yes 3200.0000 1000.0000
  9    0      0    9 9:9:9:0          yes 3200.0000 1000.0000
 10    1      1   10 10:10:10:1       yes 3200.0000 1000.0000
 11    1      1   11 11:11:11:1       yes 3200.0000 1000.0000
 12    1      1   12 12:12:12:1       yes 3200.0000 1000.0000
 13    1      1   13 13:13:13:1       yes 3200.0000 1000.0000
 14    1      1   14 14:14:14:1       yes 3200.0000 1000.0000
 15    1      1   15 15:15:15:1       yes 3200.0000 1000.0000
 16    1      1   16 16:16:16:1       yes 3200.0000 1000.0000
 17    1      1   17 17:17:17:1       yes 3200.0000 1000.0000
 18    1      1   18 18:18:18:1       yes 3200.0000 1000.0000
 19    1      1   19 19:19:19:1       yes 3200.0000 1000.0000
 20    0      0    0 0:0:0:0          yes 3200.0000 1000.0000
 21    0      0    1 1:1:1:0          yes 3200.0000 1000.0000
 22    0      0    2 2:2:2:0          yes 3200.0000 1000.0000
 23    0      0    3 3:3:3:0          yes 3200.0000 1000.0000
 24    0      0    4 4:4:4:0          yes 3200.0000 1000.0000
 25    0      0    5 5:5:5:0          yes 3200.0000 1000.0000
 26    0      0    6 6:6:6:0          yes 3200.0000 1000.0000
 27    0      0    7 7:7:7:0          yes 3200.0000 1000.0000
 28    0      0    8 8:8:8:0          yes 3200.0000 1000.0000
 29    0      0    9 9:9:9:0          yes 3200.0000 1000.0000
 30    1      1   10 10:10:10:1       yes 3200.0000 1000.0000
 31    1      1   11 11:11:11:1       yes 3200.0000 1000.0000
 32    1      1   12 12:12:12:1       yes 3200.0000 1000.0000
 33    1      1   13 13:13:13:1       yes 3200.0000 1000.0000
 34    1      1   14 14:14:14:1       yes 3200.0000 1000.0000
 35    1      1   15 15:15:15:1       yes 3200.0000 1000.0000
 36    1      1   16 16:16:16:1       yes 3200.0000 1000.0000
 37    1      1   17 17:17:17:1       yes 3200.0000 1000.0000
 38    1      1   18 18:18:18:1       yes 3200.0000 1000.0000
 39    1      1   19 19:19:19:1       yes 3200.0000 1000.0000

I recently read Architectural limitation for nbproc? - #3 by willy, and based on that I did the following (roughly sketched after the list):

  • disabled irqbalance
  • pinned the NIC interrupts to node 0
  • pinned haproxy to node 0
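
In rough terms, the whole thing looked like this (eth0 and the interrupt names are placeholders for our actual hardware, and the haproxy part assumes a recent threaded build):

# stop irqbalance so it does not move the interrupts around again
systemctl stop irqbalance
systemctl disable irqbalance

# steer the NIC's interrupts to node 0 (CPUs 0-9 plus their HT siblings 20-29)
for irq in $(awk '/eth0/ {sub(":", "", $1); print $1}' /proc/interrupts); do
    echo 0-9,20-29 > /proc/irq/$irq/smp_affinity_list
done

# haproxy.cfg: pin the worker threads to node 0
global
    nbthread 20
    cpu-map auto:1/1-20 0-9 20-29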

What about varnish, though? Since it also relies heavily on network communication, should it be pinned to node 0 as well, leaving node 1 completely unused? Or can I pin it to node 1 (or leave it unpinned)?

This is not strictly haproxy-related, but I would be grateful for any kind of advice.

Thanks
Jakub

You need to think in terms of probability of transfers. For example, if your disk controller is connected to one node and the network controller to the other, you'd say that 100% of the traffic passes through haproxy, which should therefore be on the same node as the network controller, and that maybe 50% of the traffic comes from the cache, half of which is in RAM and half requires disk accesses, while the remaining 50% comes from the network. Then it can make sense to assign varnish to the other node with the disk controller, knowing that 50% of the time it will still require network accesses that will have to flow via the neighbor node.
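
By the way, sysfs usually tells you which node each controller hangs off, so you can verify the topology before deciding (the device names below are just examples, and -1 means the kernel has no locality information for that device):

cat /sys/class/net/eth0/device/numa_node      # node of the NIC
cat /sys/class/nvme/nvme0/device/numa_node    # node of the NVMe controller
lstopo                                        # hwloc can also draw the whole topology, PCI devices included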

What I would recommend in any case is to make sure that the same CPU is never used by two userland processes, so you'll definitely have to pin both haproxy and varnish so that they don't step on each other's feet. But you can try to put them on the same node with fewer cores each, just as you can try to split them apart.
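
As a purely illustrative split using the CPU numbering from your lscpu output: haproxy stays on node 0 as you already did, and varnish gets node 1 so the two never share a CPU (the systemd drop-in path and unit name are assumptions, adjust to your distribution):

# /etc/systemd/system/varnish.service.d/affinity.conf
[Service]
CPUAffinity=10-19 30-39

The reverse layout is just a matter of swapping the two CPU sets.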

If your I/Os (NIC and disk) are on the same node, and if your traffic is mostly SSL, it could also make sense to do the opposite and place varnish on the I/O node and haproxy on the other one. What you lose in terms of network accesses that have to cross nodes could be compensated by having all of the second socket's CPU cores available for SSL.

In any case you must experiment with various combinations, and take notes of what you're doing, your observations, and what you are certain about. Usually when experimenting with such tuning, you start by overlooking certain elements (some sysctls, IRQ binding, etc.), and your initial conclusions can be wrong because your observations were affected by another setting. That's why it's extremely important to rigorously note everything and to limit your observation bias.

You could be very surprised to find unexpected optimal situations, such as using only a few cores of the same node for each process, or things like this. That's not uncommon at all, even if it can feel frustrating.

Thanks for the reply. It is SSL traffic only; there shouldn't be too many different customers (connections), but the bandwidth will be quite high (it is an OTT video playout CDN). We hope for a cache hit rate of around 70%.

I guess I will try to pin varnish to the second node, which the PCI SSD is connected to. The thing is that we're not having capacity issues with the servers just yet, so I'm not really sure how to benchmark it.
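
For the pinning itself I suppose something along these lines will do, binding both the CPUs and the memory allocations to node 1 (the listen address, VCL path and storage file are just examples, not our real settings):

numactl --cpunodebind=1 --membind=1 \
    varnishd -a :6081 -f /etc/varnish/default.vcl \
             -s file,/var/lib/varnish/cache.bin,100G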

Thank you again :+1:

It's always difficult to benchmark caches, because it involves many objects and load generators are not always convenient for this. Most of those dealing with that situation end up testing in production, with several servers configured differently, and observe how they behave. Some will not do well and others will do much better, which indicates the best solution to pick in the end. If I were you, I'd simply add one extra server, configured as you are planning, to an existing cluster and observe.
