I’ve verified this with HAProxy 1.5, 1.7.10 and 1.8.8. I’ve tried many different configurations, including multi-process (1.5 and 1.7) and multi-threading (1.8).
Here’s the problem:
When I use ab to benchmark a pair of web servers behind HAProxy, I get about 9000 requests/sec over http, but when I switch to https (with HAProxy terminating the SSL connection), throughput drops to 250 to 270 requests/sec.
I have tried a number of different configurations recommended on the HAProxy site and elsewhere (multiple sockets, different ways of assigning processes, etc.), and all of them wind up in that range. I even tried limiting the ciphers to a small, “higher performing” set. No luck.
I figured out that the SSL speed is about 2.7% of the non-SSL speed. When I went back to a couple of well-known how-tos I had referred to about speeding up SSL and looked at their data, those authors were only able to get about 3% of the transactions in SSL that they got in their non-SSL tests. Most of the docs on squeezing performance out of HAProxy are about http transactions, not SSL.
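For reference, here is a small Python sketch of that ratio using the rounded figures quoted above (about 9000 requests/sec for http, 250 to 270 for https). The constant and function names are made up for illustration; with these rounded inputs the result lands close to the 2.7% mentioned above.

```python
# Rounded figures from this post (requests per second).
HTTP_RPS = 9000
HTTPS_RPS_RANGE = (250, 270)

def ssl_share(https_rps, http_rps):
    """Return HTTPS throughput as a percentage of HTTP throughput."""
    return 100.0 * https_rps / http_rps

# Evaluate the low and high ends of the observed HTTPS range.
low, high = (ssl_share(r, HTTP_RPS) for r in HTTPS_RPS_RANGE)
print(f"SSL throughput is {low:.1f}% to {high:.1f}% of non-SSL throughput")
```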
Here’s what bothers me about this: When I use a single-CPU config (nbproc 1/nbthread 1), the CPU usage for the haproxy process goes up to 97% (and we see 250 to 270 requests/second). When I use 10 threads or 10 processes, the load does get split across them, but they never go beyond 30% to 50% CPU usage each, and I STILL get 250 to 270 requests/second. In fact, with HAProxy 1.8.8, I saw four threads at about 35% usage and the other threads down at 13% to 19%. So even though the load is being shared, it’s not maxing out the hardware the way I would think it should.
If HAProxy can max out 1 core, shouldn’t it be able to max out 10 cores and produce higher SSL throughput?
It seems like we could get better throughput if all of the cores/threads were pushing harder, but I’m not sure how to get those results.
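To put rough numbers on that mismatch, here is a Python sketch using the figures from this post. It assumes perfectly linear scaling across workers, which is an idealization rather than something HAProxy promises, and the constant and function names are made up for illustration.

```python
# Figures reported in this post.
SINGLE_CORE_RPS = (250, 270)   # HTTPS requests/sec with nbproc 1 / nbthread 1
WORKERS = 10                   # threads or processes in the scaled-out tests
WORKER_CPU = (0.30, 0.50)      # per-worker CPU usage seen with 10 workers

def ideal_rps(single_rps, workers):
    """Throughput if every worker matched the single-core rate (assumes linear scaling)."""
    return single_rps * workers

def cores_busy(workers, per_worker_cpu):
    """Total CPU consumed, expressed in whole-core equivalents."""
    return workers * per_worker_cpu

ideal = tuple(ideal_rps(r, WORKERS) for r in SINGLE_CORE_RPS)
busy = tuple(cores_busy(WORKERS, c) for c in WORKER_CPU)

print(f"Ideal with {WORKERS} workers: {ideal[0]} to {ideal[1]} requests/sec")
print(f"CPU actually consumed: {busy[0]:.1f} to {busy[1]:.1f} cores' worth")
print(f"Observed: {SINGLE_CORE_RPS[0]} to {SINGLE_CORE_RPS[1]} requests/sec")
```

Under that assumption, 10 workers would deliver around 2500 to 2700 requests/sec. Instead, the 10-worker runs burn roughly 3 to 5 cores' worth of CPU for the same 250 to 270 requests/sec that a single core delivers.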
This was all on CentOS 6.
Any thoughts or ideas from anyone on this forum about how we might be able to squeeze more SSL performance? Is there something I am missing?