Surprisingly Poor SSL-Termination Performance

We’ve been testing a haproxy configuration to perform SSL termination for an IRC server. InspIRCd can do its own SSL termination, but we also wanted Proxy Protocol support, and InspIRCd can’t do both at the same time.

The suggested solution was to run haproxy for termination and pass the decrypted stream to InspIRCd over a unix domain socket. In theory you might burn a bit more CPU time in total, but you’d get better multi-core scaling because the crypto could run on a different core.
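For context on what travels over that unix domain socket: with `send-proxy-v2`, haproxy prepends a binary PROXY protocol v2 header carrying the original client address before relaying the plaintext IRC stream. The sketch below builds a minimal header of that shape (PROXY command, TCP over IPv4 or IPv6, no TLV extensions). It is an illustration of the wire format only, not haproxy’s code; the function name and the example addresses are hypothetical, and haproxy’s real header may carry extra TLV fields depending on its options.

```python
import ipaddress
import struct

# 12-byte signature that starts every PROXY protocol v2 header
PROXY_V2_SIGNATURE = b"\r\n\r\n\x00\r\nQUIT\n"


def build_proxy_v2_header(src_ip: str, src_port: int, dst_ip: str, dst_port: int) -> bytes:
    """Build a minimal PROXY v2 header (hypothetical helper, no TLVs)."""
    src = ipaddress.ip_address(src_ip)
    dst = ipaddress.ip_address(dst_ip)
    if src.version != dst.version:
        raise ValueError("source and destination must use the same address family")

    # High nibble 0x2 = protocol version 2, low nibble 0x1 = PROXY command
    ver_cmd = 0x21
    # High nibble = address family (1 = AF_INET, 2 = AF_INET6), low nibble 1 = STREAM (TCP)
    fam_proto = 0x11 if src.version == 4 else 0x21

    # Address block: source address, destination address, source port, destination port
    addresses = src.packed + dst.packed + struct.pack("!HH", src_port, dst_port)

    # Fixed 16-byte prefix: signature, ver/cmd, family/protocol, big-endian address-block length
    return PROXY_V2_SIGNATURE + struct.pack("!BBH", ver_cmd, fam_proto, len(addresses)) + addresses


if __name__ == "__main__":
    # Documentation-range addresses, purely illustrative
    header = build_proxy_v2_header("203.0.113.7", 51234, "198.51.100.1", 6675)
    print(len(header), header.hex())
```

The header is written once per connection, so it costs essentially nothing compared with the TLS work haproxy does on the client side.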

We were shocked to find that haproxy used far more CPU than expected. It offloaded almost no load from InspIRCd and ended up adding as much load as the whole IRC server was using!

Our initial attempt used a single haproxy process, which was quickly swamped. Since our InspIRCd server is already a cluster of 5 spoke processes, running one haproxy process per back-end process seemed like a natural fit (we don’t need load balancing or anything fancy, just termination).

It’s very difficult for us to get consistent live-fire test results because our load fluctuates quite a bit throughout the day, but here are two snapshots where the number of concurrent SSL connections is roughly 8000 per process.

These measurements were taken on the same machine (bare-metal hex-core Xeon) with the same ciphers.

Without haproxy we see about 10% CPU per spoke (PID 25023 is the hub the spokes communicate through). Here is a typical top snapshot averaged over 10 seconds:

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
25125 ircd      20   0 4559864 416028  12520 S   9.7  0.3 239:40.11 inspircd
25193 ircd      20   0 4233396 417604  12532 S   9.6  0.3 241:10.11 inspircd
25091 ircd      20   0 4041320 422916  12516 S   9.3  0.3 244:37.56 inspircd
25159 ircd      20   0 4035264 415608  12368 S   9.2  0.3 238:41.26 inspircd
25057 ircd      20   0 3315684 415732  12628 S   8.9  0.3 239:06.11 inspircd
25023 ircd      20   0  632500 125948  12112 S   3.3  0.1  89:43.57 inspircd

With 5 haproxy workers added, it looks like this:

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
23879 ircd      20   0  380308 313148   8752 S  12.2  0.2  74:28.61 haproxy
23880 ircd      20   0  378728 311508   8752 R  12.2  0.2  75:31.97 haproxy
23881 ircd      20   0  378352 310852   8752 S  12.2  0.2  75:02.69 haproxy
23883 ircd      20   0  381000 313292   8748 S  12.1  0.2  70:48.66 haproxy
23882 ircd      20   0  380848 312804   8748 S  11.9  0.2  73:18.27 haproxy
25125 ircd      20   0 4297356 183900  12520 S  11.4  0.1  68:34.24 inspircd
25091 ircd      20   0 3772620 183128  12516 S  11.2  0.1  69:15.45 inspircd
25159 ircd      20   0 3772768 183120  12368 S  11.2  0.1  68:18.97 inspircd
25193 ircd      20   0 3970408 184744  12532 S  11.1  0.1  68:17.81 inspircd
25057 ircd      20   0 2986488 182640  12628 S  10.9  0.1  68:48.68 inspircd
25023 ircd      20   0  630388 123796  12112 S   4.5  0.1  26:35.82 inspircd

In this snapshot there were 5-10% more users, so the per-spoke load is a bit higher, but it’s close enough to make my point: in this configuration haproxy spends more CPU time on SSL termination (about 60.6% of a core summed across the five workers) than InspIRCd spends on everything else (about 55.8% across the five spokes).

That is: instead of 5-10% more load, we’re seeing more than 100% more load (total CPU went from about 50% of a core to about 121%). This isn’t an isolated case; I’ve been trying to improve this for a while now.
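To make the before/after comparison repeatable as new snapshots come in, here is a small sketch that parses `top` batch output and sums the %CPU column per command name. It assumes the default top column layout shown above (PID … %CPU %MEM TIME+ COMMAND); the function names are hypothetical, and the two embedded snapshots are copied from this post. Note that top’s %CPU is a percentage of one core, so the sums are “percent of a core”, and the hub (25023) is counted together with the spokes under `inspircd`.

```python
from collections import defaultdict

BEFORE_SNAPSHOT = """\
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
25125 ircd      20   0 4559864 416028  12520 S   9.7  0.3 239:40.11 inspircd
25193 ircd      20   0 4233396 417604  12532 S   9.6  0.3 241:10.11 inspircd
25091 ircd      20   0 4041320 422916  12516 S   9.3  0.3 244:37.56 inspircd
25159 ircd      20   0 4035264 415608  12368 S   9.2  0.3 238:41.26 inspircd
25057 ircd      20   0 3315684 415732  12628 S   8.9  0.3 239:06.11 inspircd
25023 ircd      20   0  632500 125948  12112 S   3.3  0.1  89:43.57 inspircd
"""

AFTER_SNAPSHOT = """\
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
23879 ircd      20   0  380308 313148   8752 S  12.2  0.2  74:28.61 haproxy
23880 ircd      20   0  378728 311508   8752 R  12.2  0.2  75:31.97 haproxy
23881 ircd      20   0  378352 310852   8752 S  12.2  0.2  75:02.69 haproxy
23883 ircd      20   0  381000 313292   8748 S  12.1  0.2  70:48.66 haproxy
23882 ircd      20   0  380848 312804   8748 S  11.9  0.2  73:18.27 haproxy
25125 ircd      20   0 4297356 183900  12520 S  11.4  0.1  68:34.24 inspircd
25091 ircd      20   0 3772620 183128  12516 S  11.2  0.1  69:15.45 inspircd
25159 ircd      20   0 3772768 183120  12368 S  11.2  0.1  68:18.97 inspircd
25193 ircd      20   0 3970408 184744  12532 S  11.1  0.1  68:17.81 inspircd
25057 ircd      20   0 2986488 182640  12628 S  10.9  0.1  68:48.68 inspircd
25023 ircd      20   0  630388 123796  12112 S   4.5  0.1  26:35.82 inspircd
"""


def cpu_by_command(top_output: str) -> dict[str, float]:
    """Sum the %CPU column of top output, grouped by COMMAND."""
    totals = defaultdict(float)
    for line in top_output.splitlines():
        fields = line.split()
        # Data rows start with a numeric PID; skip the header and blank lines
        if len(fields) < 12 or not fields[0].isdigit():
            continue
        totals[fields[11]] += float(fields[8])  # %CPU is column 9, COMMAND column 12
    return {cmd: round(cpu, 1) for cmd, cpu in totals.items()}


def increase_percent(before: dict[str, float], after: dict[str, float]) -> float:
    """Relative increase of total CPU (all commands) from before to after, in percent."""
    total_before = sum(before.values())
    total_after = sum(after.values())
    return round((total_after - total_before) / total_before * 100, 1)


if __name__ == "__main__":
    before = cpu_by_command(BEFORE_SNAPSHOT)
    after = cpu_by_command(AFTER_SNAPSHOT)
    print("before:", before)
    print("after: ", after)
    print("total CPU increase (%):", increase_percent(before, after))
```

This does not normalise for the 5-10% difference in user count between the snapshots, so it only gives a rough ratio.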

I’m assuming both pieces of software use the same OpenSSL library under the hood, so there’s no obvious reason for the cost to be so different.
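One way to check the shared-OpenSSL assumption on Linux is to look at which `libssl`/`libcrypto` files each running process has mapped, via `/proc/<pid>/maps`. The sketch below does that; the function names are hypothetical, and it assumes OpenSSL is loaded as a shared library (a statically linked copy would not show up). Even matching library paths don’t prove both programs negotiate the same ciphers or use the same settings.

```python
import sys


def ssl_libraries(maps_text: str) -> set[str]:
    """Return paths of mapped libssl/libcrypto objects found in /proc/<pid>/maps text."""
    libs = set()
    for line in maps_text.splitlines():
        # Format: address perms offset dev inode [pathname]; the pathname may contain spaces
        parts = line.split(maxsplit=5)
        if len(parts) < 6:
            continue  # anonymous mapping without a pathname
        path = parts[5].strip()
        name = path.rsplit("/", 1)[-1]
        if name.startswith(("libssl", "libcrypto")):
            libs.add(path)
    return libs


def ssl_libraries_for_pid(pid: int) -> set[str]:
    """Read /proc/<pid>/maps (Linux only; usually needs the same user or root)."""
    with open(f"/proc/{pid}/maps", encoding="utf-8", errors="replace") as f:
        return ssl_libraries(f.read())


if __name__ == "__main__":
    # Usage: python3 ssl_libs.py <pid> [<pid> ...]
    for arg in sys.argv[1:]:
        print(arg, sorted(ssl_libraries_for_pid(int(arg))))
```

For example, passing one haproxy PID and one InspIRCd PID from the snapshots above (say 23879 and 25125) would show whether both map the same OpenSSL files. A path ending in “(deleted)” would mean the library was upgraded after that process started.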

We’re on Ubuntu 18 with HA-Proxy version 1.8.8-1ubuntu0.6 2019/10/23. I know there are newer versions, but this is what Ubuntu offered.

An abbreviated config is below (the real one has 5 frontends and 5 backends). It’s all pretty standard stuff, and I’ve skimmed the manual a few times, but the results just don’t make sense.

I’m open to suggestions, but live-fire tests may be hard to arrange promptly because we can only run them at low tide (off-peak). I’m seriously concerned that this scaling would swamp our server if we had too many users at once.

global
    log /dev/log    local0
    log /dev/log    local1 notice

    # necessary to access the UDS
    user ircd
    group ircd

    daemon
    maxconn 200000

    # one haproxy process per InspIRCd spoke, each pinned to its own core below
    nbproc 5

    cpu-map         1 0
    cpu-map         2 1
    cpu-map         3 2
    cpu-map         4 3
    cpu-map         5 4

    ssl-default-bind-ciphers AES128-SHA256:ECDH+AESGCM:DH+AESGCM:...
    ssl-default-bind-options no-sslv3

defaults
    log     global

    mode    tcp
    option  dontlognull
    option  tcplog
    maxconn 200000

    timeout connect 10s

    timeout client 4m
    timeout server 4m

frontend cloudflare_frontend5
    bind *:6675 ssl crt /etc/ssl/private/site.pem accept-proxy
    default_backend inspircd_backend5
    bind-process 1

frontend cloudflare_frontend6
    bind *:6676 ssl crt /etc/ssl/private/site.pem accept-proxy
    default_backend inspircd_backend6
    bind-process 2

    ...

backend inspircd_backend5
    log global
    server local_irc5 /home/ircd/InspIRCd1/run/proxy.sock send-proxy-v2

backend inspircd_backend6
    log global
    server local_irc6 /home/ircd/InspIRCd2/run/proxy.sock send-proxy-v2