HAProxy SSL Offloading high CPU usage on CentOS 7

Hi,

I’m running some tests with HAProxy doing SSL offloading on CentOS 7. The CPU easily reaches 100% and the haproxy idle pct drops to almost 0%. The test is just 1000 client connections.

I’ve also run the same tests on Ubuntu 16 for comparison, and the results are far better.

Here is the information for each test platform:

HAProxy configuration (same for both)

global
    chroot      /var/lib/haproxy
    pidfile     /var/run/haproxy.pid
    maxconn     20000
    user        haproxy
    group       haproxy
    daemon
    tune.ssl.default-dh-param   2048
    cpu-map                     1 0
    ssl-default-bind-options no-sslv3 no-tls-tickets
    ssl-default-bind-ciphers ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES256-GCM-SHA384:DHE-RSA-AES128-GCM-SHA256:DHE-DSS-AES128-GCM-SHA256:kEDH+AESGCM:ECDHE-RSA-AES128-SHA256:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA:ECDHE-ECDSA-AES128-SHA:ECDHE-RSA-AES256-SHA384:ECDHE-ECDSA-AES256-SHA384:ECDHE-RSA-AES256-SHA:ECDHE-ECDSA-AES256-SHA:DHE-RSA-AES128-SHA256:DHE-RSA-AES128-SHA:DHE-DSS-AES128-SHA256:DHE-RSA-AES256-SHA256:DHE-DSS-AES256-SHA:DHE-RSA-AES256-SHA:AES128-GCM-SHA256:AES256-GCM-SHA384:AES128-SHA256:AES256-SHA256:AES128-SHA:AES256-SHA:AES:CAMELLIA:DES-CBC3-SHA:!aNULL:!eNULL:!EXPORT:!DES:!RC4:!MD5:!PSK:!aECDH:!EDH-DSS-DES-CBC3-SHA:!EDH-RSA-DES-CBC3-SHA:!KRB5-DES-CBC3-SHA

    ssl-default-server-options no-sslv3 no-tls-tickets
    ssl-default-server-ciphers ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES256-GCM-SHA384:DHE-RSA-AES128-GCM-SHA256:DHE-DSS-AES128-GCM-SHA256:kEDH+AESGCM:ECDHE-RSA-AES128-SHA256:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA:ECDHE-ECDSA-AES128-SHA:ECDHE-RSA-AES256-SHA384:ECDHE-ECDSA-AES256-SHA384:ECDHE-RSA-AES256-SHA:ECDHE-ECDSA-AES256-SHA:DHE-RSA-AES128-SHA256:DHE-RSA-AES128-SHA:DHE-DSS-AES128-SHA256:DHE-RSA-AES256-SHA256:DHE-DSS-AES256-SHA:DHE-RSA-AES256-SHA:AES128-GCM-SHA256:AES256-GCM-SHA384:AES128-SHA256:AES256-SHA256:AES128-SHA:AES256-SHA:AES:CAMELLIA:DES-CBC3-SHA:!aNULL:!eNULL:!EXPORT:!DES:!RC4:!MD5:!PSK:!aECDH:!EDH-DSS-DES-CBC3-SHA:!EDH-RSA-DES-CBC3-SHA:!KRB5-DES-CBC3-SHA

defaults
    mode                    http
    option http-server-close
    option forwardfor       except 127.0.0.0/8
    option                  redispatch
    retries                 3
    timeout http-request    10s
    timeout queue           1m
    timeout connect         5s
    timeout client          1m
    timeout server          1m
    timeout http-keep-alive 4s
    timeout check           10s

frontend http
    bind    :80
    mode    http
    maxconn 20000
    default_backend back1

frontend https
    bind    :443 ssl crt /etc/ssl/test.pem
    mode    http
    maxconn 20000
    default_backend back1

backend back1
    balance source
    mode    http
    option  forwardfor
    option  http-server-close
    timeout check 3s
    http-check expect rstatus 200
    stick-table type ip size 10k
    stick on src

    server app1 x.x.x.x:80 weight 1 check inter 10s fall 3 maxconn 1000

CentOS 7 - HAProxy 1.7.5 - openssl-1.0.1e-fips

# haproxy -vv
HA-Proxy version 1.7.5 2017/04/03
Copyright 2000-2017 Willy Tarreau <willy@haproxy.org>

Build options :
  TARGET  = linux2628
  CPU     = generic
  CC      = gcc
  CFLAGS  = -O2 -g -fno-strict-aliasing -Wdeclaration-after-statement
  OPTIONS = USE_LINUX_TPROXY=1 USE_ZLIB=1 USE_REGPARM=1 USE_OPENSSL=1 USE_LUA=1 USE_PCRE=1

Default settings :
  maxconn = 2000, bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Encrypted password support via crypt(3): yes
Built with zlib version : 1.2.7
Running on zlib version : 1.2.7
Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with OpenSSL version : OpenSSL 1.0.1e-fips 11 Feb 2013
Running on OpenSSL version : OpenSSL 1.0.1e-fips 11 Feb 2013
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports prefer-server-ciphers : yes
Built with PCRE version : 8.32 2012-11-30
Running on PCRE version : 8.32 2012-11-30
PCRE library supports JIT : no (USE_PCRE_JIT not set)
Built with Lua version : Lua 5.3.4
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND

Available polling systems :
      epoll : pref=300,  test result OK
       poll : pref=200,  test result OK
     select : pref=150,  test result OK
Total: 3 (3 usable), will use epoll.

Available filters :
        [COMP] compression
        [TRACE] trace
        [SPOE] spoe

Ubuntu 16 - HAProxy 1.6.3 - openssl-1.0.2g

# haproxy -vv
HA-Proxy version 1.6.3 2015/12/25
Copyright 2000-2015 Willy Tarreau <willy@haproxy.org>

Build options :
  TARGET  = linux2628
  CPU     = generic
  CC      = gcc
  CFLAGS  = -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2
  OPTIONS = USE_ZLIB=1 USE_REGPARM=1 USE_OPENSSL=1 USE_LUA=1 USE_PCRE=1

Default settings :
  maxconn = 2000, bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Encrypted password support via crypt(3): yes
Built with zlib version : 1.2.8
Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with OpenSSL version : OpenSSL 1.0.2g-fips  1 Mar 2016
Running on OpenSSL version : OpenSSL 1.0.2g  1 Mar 2016
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports prefer-server-ciphers : yes
Built with PCRE version : 8.38 2015-11-23
PCRE library supports JIT : no (USE_PCRE_JIT not set)
Built with Lua version : Lua 5.3.1
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND

Available polling systems :
      epoll : pref=300,  test result OK
       poll : pref=200,  test result OK
     select : pref=150,  test result OK
Total: 3 (3 usable), will use epoll.

Graphs - CPU Usage - Idle PCT - 1000 concurrent clients

[Graph: CentOS 7]

[Graph: Ubuntu 16]

The same tests repeated over plain HTTP give equal results on both platforms, so the problem seems to be related to the OpenSSL library.

Digging into the syscalls, I’ve created some flame graphs to see what happens under the hood:

[Flame graph: CentOS 7]

[Flame graph: Ubuntu 16]

As you can see, on CentOS 7 the bn_mul_mont function stays on the CPU for a long time.

Can anyone clarify why CPU usage is so high on CentOS 7? Could it be because the OpenSSL library on CentOS 7 uses FIPS? Has anyone else seen the same behaviour?

Thanks for your help,
David

Well, there are a lot of differences between those 2 setups. At the very least, you would have to find out which cipher is actually negotiated in both setups; if they are not the same, that could be one contributing factor. For example, DHE cipher suites are very CPU intensive for the server, while ECDHE suites are not.
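To compare the negotiated cipher on both boxes, something like the following could be run from a client machine. This is only a minimal sketch, assuming Python 3 is available; `negotiated_cipher` and `key_exchange` are hypothetical helper names, and certificate verification is disabled on the assumption that test.pem is a self-signed test certificate. Keep in mind that the result depends on the cipher list the client offers, so it shows what this Python client negotiates, not necessarily what loader.io negotiates.

```python
import socket
import ssl


def key_exchange(cipher_name):
    """Classify an OpenSSL-style cipher suite name by its key exchange."""
    if cipher_name.startswith("TLS_"):
        return "TLS1.3"  # TLS 1.3 suite names do not encode the key exchange
    if cipher_name.startswith("ECDHE-"):
        return "ECDHE"
    if cipher_name.startswith(("DHE-", "EDH-")):
        return "DHE"
    # Names without a key-exchange prefix (e.g. AES128-SHA) use RSA key exchange;
    # anything else (static ECDH, PSK, ...) also ends up here.
    return "RSA/other"


def negotiated_cipher(host, port=443, timeout=5.0):
    """Do one TLS handshake and return (cipher name, protocol, secret bits)."""
    ctx = ssl.create_default_context()
    # Assumption: the server uses a self-signed test certificate.
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    with socket.create_connection((host, port), timeout=timeout) as raw:
        with ctx.wrap_socket(raw, server_hostname=host) as tls:
            return tls.cipher()
```

Calling `negotiated_cipher(...)` against each HAProxy instance and passing the first element of the result to `key_exchange(...)` should show whether one setup ends up on DHE while the other gets ECDHE.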

Exactly what tool do you use to test those “1000 clients connections” and how are you using it? I assume your benchmark is opening 1000 SSL sessions for short requests, which basically measures SSL handshakes and nothing else.
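To isolate the handshake cost from everything else, a rough measurement along these lines might help. It is a sketch under assumptions, not a load generator: `time_handshakes` and `summarize` are hypothetical names, the handshakes run sequentially from a single client, and each timing includes the TCP connect. No TLS session is passed to `wrap_socket`, so every connection should perform a full handshake.

```python
import socket
import ssl
import time


def time_handshakes(host, port=443, count=20, timeout=5.0):
    """Open `count` fresh TLS connections and return the time each one took."""
    ctx = ssl.create_default_context()
    # Assumption: the server uses a self-signed test certificate.
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    durations = []
    for _ in range(count):
        start = time.perf_counter()
        with socket.create_connection((host, port), timeout=timeout) as raw:
            # No session is reused here, so this is a full handshake.
            with ctx.wrap_socket(raw, server_hostname=host):
                pass
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Return the count, mean duration and sequential handshakes per second."""
    total = sum(durations)
    return {
        "count": len(durations),
        "mean_s": total / len(durations),
        "per_second": len(durations) / total,
    }
```

Running `summarize(time_handshakes(...))` against both setups from the same client gives a per-handshake figure that can be compared directly, independent of the request rate configured in the load tool.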

Another issue could be the FIPS mode on CentOS:
https://groups.google.com/forum/#!topic/mailing.openssl.users/k22KTeuh-ug
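Before blaming FIPS, it may be worth checking whether FIPS mode is actually active. As far as I know, the "-fips" suffix in the version string only says the library was built with FIPS support, while on RHEL/CentOS the system-wide switch is exposed by the kernel in /proc/sys/crypto/fips_enabled. A minimal sketch, with hypothetical helper names, that checks both:

```python
from pathlib import Path


def kernel_fips_enabled(path="/proc/sys/crypto/fips_enabled"):
    """Return True/False for the kernel FIPS flag, or None if it cannot be read."""
    try:
        return Path(path).read_text().strip() == "1"
    except OSError:
        # File missing or unreadable: the flag is not exposed on this system.
        return None


def version_mentions_fips(version_string):
    """Return True if an OpenSSL version string carries a 'fips' marker."""
    return "fips" in version_string.lower()
```

For example, `version_mentions_fips("OpenSSL 1.0.1e-fips 11 Feb 2013")` is true for the CentOS 7 build shown by `haproxy -vv`, but only `kernel_fips_enabled()` returning `True` would indicate that the system is really running in FIPS mode.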

Also, OpenSSL 1.0.2 may contain optimizations that benefit your use case.

I really don’t see those 2 setups performing the same, given the fundamental differences between them (1.0.1-fips vs 1.0.2 non-fips).

Also, you will probably get more insight into these issues on the openssl-users mailing list:
https://www.openssl.org/community/mailinglists.html

The tool I use is loader.io and the test is:

http://support.loader.io/article/16-test-types#per-second

Configured with 1000 clients per second for 30s.

I’ve also run tests on Ubuntu 14, which uses openssl-1.0.1f (non-fips) by default, with the same results as Ubuntu 16 with openssl-1.0.2 (non-fips). I’ve also tested CentOS 7 + openssl-1.0.1f-fips, with the same results as before on CentOS 7.

It seems to be something related to CentOS 7 with FIPS.

I will continue doing more tests.