Performance since haproxy 2.0

Hi devs,

We were planning to upgrade our existing haproxy-1.6 deployment to the newest haproxy, but ran into a performance problem with the newer versions.

I then benchmarked haproxy-1.6.15 / haproxy-1.7.12 / haproxy-1.8.25 / haproxy-1.9.15 / haproxy-2.0.14 / haproxy-2.1.3,
and it turned out that performance seems to have dropped starting with haproxy-2.0.14.

All tests ran on the same client/haproxy/nginx machines with the same haproxy.cfg, and nginx had been tuned beforehand to sustain the benchmark (RPS: 9*7k, latencies: 4+ms, success ratio: 100%).
During these tests, nothing except the haproxy version was changed.

                    RPS     Latencies   Success ratio
haproxy-1.6.15      9*7k    3+ms        100%
haproxy-1.7.12      9*7k    3+ms        100%
haproxy-1.8.25      9*7k    4+ms        100%
haproxy-1.9.15      9*7k    4+ms        100%
haproxy-2.0.14      9*6.2k  3+ms        100%        # starts to drop
haproxy-2.1.3       9*5.5k  4+ms        100%        # worse

Are there any special configuration options (at build time or in haproxy.cfg, etc.) that need to be taken care of in order to tune 2.0/2.1 back to the same performance?

More information about environment:

Test command:

# CA-related options can be ignored in this case.
RATE=7000 DURATION=120 SERVER=3 TIMEOUT=30; echo "GET http://192.168.11.${SERVER}/" | ./vegeta -cpus 4 attack -duration=${DURATION}s -timeout=${TIMEOUT}s -key ca.key -cert ca.crt -rate $RATE -keepalive 1 -insecure | tee results.bin | ./vegeta report | grep -v "^Get "; date

These versions were downloaded from http://www.haproxy.org/#down and compiled with the following command:

make TARGET=linux-glibc CPU=x86_64 USE_LINUX_SPLICE=1 USE_LINUX_TPROXY=1 USE_ZLIB=1 USE_OPENSSL=1 USE_LUA=1 USE_PCRE=1 USE_PCRE_JIT=1 USE_CPU_AFFINITY=1 LUA_LIB_NAME=lua5.3 LUA_INC=/usr/include/lua5.3/
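
For reference, the build options each binary actually ended up with can be cross-checked with the standard 'haproxy -vv' flag (the path below is just an example, adjust to wherever each build was installed):

./haproxy -vv        # prints version, build options, and the linked OpenSSL/PCRE/Lua versions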

The haproxy server is a 4-core / 4 GB RAM KVM guest running Ubuntu 16.04.6 LTS (Xenial Xerus) with kernel 4.4.0-142-generic, and the hypervisor node is completely idle and stable.

haproxy.cfg

root@localhost:~# cat haproxy.cfg
global
        log 127.0.0.1   local0 info
        maxconn 100000
        tune.ssl.default-dh-param 2048
        daemon
        nbproc 4
        cpu-map 1 0
        stats socket /haproxy/haproxy.1 process 1
        cpu-map 2 1
        stats socket /haproxy/haproxy.2 process 2
        cpu-map 3 2
        stats socket /haproxy/haproxy.3 process 3
        cpu-map 4 3
        stats socket /haproxy/haproxy.4 process 4

defaults
        log     global
        option  dontlognull
        option  redispatch
        retries 3
        maxconn 100000
        timeout connect 5s
        timeout client 50s
        timeout server 50s
        timeout tunnel  1h
        option  tcpka
        # errorfile 400 /etc/haproxy/errors/400.http
        # errorfile 403 /etc/haproxy/errors/403.http
        # errorfile 408 /etc/haproxy/errors/408.http
        # errorfile 500 /etc/haproxy/errors/500.http
        # errorfile 502 /etc/haproxy/errors/502.http
        # errorfile 503 /etc/haproxy/errors/503.http
        # errorfile 504 /etc/haproxy/errors/504.http
        timeout client-fin 30s

listen  lbl-noiqom5f
        bind *:80
        mode http
        option httplog
        maxconn 400000
        timeout client 50s
        timeout tunnel 3600s

        default_backend lbl-noiqom5f_default

backend lbl-noiqom5f_default
        mode http
        option httplog
        option http-keep-alive
        balance roundrobin
        timeout server 50s
        timeout tunnel 3600s
        timeout check 5000

        server  lbb-7pq2k6mm 192.168.11.16:80 check inter 10000 fall 2 rise 5 weight 1
root@localhost:~#

No idea if it will help, but I'm pretty sure nbproc is not really recommended anymore.

If you get rid of the nbproc line and the related cpu-map and per-process stats socket definitions, HAProxy 2.x will automatically start a number of threads equal to your CPU cores (you could also set this manually via nbthread X).

As I said, I haven't benchmarked the difference (if any), but I do know multi-process mode has been described as less favourable than the newer multi-threaded mode.
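
A rough sketch of what I mean, assuming 4 cores and keeping your other global settings (untested on my side, so treat it as a starting point):

global
        log 127.0.0.1   local0 info
        maxconn 100000
        tune.ssl.default-dh-param 2048
        daemon
        nbthread 4
        cpu-map auto:1/1-4 0-3                # pin the 4 threads of process 1 to CPUs 0-3
        stats socket /haproxy/haproxy.sock    # a single socket is enough; it serves all threads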

Also, not that I think it’s related to performance, but you could combine your listen and backend sections as follows to make your config a little less verbose:

listen  lbl-noiqom5f
        bind *:80
        mode http
        option httplog
        maxconn 400000
        option http-keep-alive
        balance roundrobin
        timeout check 5000

        server  lbb-7pq2k6mm 192.168.11.16:80 check inter 10000 fall 2 rise 5 weight 1

Hi Andrew,

Thanks for the hint. I've just tested 'nbthread', and it does not improve performance; it's actually worse under haproxy-2.0.14: RPS only reaches 9*5.5k with latencies of 3ms and a 100% success ratio, which means about a 12% performance loss going from nbproc to nbthread.

root@localhost:~# diff haproxy.cfg haproxy.nbthread.cfg
6,14c6,15
<         nbproc 4
<         cpu-map 1 0
<         stats socket /haproxy/haproxy.1 process 1
<         cpu-map 2 1
<         stats socket /haproxy/haproxy.2 process 2
<         cpu-map 3 2
<         stats socket /haproxy/haproxy.3 process 3
<         cpu-map 4 3
<         stats socket /haproxy/haproxy.4 process 4
---
>         nbthread 4
>         # nbproc 4
>         # cpu-map 1 0
>         # stats socket /haproxy/haproxy.1 process 1
>         # cpu-map 2 1
>         # stats socket /haproxy/haproxy.2 process 2
>         # cpu-map 3 2
>         # stats socket /haproxy/haproxy.3 process 3
>         # cpu-map 4 3
>         # stats socket /haproxy/haproxy.4 process 4
root@localhost:~#

I confirmed that 4 threads were created instead:

root@localhost:~# ps -T -p 57639
  PID  SPID TTY          TIME CMD
57639 57639 ?        00:04:08 haproxy
57639 57640 ?        00:04:08 haproxy
57639 57641 ?        00:04:08 haproxy
57639 57642 ?        00:04:10 haproxy
root@localhost:~#
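
(Another quick way to check the thread count, using the same pid as above, is the kernel's per-process status file:)

grep Threads /proc/57639/status      # should report "Threads:  4"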

Would anyone in the community be willing to run this benchmark officially?
Maybe there are specific configuration options that need to be changed, or haproxy really does have a performance regression after the upgrade, or perhaps the hardware is simply unsuitable.

This might be related to the threading code.
First, I would add one bind line per process, and I would also try the threading model for 2.0 and 2.1.

For the bind, you must do something like:

bind *:80 process 1
bind *:80 process 2
bind *:80 process 3
bind *:80 process 4

And with threads this should be automatic, but you can still do:

bind *:80 process 1/1
bind *:80 process 1/2
bind *:80 process 1/3
bind *:80 process 1/4
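
Put together in the existing listen section, that would look roughly like this (just a sketch for the 4-thread case; each bind line gets its own listening socket restricted to one thread):

listen  lbl-noiqom5f
        bind *:80 process 1/1
        bind *:80 process 1/2
        bind *:80 process 1/3
        bind *:80 process 1/4
        mode http
        option httplog
        maxconn 400000
        default_backend lbl-noiqom5f_default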

Also, what client are you using, and what type of infrastructure is this? 10K looks like a small number and HAProxy might be limited by nginx. Can you run multiple nginx instances?

Baptiste

what client are you using, and what type of infrastructure is this?

RATE=7000 DURATION=120 SERVER=3 TIMEOUT=30; echo "GET http://192.168.11.${SERVER}/" | ./vegeta -cpus 4 attack -duration=${DURATION}s -timeout=${TIMEOUT}s -key ca.key -cert ca.crt -rate $RATE -keepalive 1 -insecure | tee results.bin | ./vegeta report | grep -v "^Get "; date

root@client:~# ./vegeta -version
Version: 12.8.3
Commit: d9b795aec8585a0fb435072f68d842d596c332de
Runtime: go1.14 linux/amd64
Date: 2020-03-25T11:03:54Z
root@client:~#

10K looks like a small number and HAProxy might be limited by nginx

Not 10K, but 9*7k.

Single nginx performance has been tested: it can reach up to 9*7k RPS with latencies of 4ms in http mode, and the haproxy host is a 4-core / 4 GB KVM guest (the hypervisor node is completely idle and stable, Intel(R) Xeon(R) Gold 6240 CPU @ 2.60GHz).
I've also tested with 4 nginx instances, and haproxy (1.6) still reaches 9*7k.
So I'm sure that nginx is not the bottleneck here.