Very Slow Performance


#1

Hello, I am setting up a new haproxy server and am getting unusually slow performance compared to a single server solution.

My setup has 6 servers: 2 HAProxy front ends, 2 Apache web servers, and 2 database servers. All of them are VMs, with a 1GB connection between them.

When I run an ab test, I am doing a straight comparison using:

ab -n 250 -c 10 -s 80 http://$sitename

Below are my results.

Single Server Setup. Apache and MySQL on same server.

Concurrency Level: 10
Time taken for tests: 28.631 seconds
Complete requests: 250
Failed requests: 0
Total transferred: 11373250 bytes
HTML transferred: 11245750 bytes
Requests per second: 8.73 [#/sec] (mean)
Time per request: 1145.239 [ms] (mean)
Time per request: 114.524 [ms] (mean, across all concurrent requests)
Transfer rate: 387.93 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0    0.6      0      4
Processing:   666 1135  516.3   1060   8505
Waiting:      643 1080  218.6   1037   2101
Total:        666 1136  516.4   1061   8505

HAProxy 6 Server Setup.

Concurrency Level: 10
Time taken for tests: 82.300 seconds
Complete requests: 250
Failed requests: 0
Total transferred: 11955000 bytes
HTML transferred: 11811500 bytes
Requests per second: 3.04 [#/sec] (mean)
Time per request: 3291.984 [ms] (mean)
Time per request: 329.198 [ms] (mean, across all concurrent requests)
Transfer rate: 141.86 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0    0.1      0      1
Processing:  1598 3256  712.7   3513   4380
Waiting:     1562 3214  711.0   3461   4338
Total:       1599 3256  712.7   3514   4380
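As a sanity check on the two runs above, the headline figures can be recomputed from the totals alone. This is only a rough sketch in Python; the names (`ELAPSED_S`, `summarize`) are made up for illustration, and the inputs are copied from the ab output above.

```python
# Figures copied from the two ab runs above.
REQUESTS = 250
CONCURRENCY = 10
ELAPSED_S = {"single server": 28.631, "haproxy setup": 82.300}


def summarize(elapsed_s, requests=REQUESTS, concurrency=CONCURRENCY):
    """Recompute ab's headline numbers from the total elapsed time."""
    requests_per_second = requests / elapsed_s
    # ab's first "Time per request" line is the mean time one of the
    # concurrent clients waits for a single request to complete.
    per_request_ms = concurrency * elapsed_s / requests * 1000
    return requests_per_second, per_request_ms


for name, elapsed in ELAPSED_S.items():
    rps, ms = summarize(elapsed)
    print(f"{name}: {rps:.2f} req/s, {ms:.0f} ms per request")

# Ratio of total test durations: how much slower the HAProxy run was.
slowdown = ELAPSED_S["haproxy setup"] / ELAPSED_S["single server"]
print(f"slowdown: {slowdown:.2f}x")
```

So the 6 server setup takes a bit under three times as long for the same 250 requests.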

Below is my configuration, with some settings redacted.

global
    log         127.0.0.1 local2

    chroot      /var/lib/haproxy
    pidfile     /var/run/haproxy.pid
    maxconn     4000
    user        haproxy
    group       haproxy
    daemon

    stats socket /var/lib/haproxy/stats

defaults
    mode                    http
    log                     global
    option                  httplog
    option                  dontlognull
    option http-server-close
    option forwardfor       except 127.0.0.0/8
    option                  redispatch
    retries                 3
    timeout http-request    10s
    timeout queue           1m
    timeout connect         10s
    timeout client          1m
    timeout server          1m
    timeout http-keep-alive 10s
    timeout check           10s
    maxconn                 3000

frontend  $REDACTED *:80
    acl url_static       path_beg       -i /static /images /javascript /stylesheets
    acl url_static       path_end       -i .jpg .gif .png .css .js
    option http-server-close

    default_backend             $REDACT

listen stats *:9000
    maxconn     100
    mode http
    log global
    stats enable
    stats refresh 30s
    stats hide-version
    stats show-node
    stats uri /stats
    stats auth $REDACTED

backend $REDACTED
    balance     roundrobin
    server  $REDACTED $REDACTED:80 check maxconn 50
    server  $REDACTED $REDACTED:80 check maxconn 50
    option abortonclose
    option httpclose

#2

It's the connection setup.

You had “ab -> Apache” on a single machine; now you have “ab -> haproxy -> Apache” across different machines. You don’t use keepalive anywhere, and http://$sitename is probably not very heavy for the backend to serve, so what you are basically measuring is connection setup time, which you just tripled in your new configuration.

Enable HTTP keepalive and connection reuse everywhere (including ab), use persistent MySQL connections, check conntrack, and make sure the virtualization is not causing any issues (make sure to dedicate RAM and CPU resources to each VM).
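To illustrate what keepalive changes, here is a minimal local sketch using only the Python standard library. It has nothing to do with HAProxy or Apache specifically: it starts a throwaway server on 127.0.0.1 (the `CountingServer` and `count_connections` names are made up for this example) and counts how many TCP connections the server has to accept for the same number of requests, with and without connection reuse.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class Handler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # allows the connection to stay open

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


class CountingServer(ThreadingHTTPServer):
    """Toy server that counts how many TCP connections it accepts."""
    daemon_threads = True

    def __init__(self, address):
        super().__init__(address, Handler)
        self.accepted = 0

    def get_request(self):
        request = super().get_request()  # one call per accepted connection
        self.accepted += 1
        return request


def count_connections(n_requests, reuse):
    """Send n_requests GETs and return the number of connections accepted."""
    server = CountingServer(("127.0.0.1", 0))  # port 0: pick any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address
    conn = None
    try:
        for _ in range(n_requests):
            if conn is None:
                conn = http.client.HTTPConnection(host, port, timeout=5)
            headers = {} if reuse else {"Connection": "close"}
            conn.request("GET", "/", headers=headers)
            conn.getresponse().read()
            if not reuse:
                conn.close()  # force a new TCP connection next time
                conn = None
    finally:
        if conn is not None:
            conn.close()
        server.shutdown()
        server.server_close()
    return server.accepted


if __name__ == "__main__":
    print("without keepalive:", count_connections(5, reuse=False))
    print("with keepalive:   ", count_connections(5, reuse=True))
```

Without reuse, every request pays for its own connection setup; with reuse, the setup cost is paid once. In a proxied setup the same applies on both the client side and the server side of the proxy.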


#3

Thanks. I will try that.

I am looking at haproxy logs.

May  9 17:35:45 localhost haproxy[31849]: $IP:60920 [09/May/2017:17:35:42.521] drpweb-front drpweb/drpweb02 0/0/0/2574/2621 200 47820 - - ---- 9/9/9/3/0 0/0 "GET / HTTP/1.0"
May  9 17:35:45 localhost haproxy[31849]: $IP:60915 [09/May/2017:17:35:41.640] drpweb-front drpweb/drpweb01 0/0/0/4172/4219 200 47820 - - ---- 9/9/9/6/0 0/0 "GET / HTTP/1.0"

Is this an abnormally high Tr?
0/0/0/2574/2621
0/0/0/4172/4219


#4

I would say so, yes. This is about the server taking a long time to generate the response.
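For reference, the five slash-separated timers in the HTTP log line can be split out as below. This is a rough sketch, not an official parser: the field names in `Timers` are my own labels, and the exact names of the first and last timers differ between HAProxy versions, so check the logging section of the manual for your version. The fourth value is the server response time (Tr).

```python
from typing import NamedTuple


class Timers(NamedTuple):
    # Labels are illustrative; all values are in milliseconds.
    request: int   # time spent receiving the request from the client
    queue: int     # time spent waiting in queues (Tw)
    connect: int   # time to establish the connection to the server (Tc)
    response: int  # time the server took to send its response headers (Tr)
    total: int     # overall time for the request


def parse_timers(field):
    """Split a timers field such as '0/0/0/2574/2621' into named values."""
    parts = [int(p) for p in field.split("/")]
    if len(parts) != 5:
        raise ValueError(f"expected 5 timers, got {len(parts)}: {field!r}")
    return Timers(*parts)


# The two timer fields quoted from the log lines above.
for field in ("0/0/0/2574/2621", "0/0/0/4172/4219"):
    t = parse_timers(field)
    share = 100 * t.response / t.total
    print(f"{field}: server response {t.response} ms "
          f"of {t.total} ms total ({share:.0f}%)")
```

In both lines the queue and connect timers are 0 and nearly all of the total is the server response time, which points at the backend rather than at haproxy.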