Requests/connections per second limited with keepalive off at the client

I’m doing some performance benchmarking with HAProxy to see how many connections per second I can get. At present I’m using apib to generate the client connections, and I have HAProxy in front of an nginx server. I can get hundreds of thousands of connections per second (e.g. 200k+) when I connect to the nginx server directly, or to HAProxy with keepalive on (set at apib runtime), but as soon as I turn off keepalive on the client connections, my requests/connections per second drops to something like 30k, and I can’t figure out why. Can anyone suggest a reason, or a path to improving the connections per second? I know that there are OS settings (like TCP reuse) that can affect the maximum connections per second, but since I can get a high number of connections per second in the other cases, I suspect my limitation is with HAProxy somehow, and I don’t know how to track it down.

To clarify, if keepalive is off for apib connecting directly to the nginx server, I still see the high connections per second. It is only when I have HAProxy in the middle that the connections per second goes down dramatically.

I assume this thread is about requests per second, and not connections per second? Those are two completely different things, but it looks like you are using the two terms interchangeably.

It is expected that the number of requests per second is drastically lower without keepalive, because of the additional overhead of connection setup. It is also expected that it is drastically lower compared to a direct connection to your backend server, as there is an additional TCP session to set up with a proxy in the middle.

That said, there is probably room for improvement. Share your configuration, the output of haproxy -vv, and the exact apib command you are using. Also double-check that you have disabled conntrack on the haproxy box, and check intermediate devices for any statefulness (firewalls first of all).
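For example, a quick sanity check on the haproxy box could look something like this (a rough sketch; module and sysctl names can vary by kernel and distro, and the port is just an example, use whatever your frontend listens on):

# is connection tracking loaded at all?
lsmod | grep nf_conntrack

# if it is, compare the current entry count against the table limit
sysctl net.netfilter.nf_conntrack_count net.netfilter.nf_conntrack_max

# one way to bypass tracking for the benchmark traffic (raw table NOTRACK rules)
iptables -t raw -I PREROUTING -p tcp --dport 8080 -j NOTRACK
iptables -t raw -I OUTPUT -p tcp --sport 8080 -j NOTRACK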

I’m not sure of the best terminology to use, but I guess it is requests per second as opposed to concurrent open connections per second. I just serve a 0-byte file, and the number of completed responses of that file is what I am trying to measure. Ultimately, I’m trying to measure the number of completed SSL (TLS) handshakes per second, but I’m starting with this HTTP (not HTTPS) case so that I can rule out other issues. I know that I’m not CPU bound, since the max number of responses per second (is that a clearer terminology?) doesn’t scale up as I taskset more cores to apib, HAProxy, or nginx. I’m not limited by memory either.

I might expect that my max responses per second could get cut in half when HAProxy is in the middle (though I’m not sure which resource will limit me first), but it wouldn’t get cut by 90%, right?

If I have spare CPU cycles with enough concurrency, the connection setup shouldn’t be a limiting factor, should it?

I have turned off the firewalls, and I’m actually doing the measurement on one system: HAProxy listens on port 8080 and forwards to port 80 on the backend.

# ./haproxy/haproxy-1.9.4/haproxy -vv
HA-Proxy version 1.9.4 2019/02/06 - https://haproxy.org/
Build options :
  TARGET  = linux2628
  CPU     = generic
  CC      = gcc
  CFLAGS  = -O2 -g -fno-strict-aliasing -Wdeclaration-after-statement -fwrapv -Wno-unused-label -Wno-sign-compare -Wno-unused-parameter -Wno-old-style-declaration -Wno-ignored-qualifiers -Wno-clobbered -Wno-missing-field-initializers -Wtype-limits
  OPTIONS = USE_OPENSSL=1

Default settings :
  maxconn = 2000, bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Built with OpenSSL version : OpenSSL 1.1.0j  20 Nov 2018
Running on OpenSSL version : OpenSSL 1.1.0j  20 Nov 2018
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports : TLSv1.0 TLSv1.1 TLSv1.2
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND
Built without compression support (neither USE_ZLIB nor USE_SLZ are set).
Compression algorithms supported : identity("identity")
Built without PCRE or PCRE2 support (using libc's regex instead)
Encrypted password support via crypt(3): yes
Built with multi-threading support.

Available polling systems :
      epoll : pref=300,  test result OK
       poll : pref=200,  test result OK
     select : pref=150,  test result OK
Total: 3 (3 usable), will use epoll.

Available multiplexer protocols :
(protocols marked as <default> cannot be specified using 'proto' keyword)
              h2 : mode=HTX        side=FE|BE
              h2 : mode=HTTP       side=FE
       <default> : mode=HTX        side=FE|BE
       <default> : mode=TCP|HTTP   side=FE|BE

Available filters :
        [SPOE] spoe
        [COMP] compression
        [CACHE] cache
        [TRACE] trace

I’ve tested with a variety of apib options, but this one is pretty typical:
# taskset -c 1-16 apib/apib -c 400 -d 10 -k 0 -K 16 http://127.0.0.1:8080/0mb_file

I’ve tried a bunch of haproxy config options, but here is a typical one:

global
        maxconn 200000
        tune.ssl.default-dh-param 2048
        #ssl-engine qat algo RSA
        #ssl-mode-async
        #nbproc 1
        nbthread 16

defaults
        mode http
        timeout connect 60s
        timeout client 60s
        timeout server 60s
        #option forceclose
        http-reuse always

frontend myfrontend
        # primary cert is /etc/cert/server.pem
        # /etc/cert/certdir/ contains additional certificates for SNI clients
        #bind :4433 ssl crt /etc/cert/server.pem crt /etc/cert/certdir/
        bind :8080
        bind :4433 ssl crt /etc/ssl/myhaproxy/myhaproxy.pem
        default_backend mybackend

backend mybackend
        balance roundrobin
        mode http
        # a http backend
        server myserver 127.0.0.1:80
        # a https backend
        #server s4 10.0.0.3:443 ssl verify none

First of all, you need to configure maxconn in the frontend and on the backend server appropriately; otherwise they will default to 2000, and that will certainly have a strong effect.

Put something like 100k maxconn on both of them, at least.

I do have a global maxconn of 200k. That should suffice? If I add maxconn to the server line, it doesn’t seem to make a difference. If I read the configuration manual correctly, the server maxconn is unlimited by default.

Adding maxconn 200000 to the defaults and frontend sections did not make a difference in my case.

It needs to be in either the defaults section or the frontend section, and it also needs to be on the server line. Maxconn has three layers: global (per process), per frontend, and per server. Unless all of them are set, you will hit the default 2000 maxconn somewhere.
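To make that concrete, here is a rough sketch based on your config (the numbers are placeholders, tune them to your memory budget):

global
        maxconn 200000                                  # layer 1: per process

frontend myfrontend
        maxconn 100000                                  # layer 2: per frontend (or set it once in defaults)
        bind :8080
        default_backend mybackend

backend mybackend
        server myserver 127.0.0.1:80 maxconn 100000     # layer 3: per server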

Also enable the stats interface and check out all the values you see there.

With keepalive on, I get a stats output (just one snapshot; might not be entirely typical) of this:

show info
Name: HAProxy
Version: 1.9.4
Release_date: 2019/02/06
Nbthread: 32
Nbproc: 1
Process_num: 1
Pid: 312045
Uptime: 0d 0h00m57s
Uptime_sec: 57
Memmax_MB: 0
PoolAlloc_MB: 142
PoolUsed_MB: 142
PoolFailed: 0
Ulimit-n: 1000032
Maxsock: 1000032
Maxconn: 500000
Hard_maxconn: 500000
CurrConns: 6405
CumConns: 28831
CumReq: 11882982
MaxSslConns: 0
CurrSslConns: 0
CumSslConns: 0
Maxpipes: 0
PipesUsed: 0
PipesFree: 0
ConnRate: 542
ConnRateLimit: 0
MaxConnRate: 3116
SessRate: 542
SessRateLimit: 0
MaxSessRate: 3116
SslRate: 0
SslRateLimit: 0
MaxSslRate: 0
SslFrontendKeyRate: 0
SslFrontendMaxKeyRate: 0
SslFrontendSessionReuse_pct: 0
SslBackendKeyRate: 0
SslBackendMaxKeyRate: 0
SslCacheLookups: 0
SslCacheMisses: 0
CompressBpsIn: 0
CompressBpsOut: 0
CompressBpsRateLim: 0
Tasks: 6443
Run_queue: 3270
Idle_pct: 1
node: JACentos7p6-19W06
Stopping: 0
Jobs: 6409
Unstoppable Jobs: 0
Listeners: 3
ActivePeers: 0
ConnectedPeers: 0
DroppedLogs: 0
BusyPolling: 0

With keepalive off, I get this:

show info
Name: HAProxy
Version: 1.9.4
Release_date: 2019/02/06
Nbthread: 32
Nbproc: 1
Process_num: 1
Pid: 313930
Uptime: 0d 0h00m10s
Uptime_sec: 10
Memmax_MB: 0
PoolAlloc_MB: 41
PoolUsed_MB: 41
PoolFailed: 0
Ulimit-n: 1000032
Maxsock: 1000032
Maxconn: 500000
Hard_maxconn: 500000
CurrConns: 210
CumConns: 189772
CumReq: 189772
MaxSslConns: 0
CurrSslConns: 0
CumSslConns: 0
Maxpipes: 0
PipesUsed: 0
PipesFree: 0
ConnRate: 25429
ConnRateLimit: 0
MaxConnRate: 25049
SessRate: 25429
SessRateLimit: 0
MaxSessRate: 25049
SslRate: 0
SslRateLimit: 0
MaxSslRate: 0
SslFrontendKeyRate: 0
SslFrontendMaxKeyRate: 0
SslFrontendSessionReuse_pct: 0
SslBackendKeyRate: 0
SslBackendMaxKeyRate: 0
SslCacheLookups: 0
SslCacheMisses: 0
CompressBpsIn: 0
CompressBpsOut: 0
CompressBpsRateLim: 0
Tasks: 248
Run_queue: 298
Idle_pct: 7
node: JACentos7p6-19W06
Stopping: 0
Jobs: 214
Unstoppable Jobs: 0
Listeners: 3
ActivePeers: 0
ConnectedPeers: 0
DroppedLogs: 0
BusyPolling: 0

No, I mean the stats web-interface:

https://cbonte.github.io/haproxy-dconv/1.8/configuration.html#stats%20uri

It will provide all kinds of information, including queue lengths, current maxconn settings, etc.
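For example, something like this (a sketch; the bind port and URI are arbitrary):

listen stats
        bind :9000
        mode http
        stats enable
        stats uri /stats
        stats refresh 10s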

Ok, I’ve enabled that, though I’m not certain what I’m looking for.

Here’s the csv out for keepalive on:

# pxname,svname,qcur,qmax,scur,smax,slim,stot,bin,bout,dreq,dresp,ereq,econ,eresp,wretr,wredis,status,weight,act,bck,chkfail,chkdown,lastchg,downtime,qlimit,pid,iid,sid,throttle,lbtot,tracked,type,rate,rate_lim,rate_max,check_status,check_code,check_duration,hrsp_1xx,hrsp_2xx,hrsp_3xx,hrsp_4xx,hrsp_5xx,hrsp_other,hanafail,req_rate,req_rate_max,req_tot,cli_abrt,srv_abrt,comp_in,comp_out,comp_byp,comp_rsp,lastsess,last_chk,last_agt,qtime,ctime,rtime,ttime,agent_status,agent_code,agent_duration,check_desc,agent_desc,check_rise,check_fall,check_health,agent_rise,agent_fall,agent_health,addr,cookie,mode,algo,conn_rate,conn_rate_max,conn_tot,intercepted,dcon,dses,wrew,connect,reuse,cache_lookups,cache_hits,
myfrontend,FRONTEND,,,6357,6384,200000,10117,159224535,555025172,0,0,0,,,,,OPEN,,,,,,,,,1,2,0,,,,0,345,0,3034,,,,0,2270871,0,0,0,3760,,217619,228917,2279796,,,0,0,0,0,,,,,,,,,,,,,,,,,,,,,http,,345,3034,10117,2,0,0,0,,,0,0,
mybackend,myserver,0,0,5147,5981,100000,2279791,159224590,555011428,,0,,0,0,0,0,no check,1,1,0,,,13,,,1,3,1,,2279799,,2,217614,,228918,,,,0,2270883,0,0,0,0,,,,,0,0,,,,,0,,,0,0,11,16,,,,,,,,,,,,,,http,,,,,,,,0,2279799,0,,,
mybackend,BACKEND,0,0,5161,5991,20000,2279801,159224800,555012160,0,0,,0,0,0,0,UP,1,1,0,,0,13,0,,1,3,0,,2279801,,1,217625,,228917,,,,0,2270880,0,0,0,3760,,,,2274640,0,0,0,0,0,0,0,,,0,0,11,16,,,,,,,,,,,,,,http,,,,,,,,0,2279800,0,0,0,

And the csv out for keepalive off:

# pxname,svname,qcur,qmax,scur,smax,slim,stot,bin,bout,dreq,dresp,ereq,econ,eresp,wretr,wredis,status,weight,act,bck,chkfail,chkdown,lastchg,downtime,qlimit,pid,iid,sid,throttle,lbtot,tracked,type,rate,rate_lim,rate_max,check_status,check_code,check_duration,hrsp_1xx,hrsp_2xx,hrsp_3xx,hrsp_4xx,hrsp_5xx,hrsp_other,hanafail,req_rate,req_rate_max,req_tot,cli_abrt,srv_abrt,comp_in,comp_out,comp_byp,comp_rsp,lastsess,last_chk,last_agt,qtime,ctime,rtime,ttime,agent_status,agent_code,agent_duration,check_desc,agent_desc,check_rise,check_fall,check_health,agent_rise,agent_fall,agent_health,addr,cookie,mode,algo,conn_rate,conn_rate_max,conn_tot,intercepted,dcon,dses,wrew,connect,reuse,cache_lookups,cache_hits,
myfrontend,FRONTEND,,,81,432,200000,387123,34447173,94453188,0,0,0,,,,,OPEN,,,,,,,,,1,2,0,,,,0,24907,0,25556,,,,0,387018,0,0,0,25,,24907,25568,387072,,,0,0,0,0,,,,,,,,,,,,,,,,,,,,,http,,24907,25556,387123,2,0,0,0,,,0,0,
mybackend,myserver,0,0,28,273,100000,387070,34446738,94438248,,0,,0,0,0,0,no check,1,1,0,,,18,,,1,3,1,,387070,,2,24906,,25568,,,,0,387017,0,0,0,0,,,,,0,0,,,,,0,,,0,0,5,5,,,,,,,,,,,,,,http,,,,,,,,0,387070,0,,,
mybackend,BACKEND,0,0,28,274,20000,387070,34446738,94438248,0,0,,0,0,0,0,UP,1,1,0,,0,18,0,,1,3,0,,387070,,1,24906,,25568,,,,0,387017,0,0,0,25,,,,387042,0,0,0,0,0,0,0,,,0,0,5,5,,,,,,,,,,,,,,http,,,,,,,,0,387070,0,0,0,

I’ll look at this, though I’m not sure yet what I’m looking for.

I really appreciate the help. Based on this, do you have any more insights?

If my goal is ultimately to measure the number of SSL (TLS) handshakes that my hardware can do, with HAProxy, is there an alternative approach to enforcing that each request is treated as a unique client with its own handshake? Maybe I can maintain sessions (aka connections?) as HAProxy sees them while still redoing handshakes?

I think your benchmark client (as well as haproxy on the backend side) is running out of source ports; you seem to be capped at around 25k connections per second.

If your port 80 backend listens on all IPs, duplicate your backends and use different source IPs and port ranges so that you have more source ports available:

backend mybackend
 balance rr
 server myserver1 127.0.0.1:80 source 127.0.0.3:1025-63000 maxconn 100000
 server myserver2 127.0.0.2:80 source 127.0.0.3:1025-63000 maxconn 100000
 server myserver3 127.0.0.3:80 source 127.0.0.3:1025-63000 maxconn 100000
 server myserver4 127.0.0.4:80 source 127.0.0.3:1025-63000 maxconn 100000

That should help with backend source port exhaustion.

Also, duplicate your frontend configuration and run parallel benchmark threads:

frontend myfrontend
 maxconn 200000
 bind :8081
 bind :8082
 bind :8083
 bind :8084

That way you have 4 times more source ports available. But you have to run 4 benchmark instances, one for each port. If the total amount of requests and connections you can push through does not increase, then this wasn’t it.
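For the client side that could look something like this, based on your earlier apib command (a sketch; split your cores and concurrency however makes sense for your box):

# taskset -c 1-4   apib/apib -c 100 -d 10 -k 0 -K 4 http://127.0.0.1:8081/0mb_file &
# taskset -c 5-8   apib/apib -c 100 -d 10 -k 0 -K 4 http://127.0.0.1:8082/0mb_file &
# taskset -c 9-12  apib/apib -c 100 -d 10 -k 0 -K 4 http://127.0.0.1:8083/0mb_file &
# taskset -c 13-16 apib/apib -c 100 -d 10 -k 0 -K 4 http://127.0.0.1:8084/0mb_file &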

Also raise all your maxconn values some more.


Sweet! Applying those concepts, I can get at least 90k requests/sec with keepalive off, and I may be able to tune that up further. I am getting a fair number of these messages: “Connect() failed for backend mybackend: no free ports.” I’m assuming that I can figure out the best way to get rid of those.

I’m not sure, but for your four “source 127.0.0.3” lines, should those range from 127.0.0.1 to 127.0.0.4 as well, rather than all being .3? At least that would give me four times as many backend source ports?

Thanks so much!

The “no free ports” message confirms that you are working on the correct problem; you are still running out of source ports on haproxy.

You can just add new backend servers: as long as you are using more source-IP <-> destination-IP combinations, you are adding more tuples, and therefore more source ports to the equation.

Also remember that your benchmark client will have the same issue; that’s why you will also have to increase the number of frontend ports and benchmark client instances, otherwise you are limited on that front.
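On the client side you can also check how many ephemeral ports the kernel hands out and whether TIME_WAIT sockets are reused for outgoing connections (a sketch; the values shown are just examples and defaults vary per distro):

# current ephemeral port range and TIME_WAIT reuse setting
sysctl net.ipv4.ip_local_port_range net.ipv4.tcp_tw_reuse

# widen the range and allow reuse of TIME_WAIT sockets for outgoing connections
sysctl -w net.ipv4.ip_local_port_range="1025 65000"
sysctl -w net.ipv4.tcp_tw_reuse=1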

As long as either the source IP or the destination IP is different, you have a new tuple and therefore new source ports. So source 127.0.0.3 everywhere is fine. You can also leave the destination IP at 127.0.0.1 for everything and only change the source IP.

Actually, now that I think about it, the best thing would be not to specify the source IP at all and only specify different destination IPs, so the kernel picks free source ports itself. That may be more efficient:

 server myserver1 127.0.0.1:80 maxconn 100000
 server myserver2 127.0.0.2:80 maxconn 100000
 server myserver3 127.0.0.3:80 maxconn 100000
 server myserver4 127.0.0.4:80 maxconn 100000
 server myserver5 127.0.0.5:80 maxconn 100000
 server myserver6 127.0.0.6:80 maxconn 100000
 server myserver7 127.0.0.7:80 maxconn 100000
 server myserver8 127.0.0.8:80 maxconn 100000
 server myserver9 127.0.0.9:80 maxconn 100000

If you have at least Linux 4.2 and libc 2.23, you can specify the source IP while not specifying the source port range and still have the kernel deal with source ports, due to IP_BIND_ADDRESS_NO_PORT being used, but that’s probably not needed here, as you only need a lot of tuples.
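In that case the server lines would look something like this (again just a sketch; the source addresses are arbitrary loopback IPs):

 server myserver1 127.0.0.1:80 source 127.0.0.11 maxconn 100000
 server myserver2 127.0.0.1:80 source 127.0.0.12 maxconn 100000
 server myserver3 127.0.0.1:80 source 127.0.0.13 maxconn 100000
 server myserver4 127.0.0.1:80 source 127.0.0.14 maxconn 100000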
