We are trying to set up load-balanced servers so we can spread traffic across multiple machines as our app grows. Our software is written in Go and runs as a web server.
We created a simple PING/PONG server in Go to see the maximum number of requests each server can handle. Yes, we understand that once database access and the rest of the processing code are added, a single server will not reach the same number of transactions per second. This is merely test code to verify consistent speed on each box and rule out outside sources of latency.
We have a 2U physical box with 64 GB of RAM and two Xeon 4110 processors (8 cores / 16 threads each). Our Internet connection is 1 Gb fiber, and all servers are connected internally on a virtual LAN, so latency shouldn't be an issue.
We have three Go servers set up, each in a VM with 4 GB of RAM and 4 CPUs running CentOS 7.
We can achieve 47,000 queries per second on each box, so we would expect that with HAProxy in front of them we could hit approximately 140,000 qps across the 3 servers combined.
We are using TCP mode in HAProxy, as it appears to be much faster than HTTP mode. We do not need to route based on URL, so TCP mode seems the better choice for now.
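A minimal TCP-mode config for this topology looks roughly like the following (the IPs, ports, and `maxconn` values here are placeholders, not our actual file; note that `maxconn` defaults to a low value of 2000 if left unset):

```
global
    maxconn 100000

defaults
    mode tcp
    timeout connect 5s
    timeout client  30s
    timeout server  30s

frontend web_in
    bind *:80
    maxconn 100000
    default_backend go_servers

backend go_servers
    balance roundrobin
    server go1 10.0.0.1:8080 check
    server go2 10.0.0.2:8080 check
    server go3 10.0.0.3:8080 check
```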
The 47,000 qps figure was measured with loader.io using 2,000 clients ramping up to 4,000 clients over a 1-minute period. Each server handles roughly the same qps on average.
We set up HAProxy to connect to a single backend server to measure the speed with it in the middle. We saw about 46,000 qps when hitting that one server.
We then added 2 more servers, for a total of 3 behind HAProxy, and there was no increase in qps, even though htop on all three machines showed the load being spread across them. 46,000 qps total was all it would reach.
The HAProxy server was set up with 8 CPUs and 16 GB of RAM and was NOT maxing out on CPU according to htop.
There is 1 external IP address coming into HAProxy, and the backends are linked via internal 10.x.x.x IP addresses, 1 per box. Each backend server also has an external IP address, which we used to test each server individually and confirm that they all handle ~47,000 qps.
We increased loader.io to run 2,000 through 8,000 clients per second against the HAProxy server to throw more load at it, but it did not increase the qps at all.
It appears we have enough CPU, RAM, and network capacity to process the simple ping/pong requests.
Is there a maximum number of requests HAProxy can process per external IP due to port exhaustion? We did increase the ephemeral port range on the server to 1024-65000.
Watching `ss -s` shows no more than about 10,000 ports in use at peak, and very few sockets in any wait state, since we configured the server to reuse TCP connections and reduced the FIN timeout.
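The port-range, reuse, and FIN-timeout settings described above correspond to kernel tunables like these (a sketch of the standard knobs with illustrative values, not our exact sysctl file):

```
# /etc/sysctl.conf (illustrative values)
net.ipv4.ip_local_port_range = 1024 65000   # widen ephemeral port range
net.ipv4.tcp_tw_reuse = 1                   # reuse TIME_WAIT sockets for outgoing connections
net.ipv4.tcp_fin_timeout = 15               # shorten FIN-WAIT-2 timeout (default 60)
```

`tcp_tw_reuse` only affects outgoing connections, which is the relevant side here since HAProxy opens a new connection to a backend for each proxied request in this setup.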
We are simply trying to have a front-end proxy that can handle a lot of traffic and hand it off to the back-end servers for processing.
Our real Go code currently runs at about 10,000 qps per VM, so to achieve the 140,000 qps in the example above we would need 14 VMs. The goal is to be able to handle a lot more than 140,000 qps by simply adding more servers on the back end as the load increases.