Hi!
I have two gRPC servers and one fat gRPC client. I wrote my own client load balancer, that just simply use random server to perform request. It works perfect, but it has few disadvantages:
- no health checks
- no server weights (one server is more powerful than another)
I decided two put HAProxy in front of them, so the HAProxy will solve disadvantages of previous scheme and give me possibility to scale easier.
When i finish my HAProxy setup it turns out that performance through HAProxy reduced, and i get a lot of timeout errors (grpc context deadline from my golang client). Now when i connect directly to the slowest node it is more faster in rps (and there is no errors) than when i connect through HAProxy with two servers in backend.
Both client, HAProxy and two upstream servers located in the same data center with ping less than 0.5ms.
I tried to apply few Linux Kernel configs, but mostly they has no impact on performance.
I also tried to install 5.1 linux kernel, and it seems it is now little bit faster, but it still slower than direct connect to the slowest node.
I tried to use proto h2
directive to connect directly without tls, but performance is still poor.
Can somebody explain me how to figure out where is the problem and how to fix it?
Thanks in advance!
Configuration of proxy server machine:
Intel(R) Xeon(R) CPU E3-1246 v3 @ 3.50GHz
32GB RAM
HAProxy version:
haproxy -v
HA-Proxy version 1.9.8-1ppa2~bionic 2019/06/13 - https://haproxy.org/
This is my config file:
global
log stdout local0
maxconn 50000
ssl-default-bind-ciphers ECDH+AESGCM:DH+AESGCM:ECDH+AES256:DH+AES256:ECDH+AES128:DH+AES:RSA+AESGCM:RSA+AES:!aNULL:!MD5:!DSS
ssl-default-bind-options ssl-min-ver TLSv1.1
tune.ssl.default-dh-param 2048
defaults
log global
mode http
timeout connect 5s
timeout client 30s
timeout server 30s
option httplog
option logasap
option http-use-htx
frontend grpc-proxy
bind :9000 ssl crt /etc/ssl/private/grpc-balancer.pem alpn h2
default_backend grpc-feeds
backend grpc-feeds
balance random
server grpc-feeds-01 1.2.3.4:9000 ssl verify none alpn h2 check
server grpc-feeds-02 5.6.7.8:9000 ssl verify none alpn h2 check
Some benchmarks:
Direct:
Node 1: 2300rps
Node 2: 700rps
Through HAProxy: ~500rps
UPD: playing with config i found out that adding
tune.h2.max-concurrent-streams 8096
nbthread 4
cpu-map 1- 0-
increase performance, now it’s about 2000-2500rps, but i still get an stream terminated by RST_STREAM with error code: REFUSED_STREAM (rpc error: code = Unavailable desc = stream terminated by RST_STREAM with error code: REFUSED_STREAM)
, and i dont know how to fix this. Without HAProxy, with direct access to the node under the high load there is no such errors, it looks like HAProxy close connection in some cases.