HAProxy performance issue when proxying to gRPC

Hi!

I have two gRPC servers and one fat gRPC client. I wrote my own client-side load balancer that simply picks a random server for each request. It works perfectly, but it has a few disadvantages:

  • no health checks
  • no server weights (one server is more powerful than another)
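
A minimal sketch of how the missing weights and health state could be added to a random client-side picker. This is not the poster's code (the real client is written in Go and is not shown in the thread); the `Server` type, `pick_server`, and the weights 3 and 1 are hypothetical, chosen only to illustrate the idea in Python.

```python
import random
from dataclasses import dataclass


@dataclass
class Server:
    addr: str
    weight: int            # relative capacity; the values used below are made up
    healthy: bool = True   # would be updated by a health checker (not shown)


def pick_server(servers, rng=random):
    """Pick a healthy server at random, proportionally to its weight."""
    candidates = [s for s in servers if s.healthy and s.weight > 0]
    if not candidates:
        raise RuntimeError("no healthy servers available")
    weights = [s.weight for s in candidates]
    # choices() draws with replacement using the given relative weights.
    return rng.choices(candidates, weights=weights, k=1)[0]


if __name__ == "__main__":
    pool = [Server("1.2.3.4:9000", weight=3), Server("5.6.7.8:9000", weight=1)]
    print(pick_server(pool).addr)
```

A real balancer would still need something that flips `healthy`, which is the part HAProxy's `check` keyword provides out of the box.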

I decided to put HAProxy in front of them, so that HAProxy would solve the disadvantages of the previous scheme and make it easier to scale.

When I finished my HAProxy setup, it turned out that performance through HAProxy dropped, and I get a lot of timeout errors (gRPC context deadline errors from my Go client). Right now, connecting directly to the slowest node gives more rps (and no errors) than connecting through HAProxy with two servers in the backend.

The client, HAProxy, and the two upstream servers are all located in the same data center, with ping under 0.5 ms.

I tried applying a few Linux kernel settings, but mostly they had no impact on performance.

I also tried installing the 5.1 Linux kernel, and it now seems a little faster, but it is still slower than connecting directly to the slowest node.

I tried using the proto h2 directive to connect without TLS, but performance is still poor.

Can somebody explain how to figure out where the problem is and how to fix it?

Thanks in advance!

Configuration of proxy server machine:

Intel(R) Xeon(R) CPU E3-1246 v3 @ 3.50GHz
32GB RAM

HAProxy version:

 haproxy -v
 HA-Proxy version 1.9.8-1ppa2~bionic 2019/06/13 - https://haproxy.org/

This is my config file:

global
        log stdout local0
        maxconn 50000
        ssl-default-bind-ciphers ECDH+AESGCM:DH+AESGCM:ECDH+AES256:DH+AES256:ECDH+AES128:DH+AES:RSA+AESGCM:RSA+AES:!aNULL:!MD5:!DSS
        ssl-default-bind-options ssl-min-ver TLSv1.1
        tune.ssl.default-dh-param 2048

defaults
        log global
        mode http
        timeout connect 5s
        timeout client 30s
        timeout server 30s
        option httplog
        option logasap
        option http-use-htx

frontend grpc-proxy
        bind :9000  ssl crt /etc/ssl/private/grpc-balancer.pem alpn h2
        default_backend grpc-feeds

backend grpc-feeds
        balance random

        server grpc-feeds-01    1.2.3.4:9000    ssl verify none alpn h2 check
        server grpc-feeds-02    5.6.7.8:9000    ssl verify none alpn h2 check

Some benchmarks:

Direct:
Node 1: 2300rps
Node 2: 700rps

Through HAProxy: ~500rps

UPD: while playing with the config, I found out that adding

tune.h2.max-concurrent-streams 8096
nbthread 4
cpu-map 1- 0-

increases performance: it is now about 2000-2500 rps, but I still get stream terminated by RST_STREAM with error code: REFUSED_STREAM (rpc error: code = Unavailable desc = stream terminated by RST_STREAM with error code: REFUSED_STREAM), and I don't know how to fix this. Without HAProxy, with direct access to the node under high load, there are no such errors. It looks like HAProxy closes the connection in some cases.

Please provide the HAProxy logs. They likely contain error codes, so we can see which particular limit or timeout we are hitting.

We have the same issue, trying to deploy HAProxy in Kubernetes without SSL:

global
  log stdout local0
  maxconn 50000
  debug

defaults
  log global
  maxconn 3000
  mode http
  timeout connect 10s
  timeout client 30s
  timeout server 30s
  option httplog
  option logasap
  option http-use-htx

frontend fe_proxy
  bind :50051 alpn h2
  default_backend be_servers

backend be_servers
  balance roundrobin
  server server1 greetingserver.default.svc.cluster.local:50051 check maxconn 20 alpn h2



<134>Jun 23 11:05:38 haproxy[5]: 127.0.0.1:53594 [23/Jun/2019:11:05:38.544] fe_proxy be_servers/server1 0/0/0/-1/+2 -1 +0 - - SD-- 1/1/0/0/0 0/0 "POST /grpc.reflection.v1alpha.ServerReflection/ServerReflectionInfo HTTP/2.0"
00000002:fe_proxy.accept(0006)=000d from [127.0.0.1:53594] ALPN=<none>
00000002:fe_proxy.clireq[000d:ffffffff]: POST /grpc.reflection.v1alpha.ServerReflection/ServerReflectionInfo HTTP/2.0
00000002:fe_proxy.clihdr[000d:ffffffff]: content-type: application/grpc
00000002:fe_proxy.clihdr[000d:ffffffff]: user-agent: grpc-go/1.21.0
00000002:fe_proxy.clihdr[000d:ffffffff]: te: trailers
00000002:fe_proxy.clihdr[000d:ffffffff]: host: localhost:50051
00000002:be_servers.srvcls[000d:000e]
00000002:be_servers.clicls[000d:000e]
00000002:be_servers.closed[000d:000e]
<134>Jun 23 11:05:38 haproxy[5]: 127.0.0.1:53594 [23/Jun/2019:11:05:38.553] fe_proxy be_servers/server1 0/0/1/-1/+2 -1 +0 - - SD-- 1/1/0/0/0 0/0 "POST /grpc.reflection.v1alpha.ServerReflection/ServerReflectionInfo HTTP/2.0"

This is from just trying a describe: grpcurl --plaintext localhost:50051 describe
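
For reference, the two httplog lines in the debug output above can be split into fields with a short script. This is only a sketch: the field order follows HAProxy's default `option httplog` layout as I understand it from the documentation, and the `parse_httplog` helper and the abbreviated description tables are my own, not part of HAProxy.

```python
import re

# Assumed layout (default "option httplog" format):
# client [date] frontend backend/server TR/Tw/Tc/Tr/Ta status bytes
# req_cookie resp_cookie term_state connection_counters queues "request"
HTTPLOG_RE = re.compile(
    r'(?P<client>\S+) \[(?P<date>[^\]]+)\] (?P<frontend>\S+) '
    r'(?P<backend>[^/\s]+)/(?P<server>\S+) '
    r'(?P<timers>[-+\d]+(?:/[-+\d]+){4}) '
    r'(?P<status>[-+]?\d+) (?P<bytes>[-+]?\d+) '
    r'(?P<req_cookie>\S+) (?P<resp_cookie>\S+) (?P<term_state>\S{4}) '
    r'(?P<conns>\S+) (?P<queues>\S+) "(?P<request>[^"]*)"'
)

TIMER_NAMES = ("TR", "Tw", "Tc", "Tr", "Ta")

# Abbreviated: only some of the documented termination codes are listed.
FIRST_CHAR = {"-": "normal", "C": "aborted by client",
              "S": "aborted or refused by server",
              "c": "client-side timeout", "s": "server-side timeout"}
SECOND_CHAR = {"-": "normal", "R": "waiting for request",
               "C": "connecting to server",
               "H": "waiting for response headers", "D": "data phase"}


def parse_httplog(line):
    """Return a dict of fields from one httplog line, or None if no match."""
    m = HTTPLOG_RE.search(line)
    if m is None:
        return None
    fields = m.groupdict()
    # int() accepts the leading "+" that "option logasap" adds to some values.
    fields["timers"] = dict(zip(TIMER_NAMES, map(int, fields["timers"].split("/"))))
    fields["status"] = int(fields["status"])
    state = fields["term_state"]
    fields["term_meaning"] = (FIRST_CHAR.get(state[0], "?"),
                              SECOND_CHAR.get(state[1], "?"))
    return fields


line = ('<134>Jun 23 11:05:38 haproxy[5]: 127.0.0.1:53594 [23/Jun/2019:11:05:38.544] '
        'fe_proxy be_servers/server1 0/0/0/-1/+2 -1 +0 - - SD-- 1/1/0/0/0 0/0 '
        '"POST /grpc.reflection.v1alpha.ServerReflection/ServerReflectionInfo HTTP/2.0"')
info = parse_httplog(line)
print(info["server"], info["timers"], info["status"], info["term_meaning"])
```

For the first logged line this yields termination state `SD--`, status -1 and a `Tr` timer of -1 (HAProxy reports -1 for a timer whose event never happened), which would be consistent with the server side ending the session before a complete response was seen.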

kubectl port-forward svc/haproxy 50051:50051

---
apiVersion: v1
kind: Service
metadata:
  name: haproxy
spec:
  type: ClusterIP
  clusterIP: None
  selector:
    app: haproxy
  ports:
  - protocol: TCP
    port: 50051
    targetPort: 50051

Really, you have a performance issue just like the one danforth has? Because it seems to me you have a configuration problem and it does not work at all.

If you have a performance issue just like the one in this thread, please open a new thread with the data requested in post 2.

If you have a different problem (it doesn’t work at all), then please open a new thread too.

Thanks for your reply.

Here are the logs: https://pastebin.com/sgeDLaTy

As I said before, I adjusted tune.h2.max-concurrent-streams and it gave a huge performance improvement compared to how it was before. But even so, performance is still worse than when I connect directly to the two nodes and randomly choose which node to query.

This is weird, because:

  • HAProxy uses weights, so the slower node gets fewer requests than the more powerful node
  • with a randomly chosen node, load is distributed evenly, which means the slowest node should start producing errors earlier
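
The reasoning in these two points can be checked with a rough back-of-the-envelope model. The sketch below assumes that each node has a fixed capacity equal to its direct benchmark figure from the first post (2300 and 700 rps) and that traffic is split strictly by weight; it ignores proxy overhead, TLS, and HTTP/2 stream limits. The function name `max_total_rps` is made up for this example, and the weights 10 and 6 are the ones from the latest config posted further down the thread.

```python
def max_total_rps(capacities, weights):
    """Highest total request rate before any node exceeds its capacity.

    Node i receives the fraction weights[i] / sum(weights) of all traffic,
    so it saturates once total * share_i reaches capacities[i].
    """
    total_weight = sum(weights)
    return min(cap * total_weight / w for cap, w in zip(capacities, weights))


# Assumption: direct benchmark figures are treated as hard per-node limits (rps).
capacities = [2300, 700]

print(max_total_rps(capacities, [1, 1]))        # even random split
print(max_total_rps(capacities, [10, 6]))       # weights from the posted config
print(max_total_rps(capacities, capacities))    # weights proportional to capacity
```

Under this simplified model an even split saturates the slow node at about 1400 rps in total, while weights of 10 and 6 push that to roughly 1867 rps, so weighting alone should not make things worse than the random picker.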

Also, there is not a single error when I connect directly. It all looks like HAProxy drastically reduces performance and reliability, and produces more errors as a result.

A few questions:
How are you benchmarking this exactly?
How are you generating the load?
Can you log without logasap?
Please provide the output of haproxy -vv

What’s missing in your configuration is a maxconn configuration in your frontend, but I don’t think you are hitting that limit (at least, not according to those logs).

You can set tune.h2.max-concurrent-streams to 0 if you don’t want any limitation in the streams at all. That’s probably not a good idea though.

I have an application that sends requests to the gRPC server. I run this app against either the HAProxy IP or the direct IPs of the nodes.

I run the application described above from a node in the same data center.

Logs captured during the benchmark:
https://pastebin.com/3Kp0PUWq

HA-Proxy version 1.9.8-1ppa2~bionic 2019/06/13 - https://haproxy.org/
Build options :
  TARGET  = linux2628
  CPU     = generic
  CC      = gcc
  CFLAGS  = -O2 -g -O2 -fdebug-prefix-map=/build/haproxy-jeU9I_/haproxy-1.9.8=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fno-strict-aliasing -Wdeclaration-after-statement -fwrapv -Wno-format-truncation -Wno-unused-label -Wno-sign-compare -Wno-unused-parameter -Wno-old-style-declaration -Wno-ignored-qualifiers -Wno-clobbered -Wno-missing-field-initializers -Wno-implicit-fallthrough -Wno-stringop-overflow -Wtype-limits -Wshift-negative-value -Wshift-overflow=2 -Wduplicated-cond -Wnull-dereference
  OPTIONS = USE_GETADDRINFO=1 USE_ZLIB=1 USE_REGPARM=1 USE_OPENSSL=1 USE_LUA=1 USE_SYSTEMD=1 USE_PCRE2=1 USE_PCRE2_JIT=1 USE_NS=1

Default settings :
  maxconn = 2000, bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Built with OpenSSL version : OpenSSL 1.1.1  11 Sep 2018
Running on OpenSSL version : OpenSSL 1.1.1  11 Sep 2018
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports : TLSv1.0 TLSv1.1 TLSv1.2 TLSv1.3
Built with Lua version : Lua 5.3.3
Built with network namespace support.
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND
Built with zlib version : 1.2.11
Running on zlib version : 1.2.11
Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with PCRE2 version : 10.31 2018-02-12
PCRE2 library supports JIT : yes
Encrypted password support via crypt(3): yes
Built with multi-threading support.

Available polling systems :
  epoll : pref=300,  test result OK
   poll : pref=200,  test result OK
 select : pref=150,  test result OK
Total: 3 (3 usable), will use epoll.

Available multiplexer protocols :
(protocols marked as <default> cannot be specified using 'proto' keyword)
          h2 : mode=HTX        side=FE|BE
          h2 : mode=HTTP       side=FE
   <default> : mode=HTX        side=FE|BE
   <default> : mode=TCP|HTTP   side=FE|BE

Available filters :
	[SPOE] spoe
	[COMP] compression
	[CACHE] cache
	[TRACE] trace

I am not too familiar with HAProxy, but there is already such a directive (maxconn) in the global section.

Just to clarify, the latest config is:

global
        log stdout local0
        maxconn 100000
        ssl-default-bind-ciphers ECDH+AESGCM:DH+AESGCM:ECDH+AES256:DH+AES256:ECDH+AES128:DH+AES:RSA+AESGCM:RSA+AES:!aNULL:!MD5:!DSS
        ssl-default-bind-options ssl-min-ver TLSv1.1
        tune.ssl.default-dh-param 2048
        tune.h2.max-concurrent-streams 8096 # 65535 # 8096
        #nbproc 4
        nbthread 4
        cpu-map 1- 0-

defaults
        log global
        mode http
        timeout connect 5s
        timeout client 30s
        timeout server 30s
        option httplog
        # option logasap
        option http-use-htx
        http-reuse always

frontend grpc-proxy
        bind :9000  ssl crt /etc/ssl/private/grpc-balancer.pem alpn h2
        default_backend grpc-feeds

backend grpc-feeds
        balance roundrobin

        server grpc-feeds-01    1.2.3.4:9000    ssl verify none alpn h2 check weight 10
        server grpc-feeds-02    5.6.7.8:9000    ssl verify none alpn h2 check weight 6

I also think that disabling the limit is not a good idea.

For example, while writing this post I ran one more benchmark to double-check:

Benchmark directly to single node:

./grpc-client -addr ip-of-powerful-node:9000
^C2019/06/26 10:54:21 524756 requests, 524744 responses, 524744 filled, 0 empty
2019/06/26 10:54:21 1753.96 rps (4m59.176372228s)

As you can see, there is not a single error.

Benchmark through HAProxy:
There are a lot of errors, and the result is:

2019/06/26 11:00:32 unknown error: 14 - stream terminated by RST_STREAM with error code: REFUSED_STREAM (rpc error: code = Unavailable desc = stream terminated by RST_STREAM with error code: REFUSED_STREAM)
2019/06/26 11:00:32 unknown error: 14 - stream terminated by RST_STREAM with error code: REFUSED_STREAM (rpc error: code = Unavailable desc = stream terminated by RST_STREAM with error code: REFUSED_STREAM)
2019/06/26 11:00:32 unknown error: 13 - server closed the stream without sending trailers (rpc error: code = Internal desc = server closed the stream without sending trailers)
2019/06/26 11:00:32 unknown error: 14 - stream terminated by RST_STREAM with error code: REFUSED_STREAM (rpc error: code = Unavailable desc = stream terminated by RST_STREAM with error code: REFUSED_STREAM)
2019/06/26 11:00:32 unknown error: 14 - stream terminated by RST_STREAM with error code: REFUSED_STREAM (rpc error: code = Unavailable desc = stream terminated by RST_STREAM with error code: REFUSED_STREAM)
2019/06/26 11:00:32 unknown error: 14 - stream terminated by RST_STREAM with error code: REFUSED_STREAM (rpc error: code = Unavailable desc = stream terminated by RST_STREAM with error code: REFUSED_STREAM)
^C2019/06/26 11:00:36 486548 requests, 426816 responses, 426816 filled, 0 empty
2019/06/26 11:00:36 1538.42 rps (4m37.43799153s)
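
To put the two runs side by side, the summary lines can be reduced to a throughput and a failure fraction. This is a small sketch; it assumes (based only on the arithmetic working out) that the client's "rps" figure is responses divided by elapsed time, and the `summarize` helper is made up for this example.

```python
def summarize(requests, responses, seconds):
    """Return (responses per second, fraction of requests without a response)."""
    rps = responses / seconds
    failed = (requests - responses) / requests
    return rps, failed


# Figures copied from the two benchmark summaries above.
direct = summarize(524756, 524744, 4 * 60 + 59.176372228)
proxied = summarize(486548, 426816, 4 * 60 + 37.43799153)

for name, (rps, failed) in (("direct", direct), ("haproxy", proxied)):
    print(f"{name}: {rps:.2f} rps, {failed:.2%} of requests without a response")
```

By this measure roughly 12% of the requests sent through HAProxy never got a response, against a handful out of more than half a million in the direct run.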

Also please check this image: http://i.imgur.com/SRCzJ8Y.png

There are two rows of charts: the first row is the first node and the second row is the second node. As you can see, when I benchmark the first node only, the load is almost always stable. When I benchmark through HAProxy, the chart is a sawtooth. I think this is because HAProxy disables a node when it thinks the node is not alive, or maybe because all workers on the client side are waiting for a response.

Does 1.9.8 support HTTP/2 streaming? Maybe the problem is that HAProxy is buffering all responses from my gRPC servers?

UPD: I just updated to HAProxy v2.0.0; with the same config there is no difference in performance.

What I mean is, how does it behave, what does it do exactly?

Does it open an unlimited number of streams in an H2 connection? Does it open multiple H2 connections? How does it behave when a stream is refused in an H2 connection?

I know. You still need it in the frontend section then. Or put it in the default section.

Global maxconn is one thing. Frontend maxconn is another.

Still, since that is clearly a limiting factor and increasing it increases the performance, can you please try?

If your backends don’t limit the streams per connection and HAProxy does, it’s obvious that the performance will be worse. That’s why you have to actually try removing the limit.

Payload is always streamed.

Can you graph the number of H2 connections with and without haproxy?

I am not sure, because the client (and the server too) is code generated by the gRPC framework. As far as I know, the server is responsible for how many concurrent streams can be open, and by default it’s unlimited (not sure).

I will try to describe my application:
I am currently using a unidirectional stream.

  1. The application sends a request.
  2. The server receives the request, performs some actions, and starts streaming data in the response.
  3. It streams until the deadline is exceeded (1s500ms).
  4. If the server responds in, for example, 500ms, it closes the stream, so the client is notified that the stream is finished.
  5. If the server is too slow and cannot stream all the data within the timeout (1s500ms), the client closes the connection (this should be propagated to the server side, so the server stops working once it is notified that the client is no longer waiting for a response).
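
The flow in the steps above can be mimicked without any gRPC library. The following is a toy simulation only: it uses a simulated clock instead of real time, a Python generator stands in for the server stream, and `server_stream`, `client_call`, and the `events` list are hypothetical names, not part of the real application.

```python
DEADLINE = 1.5  # seconds, the 1s500ms deadline from the description


def server_stream(chunks, seconds_per_chunk, events):
    """Stand-in for the server: yields (elapsed_seconds, chunk) pairs."""
    elapsed = 0.0
    try:
        for chunk in chunks:
            elapsed += seconds_per_chunk  # simulated work, no real sleeping
            yield elapsed, chunk
        events.append("server finished")   # step 4: stream closed by the server
    except GeneratorExit:
        events.append("server cancelled")  # step 5: cancellation reached the server
        raise


def client_call(chunks, seconds_per_chunk, events, deadline=DEADLINE):
    """Consume the stream until it ends or the deadline has passed."""
    received = []
    stream = server_stream(chunks, seconds_per_chunk, events)
    for elapsed, chunk in stream:
        if elapsed > deadline:
            stream.close()  # stands in for the client cancelling the stream
            return received, "deadline exceeded"
        received.append(chunk)
    return received, "ok"


fast_events, slow_events = [], []
print(client_call(range(5), 0.1, fast_events), fast_events)
print(client_call(range(5), 0.5, slow_events), slow_events)
```

With a proxy in the middle, the same cancellation has to travel across two HTTP/2 connections instead of one before the server can stop working on the request.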

Connection handling is performed by the gRPC framework at a low level.

I put the additional directive in the defaults section.

I changed tune.h2.max-concurrent-streams to 0, but it looks like that is not the same as unlimited, because the application cannot perform even a single request and fails with error: 14 - the connection is draining. Currently I have set it to 100000, but it makes no difference to performance. The previous value was 8192.

I am not sure how to do it, but if I find a way to measure H2 connections (and the concurrent streams inside them), I will post it here.

So 100000 performs just the same as 8192, is that what you are saying?

And maxconn configuration in the default section also did not improve anything?

Yes, exactly. The last config: global log stdout local0 maxconn 100000 ssl-default-b - Pastebin.com
I still get errors and about 1500 rps.

Willy responded in the bug:

Related issue, which was confirmed as a bug: https://github.com/haproxy/haproxy/issues/172