HAProxy performance with gRPC responses

I am measuring gRPC performance by sending around 640 MB of data from a gRPC server. When my gRPC client queries the server directly, it receives the 640 MB within 700 ms. But when I pass the request through HAProxy, it takes 1400 ms.

From the HAProxy logs it seems that the request arriving at HAProxy carries a grpc-accept-encoding: gzip header, but I have not set any encoding in my code.

If I send the request directly from the gRPC client to the gRPC server, this encoding header is not present in the HTTP headers.

HAProxy logs:

00000005:f_h2c.clireq[001e:ffffffff]: POST http://com.mycompany/com.mycompany.myproto.EchoService/echo HTTP/2.0
00000005:f_h2c.clihdr[001e:ffffffff]: content-type: application/grpc
00000005:f_h2c.clihdr[001e:ffffffff]: te: trailers
00000005:f_h2c.clihdr[001e:ffffffff]: user-agent: grpc-java-netty/1.19.0
00000005:f_h2c.clihdr[001e:ffffffff]: grpc-accept-encoding: gzip
00000005:f_h2c.clihdr[001e:ffffffff]: grpc-timeout: 4706532u
00000005:f_h2c.clihdr[001e:ffffffff]: host: com.mycompany
00000005:local_node.srvrep[001e:001f]: HTTP/2.0 200
00000005:local_node.srvhdr[001e:001f]: content-type: application/grpc
00000005:local_node.srvhdr[001e:001f]: grpc-encoding: identity
00000005:local_node.srvhdr[001e:001f]: grpc-accept-encoding: gzip
00000005:local_node.srvcls[001e:001f]

Is there any way to disable this gzip for gRPC on the HAProxy side?

Here is the config I am using:

defaults
    log global
    timeout connect 10s
    timeout client 30s
    timeout server 30s
    option logasap
    option http-use-htx

frontend f_h2c
    mode http
    option http-use-htx
    bind *:9211 proto h2
    default_backend local_node

backend local_node
    mode http
    option forwardfor
    default-server inter 1s fall 1
    server server1 10.204.11.33:8086 check proto h2
    server server2 10.204.11.33:8085 check backup proto h2

I don’t see you doing any application-layer stuff, so why not simply remove proto h2 and mode http everywhere and use mode tcp instead?

That way, HAProxy will definitely not intervene in the application layer (it just forwards the TCP payload).
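A minimal sketch of what that would look like, reusing the addresses, ports and timeouts from your posted config (untested, adjust as needed):

defaults
    log global
    timeout connect 10s
    timeout client 30s
    timeout server 30s

frontend f_tcp
    mode tcp
    # no "proto h2" needed: in tcp mode HAProxy forwards the raw bytes,
    # so HTTP/2 passes through untouched
    bind *:9211
    default_backend local_node

backend local_node
    mode tcp
    default-server inter 1s fall 1
    # plain TCP connect checks; "proto h2" and "option forwardfor"
    # only apply in http mode, so they are dropped here
    server server1 10.204.11.33:8086 check
    server server2 10.204.11.33:8085 check backup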

I don’t think HAProxy adds this header, because it does not know it. Also, from the HAProxy logs you provided, it appears that your client is sending this header.
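If you really wanted HAProxy to strip it regardless, something like this in your frontend should do it, though it only removes the header and won’t change your timings:

    http-request del-header grpc-accept-encoding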

That does not sound like a HAProxy issue. What’s your design here? A single HAProxy box with one 10 Gbit/s NIC?

Thanks for your reply.

On your first point: since I am using gRPC, I set proto h2 to enable HTTP/2. Is it OK if I simply go with mode tcp?

On your second point: if the same client sends the request directly to the server, I don’t see this header, so I’m not sure why the client adds it when going through HAProxy. I need to check with Wireshark to confirm.

On your third point: I am testing on the same local machine, so the network is not in the picture yet. I just wanted to see whether HAProxy adds any lag to the throughput.

I’m not sure I understand the second part: are you asking whether it is ok to use tcp mode or are you saying you don’t have this performance problem when using tcp mode?

OK, so this is through loopback, where no hard performance limit exists; however, that doesn’t mean transferring data to and from loopback comes at zero cost.

And when you test client -> server vs client -> proxy -> server your resource consumption at least doubles.

I assume you are CPU bound here.

TCP mode is not helping me.

On the last point: I understand there will be some cost since responses go through HAProxy. But my point is that the difference is very large.

Do you suggest any optimization on the HAProxy side?

No; what you are benchmarking confirms that HAProxy is behaving as expected.

X amount of work (transmitting and receiving 640 MB) = Y amount of time (700 ms)
X * 2 amount of work (transmitting and receiving 640 MB + transmitting and receiving 640 MB) = Y * 2 amount of time (1400 ms)

TCP mode behaving the same confirms there is nothing wrong.

If you want to hide the cost of copying in userspace, use mode tcp and add option splice-auto to the defaults section.
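A sketch of that change (splicing requires Linux kernel support, and whether it actually kicks in depends on the traffic pattern):

defaults
    log global
    mode tcp
    # let HAProxy use kernel TCP splicing in both directions where possible,
    # so payload is forwarded without being copied through userspace buffers
    option splice-auto
    timeout connect 10s
    timeout client 30s
    timeout server 30s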

Not that I believe you would actually need it, because your benchmark is completely artificial and probably has nothing to do with the real production use case.

I have changed the tune.bufsize value, and with that I now see it taking around 900 to 1000 ms on average.
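For reference, that kind of tweak lives in the global section; the value below is just illustrative, not the exact one I tested:

global
    # default is 16384 bytes; larger buffers mean fewer, bigger reads/writes
    tune.bufsize 65536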

Do you see any other TCP parameter I can tune to get better performance?

Please don’t do this.

I explained above why you see what you are seeing, and also how you can remedy it (even though it does not make sense to). By tweaking values such as tune.bufsize you are changing other things, which will have other impacts.