GRPC Keepalive timeout

Hi,

We’ve recently found out that the timeout client and timeout server settings time out our grpc streaming application session even though it sends a grpc keepalive every 10 seconds and the timeouts on haproxy are set to 30 seconds. As we understand GRPC, the keepalive itself uses HTTP/2 PING, which, judging by the behavior, haproxy just consumes and doesn’t forward to the client / server.

Is this observation correct? Is it possible to configure haproxy to forward http/2 pings to the client / server and make them extend the timeout timers? If not, do you have any other recommendation on how to configure haproxy so it doesn’t time out our streaming session?

Edit: we are using haproxy 2.1.7, and we’ve confirmed with tcpdump that the application sends the pings. The application doesn’t time out as long as it sends some grpc data back and forth, only when it is silent and sending just grpc keepalives. Everything else works correctly; the only problem is that haproxy closes the connection after 30 seconds if the only data transferred over it during those 30 seconds was the grpc keepalives (HTTP/2 pings).
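
For readers following along, a setup matching this description might look roughly like the sketch below. Only the 30-second client and server timeouts come from the post; the section names, address, port, certificate path and connect timeout are hypothetical placeholders, not the poster’s actual configuration.

```haproxy
# Hypothetical reconstruction; only the two 30s timeouts are from the post.
defaults
    mode http
    timeout connect 5s      # assumed value
    timeout client  30s     # as reported
    timeout server  30s     # as reported

frontend grpc_in
    # placeholder port and certificate path
    bind :8443 ssl crt /etc/haproxy/certs/site.pem alpn h2
    default_backend grpc_servers

backend grpc_servers
    # placeholder address; cleartext HTTP/2 to the gRPC server
    server grpc1 192.0.2.10:50051 proto h2
```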

Thank you

In H2, sessions will just use timeout client (as opposed to timeout http-keep-alive, which we only use for 1.x).

HTTP/2 PING is not associated to any stream or transaction, it’s at connection level, so I don’t think we can forward those pings end-to-end at all.

Resetting timeouts may be possible but needs to be evaluated carefully, so that we are not opening ourselves up to a DoS.

@willy we probably need your input here:

  • could a H2 PING refresh connection timeouts, without opening ourselves up to a DoS?
  • does GRPC use timeout tunnel and if not, would that be a good idea?

Thanks

Hi Lukas,

> In H2, sessions will just use timeout client (as opposed to timeout http-keep-alive, which we only use for 1.x).

> HTTP/2 PING is not associated to any stream or transaction, it’s at
> connection level, so I don’t think we can forward those pings end-to-end at
> all.

Indeed, PING is purely hop-by-hop and must not be forwarded (to whom, anyway?).

> Resetting timeouts may be possible but needs to be evaluated carefully, so that
> we are not opening ourselves up to a DoS.
>
> @willy we probably need your input here:
>
>   • could a H2 PING refresh connection timeouts, without opening ourselves up to a DoS?

Actually I checked the code, and PING properly refreshes the connection’s
timeout.

>   • does GRPC use timeout tunnel and if not, would that be a good idea?

No, because there’s really no such thing as GRPC at the protocol level.
It’s just a regular H2 request that’s passed to a server and that the
server responds to.

However what I’m suspecting from the description is that the keep-alives
are sent while the server does not respond to a request. If so, even
the PINGs are useless here because what is timing out is not the connection
but the stream, which doesn’t get any response.

If my assumption is correct, then the best way to address it would instead
be to allow the response timeouts to be updated dynamically based on some URL
or U-A matching. We regularly talk about having a “set-timeout”
http-request action to do that, but given that there’s always another
solution it ends up never being implemented. A variant of this might be
to have an action indicating that a request is expected to be used for
long polling and should be subject to “timeout tunnel”. This would allow keeping
the timeouts in the defaults section.

With this said, I don’t even know if doing long polling on GRPC is
considered as being part of the best practices, but certainly doing
some PINGs on the connection will have no effect on the streams and
doesn’t even indicate that the server at the other end is still alive.

Willy

Thank you very much for your responses.

HAProxy actually replies to the H2 PING; it just times out the session after timeout client or timeout server, even if those h2 pings are sent regularly from the server (backend) or the client. This happened even when we disabled timeout server and kept only timeout client, and vice versa. The only time it didn’t time out was when both the server and client timeouts were disabled.

GRPC keepalives are a built-in mechanism in grpc, and we kind of expected that they would keep the session alive as long as they are sent.

We would like to keep the timeouts in place so that any sessions that do not send anything still time out; we were just looking for a way to keep the session alive as long as H2 PINGs are sent. But if I understood your answer correctly, the H2 PINGs are not session based and therefore cannot be used to keep the session alive.

We do have a backup plan to send our own custom GRPC dummy messages, which actually do keep the session alive, but we still wanted to check with the haproxy community whether there is some other way to utilize the built-in grpc keepalive mechanism.

Thank you once again for your responses and your time.

Just to be clear, there’s no such thing as a “session” in H2. There is a connection and there are streams. Each request-response is made in the context of a stream. These streams are independent entities on top of a connection. PING doesn’t reference any stream and is only connection-aware. What I’m pretty sure happens in your case is just that while PING maintains the connection alive, nothing maintains the streams alive if a server does not respond within the timeout.

In haproxy, the connection timeouts are only effective when there is no stream. So in your case, the PING will have no effect on the H2 connection: while a request is still active, its connection will not time out anyway. However, the H2 PINGs that GRPC uses are pretty effective against a timeout on an intermediary NAT router or firewall on the path. They can also be effective at keeping an idle connection established.

I’m pretty sure that in your case it would be more convenient to be able to identify such long requests and switch them to a long timeout.

By the way, if your client and server only use long polling, do you have any objection to explicitly setting long client and server timeouts?

Sorry for my loose use of the word session. I meant stream. The reason not to set long timeouts is that this is a publicly accessible service. We do not want to open ourselves up to a simple DoS attack where someone just opens thousands of connections without even having to understand what’s on the other side. This haproxy load balancer serves many other services and we would rather be safe than sorry.

OK, that makes sense. But as you have now figured out, the H2 PING is unrelated to the stream, so it will be useless here.

Also, and unless your long polling service is authenticated, you’re already opening it to a trivial DoS if anyone can start a long polling request there.

Actually one solution to your problem is to set a pair of long client+server timeouts and only use short http-keep-alive and http-request timeouts. These will make sure that you don’t stay open for too long before someone sends a complete request.
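
A minimal sketch of what that could look like, assuming a plain defaults section. The directive names are standard haproxy timeouts, but every value below is an illustrative assumption to be tuned for the actual service, not a recommendation from this thread.

```haproxy
# Sketch only: all values are assumptions.
defaults
    mode http
    timeout connect         5s

    # Short: bounds the wait for a complete request and for the next
    # request on an idle keep-alive connection.
    timeout http-request    10s
    timeout http-keep-alive 10s

    # Long: lets an established request/response stream stay silent
    # without being cut by the client or server inactivity timers.
    timeout client          1h
    timeout server          1h
```

The short pair limits how long a connection can sit there without a complete request, while the long pair only comes into play once a request is actually in progress.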

Thank you very much for your help and your time. We now understand it much better. I’m going to mark the answer explaining that H2 PING doesn’t reference any stream as the solution, as that is the primary reason why it cannot extend the timers of a stream.