Is there a way to limit the size of a backend queue?

Hello!

I have a custom piece of code that does HTTP proxying for me. Its logic is as follows: every server processes no more than one request at a time, and all servers are equal. When all of the servers are busy, incoming requests are queued up, and the first server to finish its work receives the next request from the queue. When X requests are already queued, the proxy immediately responds with a 503 error. The idea behind limiting the queue size is to predict whether the current pod will be able to process an incoming request in a reasonable amount of time.
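
Roughly, the logic is equivalent to the following sketch (illustrative only; this is not the actual proxy, and the asyncio framing and all names are made up):

import asyncio

NUM_WORKERS = 5   # every server processes at most one request at a time
MAX_QUEUED = 5    # "X": reject new requests once this many are waiting

async def process(server_id: int, request: str) -> str:
    # Stand-in for the real work (2-3 ms up to 1 s per request).
    await asyncio.sleep(0.01)
    return f"200 handled by server {server_id}: {request}"

async def worker(server_id: int, queue: asyncio.Queue) -> None:
    # The first server to become free takes the next request from the shared queue.
    while True:
        request, reply = await queue.get()
        reply.set_result(await process(server_id, request))

async def dispatch(queue: asyncio.Queue, request: str) -> str:
    # One bounded queue in front of all servers: fail fast when it is full.
    reply = asyncio.get_running_loop().create_future()
    try:
        queue.put_nowait((request, reply))
    except asyncio.QueueFull:
        return "503 queue overflow"
    return await reply

async def main() -> None:
    queue: asyncio.Queue = asyncio.Queue(maxsize=MAX_QUEUED)
    workers = [asyncio.create_task(worker(i, queue)) for i in range(NUM_WORKERS)]
    results = await asyncio.gather(*(dispatch(queue, f"req-{i}") for i in range(20)))
    print(sum(r.startswith("503") for r in results), "of", len(results), "rejected")
    for w in workers:
        w.cancel()

asyncio.run(main())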

The task is to keep a single queue for incoming service requests, since that works much better than individual per-server queues. Request sizes range from 1 kB to 10 MB, and processing usually takes from 2-3 ms up to 1 s. The protocol is gRPC over HTTP/2 over cleartext TCP/IP.

I’m trying to replicate the custom code’s behavior with HAProxy, but I could not find an option to limit the maximum queue size. Is there a way to do that?

Right now I’m using the following config. It uses an intermediate backend to limit the maximum queue size: requests beyond 10 concurrent ones pile up in renderer_queue_backend, where the 1us queue timeout rejects them almost immediately, so the effective limit is 5 requests in progress plus 5 queued in renderer_workers. I am not happy with the additional hop I had to introduce, though.
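
The request path through this config is:

client → :86 renderer_frontend → renderer_queue_backend (queue_limit_single, maxconn 10, timeout queue 1us)
       → :9001 renderer_queue_frontend → renderer_workers (5 servers, maxconn 1 each, balance leastconn)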

global
    nbthread 5
    tune.bufsize 524288
    tune.h2.initial-window-size 524288

defaults
    mode http
    log     global
    log /tmp/unified-agent.sock len 65535 format raw daemon debug
    log-format '{"t":"%t","HM":"%HM","HU":"%{json(utf8s)}HU","HV":"%HV","ST":%ST,"B":%B,"H":"%{json(utf8s)}H","Ta":%Ta,"Tc":%Tc,"Td":%Td,"Th":%Th,"Ti":%Ti,"Tq":%Tq,"TR":%TR,"Tr":%Tr,"Ts":%Ts,"Tt":%Tt,"Tu":%Tu,"Tw":%Tw,"U":%U,"ac":%ac,"b":"%b","bc":%bc,"bi":"%bi","bp":"%bp","bq":%bq,"ci":"%ci","cp":%cp,"f":"%f","fc":%fc,"fi":"%fi","fp":%fp,"ft":"%ft","lc":%lc,"ms":"%ms","pid":%pid,"rc":%rc,"rt":%rt,"s":"%s","sc":%sc,"si":"%si","sp":"%sp","sq":%sq,"tr":"%tr","ts":"%ts"}'

    option redispatch
    timeout connect 3ms
    timeout http-request 10s
    timeout http-keep-alive 1h
    timeout tunnel 1h
    timeout queue 10s
    timeout client 10s
    timeout client-fin 10s
    timeout server 10s
    timeout server-fin 10s

    errorfile 503 fast-error.http # Respond with 200 OK since gRPC does not allow other HTTP status codes; see the example file after the config


frontend renderer_frontend
    bind :::86 proto h2
    default_backend renderer_queue_backend
    maxconn 10000

backend renderer_queue_backend
    http-reuse always
    retries 0
    timeout queue 1us # 0 means default, in which case the timeout connect value is used
    server queue_limit_single localhost:9001 proto h2 maxconn 10 # No more than 5 requests in progress and 5 queued requests

frontend renderer_queue_frontend
    bind :::9001 proto h2
    default_backend renderer_workers
    maxconn 10000

backend renderer_workers
    balance leastconn
    http-reuse always
    retries 5

    server worker_renderer_0 localhost:2002 proto h2 maxconn 1 check inter 1s fall 1 rise 1 observe layer4 error-limit 1
    server worker_renderer_1 localhost:2004 proto h2 maxconn 1 check inter 1s fall 1 rise 1 observe layer4 error-limit 1
    server worker_renderer_2 localhost:2006 proto h2 maxconn 1 check inter 1s fall 1 rise 1 observe layer4 error-limit 1
    server worker_renderer_3 localhost:2008 proto h2 maxconn 1 check inter 1s fall 1 rise 1 observe layer4 error-limit 1
    server worker_renderer_4 localhost:2010 proto h2 maxconn 1 check inter 1s fall 1 rise 1 observe layer4 error-limit 1
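
For reference, an errorfile must contain a complete raw HTTP/1 response (HAProxy converts it for the h2 frontend). A minimal example of what fast-error.http could look like, with grpc-status 14 (UNAVAILABLE) chosen purely for illustration:

HTTP/1.1 200 OK
content-type: application/grpc
grpc-status: 14
grpc-message: queue overflow
content-length: 0
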

I’m using HAProxy 3.0.4, and I can change it to any version I need.

$ haproxy -v
HAProxy version 3.0.4-1ppa1~jammy 2024/09/03 - https://haproxy.org/
Status: long-term supported branch - will stop receiving fixes around Q2 2029.
Known bugs: http://www.haproxy.org/bugs/bugs-3.0.4.html
Running on: Linux 5.15.160-9.2