Using HAProxy to prevent a Flask app from crashing due to memory issues

I have a Python/Flask app that listens to requests from a Grafana dashboard and processes/parses data from a backend in response.

If a user ‘spams’ requests to the app via Grafana, too many parallel threads can currently be spun up, crashing the app due to memory issues (in particular on memory-restricted instances such as AWS EC2).

A simplistic solution would be to limit the number of threads or sessions outright, e.g. via waitress. However, some of the requests are ‘light’ and purely fetch metadata used in generating other requests - these we would prefer the app to process quickly and in parallel.

It has been proposed that we use HAProxy to “restrict” the threads/sessions sent to the app so that it does not run into out-of-memory events. I was hoping somebody here might have a suggestion for how to set this up in our project - or would be interested in helping with this as a small freelance project.

There are many ways to limit the number of concurrent sessions (or requests per second), ranging from simple solutions to more complex ones.

As you’ve mentioned, the simplest solution would be to limit the number of sessions for the given backend, regardless of the request. However, this would starve the lightweight requests.

Therefore, an evolution of that solution, while still keeping things simple, would be to define a second backend that points at the same Flask target. The first backend would be for the “lightweight” requests, with a larger limit, while the second backend would be for the “heavy” jobs. You can then choose between the two in the frontend based on ACLs.
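As a rough sketch of what that could look like - assuming the Flask app listens on 127.0.0.1:5000 and that heavy requests can be recognised by a path prefix such as /query (both are assumptions you would need to adapt to your setup):

```
frontend fe_grafana
    bind *:8080
    # Assumption: heavy requests are identifiable by their path; adjust the ACL to your API.
    acl is_heavy path_beg /query
    use_backend be_heavy if is_heavy
    default_backend be_light

backend be_light
    # Lightweight metadata requests: allow plenty of concurrency.
    server flask_light 127.0.0.1:5000 maxconn 50

backend be_heavy
    # Heavy processing requests: only a few reach the app at once, the rest queue in HAProxy.
    server flask_heavy 127.0.0.1:5000 maxconn 4
```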

There are other solutions as well, such as using counters to impose limits per IP, per session, etc., but these are not as straightforward.
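For completeness, here is a minimal sketch of such a counter-based limit using a stick table that tracks the request rate per source IP - the window and threshold are purely illustrative and would need tuning:

```
frontend fe_grafana
    bind *:8080
    # Track each client IP's request rate over a 10 second window.
    stick-table type ip size 100k expire 30s store http_req_rate(10s)
    http-request track-sc0 src
    # Reject clients exceeding 20 requests per 10 seconds (illustrative threshold).
    http-request deny deny_status 429 if { sc_http_req_rate(0) gt 20 }
    default_backend be_app
```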

maxconn on the backend server line in the HAProxy configuration is the proper way to configure limits and avoid too much concurrency in the backend application.
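A minimal single-backend sketch of that, assuming the Flask app is on 127.0.0.1:5000 and that around 8 concurrent requests is what the instance can handle (both values are guesses to tune for your setup):

```
backend be_app
    # Requests beyond maxconn wait in HAProxy's queue instead of piling up as threads in the app.
    timeout queue 30s
    server flask_app 127.0.0.1:5000 maxconn 8
```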

I agree that separate backend sections could be used in HAProxy to differentiate heavy and light requests.