Hi, we are using haproxy to load balance a fairly big service: approx. 150k active websocket connections at a time. Everything is routed through a single load balancer running haproxy to one of 3 backend nodes based on the request path (connections are sharded by user ID).
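For context, the routing side looks roughly like this (heavily simplified; the map file, backend names, addresses and timeouts are illustrative, not our real config):

    defaults
        mode http
        timeout connect 5s
        timeout client  30s
        timeout server  30s
        # websockets are long-lived tunnels, so this one is generous
        timeout tunnel  1h

    frontend ws_in
        bind :443
        # pick a shard from the path -> backend map (fallback: be_shard1)
        use_backend %[path,map_beg(/etc/haproxy/shards.map,be_shard1)]

    backend be_shard1
        server node1 10.0.0.1:8080 check
    backend be_shard2
        server node2 10.0.0.2:8080 check
    backend be_shard3
        server node3 10.0.0.3:8080 check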
We do a haproxy reload to update this sharding map. However, this has been inconsistent: sometimes it drops all WS connections, which then all reconnect at once, rather than gracefully starting a new process and leaving the old haproxy process running.
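For reference, the reload itself is the standard master-worker soft reload; stripped down, the relevant bit looks something like this (socket path and reload commands are illustrative):

    global
        master-worker
        # expose-fd listeners lets the new worker take over the listening
        # sockets without dropping them during the reload
        stats socket /run/haproxy.sock mode 600 level admin expose-fd listeners

    # the reload is triggered with either of:
    #   systemctl reload haproxy
    #   kill -USR2 <master pid>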
We tracked it down to the watchdog killing the old haproxy process because it took too long releasing memory during the reload. With the no-memory-trimming flag set, reloads work perfectly.
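Concretely, all we changed is this one line in the global section:

    global
        # skip the malloc_trim() calls haproxy makes on reload / memory
        # shortage; these were taking long enough to trip the watchdog
        no-memory-trimming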
However, our users keep their websockets open for a long time. It can take weeks or months for every single connection on the old process to disconnect so that the old process can be cleaned up. Forcibly disconnecting the websockets is an absolute last resort that we would like to avoid.
This isn’t fundamentally an issue; we’re OK with having many processes running. However, we’re surprised to see that the memory usage of those old processes doesn’t decrease over time as their connections drain: a process with just a few connections remaining still uses about the same amount of RAM as one with 100k websockets actively connected.
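For what it’s worth, this is roughly how we’re comparing old and new workers (the second command assumes the master CLI socket is enabled via haproxy’s -S option; the socket path and PID are placeholders):

    # resident memory and age of every haproxy process, oldest first
    ps -o pid,rss,etimes,args -C haproxy --sort=-etimes

    # pool usage inside a specific (old) worker, via the master CLI
    echo "@!<old-worker-pid> show pools" | socat stdio /run/haproxy-master.sock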
From the docs:

    no-memory-trimming
        Disables memory trimming (“malloc_trim”) at a few moments where
        attempts are made to reclaim lots of memory (on memory shortage
        or on reload).
This made me think that no-memory-trimming would only disable trimming at those specific moments, and that memory would still be slowly reclaimed over time when it’s no longer needed.
Do you have any suggestions for how we can get these old processes to release their held memory when it’s not needed any more?
Edit: We haven’t actually run out of memory yet. I could be entirely wrong, and maybe haproxy will reduce memory usage once we get closer to running out of free RAM. If that’s the case, great.