With reference to this post [1], and based on my own experience of exactly the same thing in the past, I'm wondering if you have any suggestions or best practices for dealing with it.
The problem, in summary, is that in a dynamic environment where many deploys happen every hour across a bunch of services in a microservices platform, the HAProxy configuration needs to be reloaded quite often. Most of the services are REST-based and communicate over HTTP, but at times (as in my case) there are also a couple of services that require long-lived TCP connections. In such a case, during reloads you end up with more than the usual number of HAProxy processes running on the host, because the old processes still hold active sessions on those long-lived TCP services (which might be sending constant heartbeats). I understand this is by design, if I'm not mistaken.
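To illustrate what I mean (a hypothetical host; the PIDs and paths below are made up), after a couple of reloads you can see the finishing processes lingering next to the current one, each started with -sf pointing at its predecessor:

```
$ ps -ef | grep '[h]aproxy'
haproxy  1105     1  ... haproxy -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid
haproxy  1201     1  ... haproxy -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -sf 1105
haproxy  1342     1  ... haproxy -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -sf 1201
```

The two older processes (1105 and 1201) only stay around because long-lived TCP sessions are still attached to them.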
As a result, the old process is left with an outdated version of the backend configuration, and for some reason, in this state, we've observed HTTP 503s being returned from HAProxy.
I'd basically like your thoughts and recommendations on this, in particular with respect to the newest version (1.8) of HAProxy: can we expect any side effects, or is this a thing of the past, so that it would be safe to run HAProxy with a mix of HTTP and long-lived TCP connections in a production environment? Or would it be better to run two HAProxy instances, one dedicated to HTTP connections and the other to TCP connections, so that at least the HTTP services won't be affected?
The amount of time that an obsolete process stays around to serve its active sessions can be limited with the hard-stop-after directive; once that timeout expires, HAProxy kills the remaining sessions so the obsolete process can exit.
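A minimal sketch, assuming a 30-minute grace period is acceptable (the value is just an example); the directive goes in the global section:

```
global
    # After a reload, give each old process up to 30 minutes to drain
    # its remaining sessions; after that, kill them so the process exits.
    hard-stop-after 30m
```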
But there is no way to “move” existing, in-use TCP sessions from one process to another, and I don't see how there could ever be such a feature.
Usually the long-lived TCP connections are there for a reason and people do not want to kill them; for everything else, there is the hard-stop-after feature.
Yes, if you have to reload very often, I can certainly see how this becomes inconvenient or even impractical (due to memory constraints) when you have long-lived TCP connections.
However, a lot of work has been done to remove the need to reload HAProxy often in the first place. Servers can be preprovisioned and then configured/enabled via the admin socket, and HAProxy can even discover servers from DNS. So the requirement to reload often is removed from the equation; see the sketch below.
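For example (the names, addresses and socket path below are illustrative, not from your setup), you can preprovision server slots with server-template and have them filled from DNS:

```
resolvers mydns
    nameserver dns1 10.0.0.2:53
    hold valid 10s

backend app
    # 10 preprovisioned slots, populated and updated from DNS at run time
    server-template srv 10 app.service.internal:8080 check resolvers mydns init-addr none
```

Or repoint and enable a preprovisioned slot through the admin socket, without any reload:

```
echo "set server app/srv1 addr 10.0.1.5 port 8080" | socat stdio /var/run/haproxy.sock
echo "enable server app/srv1" | socat stdio /var/run/haproxy.sock
```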
If you have a statically configured pool of backend TCP servers with long-running connections, and you have to deploy new servers very often but for whatever reason can use neither DNS discovery nor the admin socket for run-time configuration, then IPVS certainly makes more sense, especially if you don't need any higher-level logic (like content inspection).
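For comparison, a bare-bones IPVS setup via ipvsadm could look like this (the VIP and real-server addresses are made up; -m selects NAT forwarding, -g would be direct routing):

```
# Create a TCP virtual service on the VIP with round-robin scheduling
ipvsadm -A -t 192.0.2.10:6379 -s rr
# Add real servers behind it
ipvsadm -a -t 192.0.2.10:6379 -r 10.0.0.11:6379 -m
ipvsadm -a -t 192.0.2.10:6379 -r 10.0.0.12:6379 -m
# Take a server out of rotation at deploy time
ipvsadm -d -t 192.0.2.10:6379 -r 10.0.0.11:6379
```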
We're actually running a mix of DNS-based backend configurations and server IP:PORT configurations in the same config. We use server IP:PORT for the use cases where we don't want to allow cross-zone traffic (as in AWS availability zones), to avoid additional costs: servers that are not in the zone where the HAProxy instance is running are configured as “backup” servers. (I guess we could do better by having DNS per zone as well.)
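Roughly like this, in case it clarifies the setup (zone names and addresses are made up):

```
backend payments
    balance roundrobin
    # Servers in the same zone as this HAProxy instance take the traffic
    server pay-1a-1 10.0.1.10:8443 check
    server pay-1a-2 10.0.1.11:8443 check
    # Cross-zone servers are only used when all same-zone servers are down
    server pay-1b-1 10.0.2.10:8443 check backup
    server pay-1b-2 10.0.2.11:8443 check backup
```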
But we don't have static configs at all; our environment is pretty dynamic, and we try to build our infrastructure assuming nothing is static.
Thanks for letting me know about the hard-stop-after directive. I will look into it, see how it behaves in our test environments, and will certainly get back to you if things don't go as we expect.
I use HAProxy 2.2.5 with Corosync + Pacemaker. Unfortunately, I have to use a frequently updated CRL to check client certificates, so HAProxy needs to be reloaded every 3 hours. When reloading HAProxy through the crm shell, long-lived TCP connections are usually broken (sometimes they are, sometimes they aren't). I'm curious whether someone has successfully sorted this problem out…
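In case it is relevant, our global section is roughly the following (the socket path is illustrative). As far as I understand, expose-fd listeners only hands the listening sockets over to the new process on reload, so already-established TCP sessions still stay bound to the old process:

```
global
    master-worker
    # Pass the listening sockets to the new process on reload so no
    # connections are refused during the handover; established sessions
    # nevertheless remain with the old process until they close.
    stats socket /var/run/haproxy.sock mode 600 level admin expose-fd listeners
```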
“But there is no way to “move” existing and in-use TCP sessions from one process to another”

I don't know how to do this. Do I need to modify the config file or the start command? Best wishes.