HAProxy community

Intermittent "SD" termination state

Hello HAProxy Experts!

We are intermittently finding requests with a termination state of “SD” in our HAProxy logs. Most of the affected requests show zero response bytes, but some show a seemingly random number of bytes before the “SD” status is reported. We are seeing this status at a rate of about 1 in every 1000 requests.
We can attribute some of them to backend servers being ungracefully restarted in our Kubernetes cluster. However, about 10% of them cannot be correlated with any event. We have also investigated our network extensively, and everything looks fine.

We are running HAProxy 1.9.9 on the “client” side and HAProxy 1.9.10 on the “server” side. Both environments are on CentOS 7.

Some further searching turned up this old HAProxy bug https://www.mail-archive.com/haproxy@formilux.org/msg26282.html and we are wondering whether this could be a regression from 1.7 that crept back into version 1.9.

Thank you,
-Ulrich

Have you found anything? We are experiencing something similar…

Just a theory, not really validated:
We had option nolinger enabled on most of our HAProxys, which makes HAProxy send a TCP RST immediately after it has responded with the last byte. We run a redundant network and all our machines are multi-homed (multiple NICs). Our theory is that the RST was sometimes sent over a different network path and intermittently arrived before the last byte, or even before any byte at all when the payload was smaller than our MTU. In that case, the data still in transit may have been “lost”, as described here: https://en.wikipedia.org/wiki/Transmission_Control_Protocol#Connection_termination

In any case, once we removed the nolinger setting we no longer saw those SDs. We didn’t need nolinger anyway; it was “tech debt” left over from an earlier “experiment” that a previous employee, who has since left, forgot to remove.

By the way, we are moving away from HAProxy and are in the process of migrating to Envoy. We had so many ongoing issues with HAProxy in our highly dynamic Kubernetes clusters that we got tired of chasing them and “plugging one hole after another”. We have already migrated a fair number of services to Envoy and it has been “smooth sailing” since.