I have a cluster of load balancers, with a peers section that looks like:
```
peers lbs
    peer 10.0.0.1:29000
    peer 10.0.0.2:29000
    peer 10.0.0.3:29000
```
and in one of the backends a stick table is defined like:
```
stick-table type string len 36 size 10k expire 6m peers lbs srvkey addr
```
Usually this works fine. But I recently ran into an issue where the stick table didn’t seem to be shared across the peers. While troubleshooting, I tried the `show peers` command on the stats socket CLI, and it returned an empty response. When I tried `show peers lbs`, it told me that peer group didn’t exist. I will henceforth refer to this state as the “bad state”.
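To catch the bad state quickly if it comes back, a small watchdog could poll the stats socket. This is only a sketch under stated assumptions: the socket path `/var/run/haproxy.sock` is hypothetical (use the path from your `stats socket` line), and the check assumes that each section's header line in the `show peers` output carries an `id=<section>` token, as in the examples in the HAProxy management guide. The exact output format may differ between versions.

```python
import socket

# Hypothetical path: replace with the path from the "stats socket" line in your global section.
STATS_SOCKET = "/var/run/haproxy.sock"


def run_cli_command(socket_path: str, command: str, timeout: float = 5.0) -> str:
    """Send one command to an HAProxy CLI socket and return the whole reply.

    Without "prompt" mode HAProxy closes the connection after answering,
    so reading until EOF collects the full response.
    """
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        sock.connect(socket_path)
        sock.sendall(command.encode() + b"\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # EOF: HAProxy closed the connection
                break
            chunks.append(data)
    return b"".join(chunks).decode(errors="replace")


def peers_section_present(show_peers_output: str, section: str) -> bool:
    """Return False for the "bad state": an empty reply, or no header for the section.

    Assumption: each section's header line contains an "id=<section>" token,
    while peer lines look like "id=lb1(local)", so an exact token match is used.
    """
    return any(f"id={section}" in line.split() for line in show_peers_output.splitlines())


def check_peers(socket_path: str = STATS_SOCKET, section: str = "lbs") -> bool:
    """Query the live process; intended to be called from cron or a monitoring agent."""
    return peers_section_present(run_cli_command(socket_path, "show peers"), section)
```

For example, `check_peers()` returning `False` could trigger an alert or a full stop/start, since a full restart is what cleared the state here.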
I tried doing a reload, both by sending SIGUSR2 to the master process (in master-worker mode) and by sending the `reload` command to the master over the master CLI, but that didn’t change anything. However, if I completely stopped and then restarted the haproxy process (master and worker), it got back into a healthy state, with the peers listed and stick tables working. Once in this state, if I make changes to the peers section and reload, those changes apply.
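Since a reload did not clear the bad state but a full restart did, one way to try reproducing it is to reload in a loop and check after each reload. A rough Python sketch, assuming the master CLI was enabled with `-S` at the hypothetical path `/var/run/haproxy-master.sock`: it sends `reload` to the master, waits a few seconds (the 3 s default is a guess), then uses the master CLI's `@1` prefix to forward `show peers` to the current worker. It assumes each section's header line in the `show peers` output contains an `id=<section>` token. Depending on the HAProxy version, the master may answer `reload` with a status or simply drop the connection, so the helper tolerates a reset.

```python
import socket
import time
from typing import Optional

# Hypothetical path: the socket passed to haproxy with "-S" (master CLI).
MASTER_SOCKET = "/var/run/haproxy-master.sock"


def master_cli(socket_path: str, command: str, timeout: float = 10.0) -> str:
    """Send one command over a fresh master CLI connection and read until EOF."""
    chunks = []
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        sock.connect(socket_path)
        sock.sendall(command.encode() + b"\n")
        try:
            while data := sock.recv(4096):
                chunks.append(data)
        except ConnectionResetError:
            # Depending on the version, the master may drop the connection on reload.
            pass
    return b"".join(chunks).decode(errors="replace")


def to_current_worker(command: str) -> str:
    """Prefix a command with "@1" so the master forwards it to the current worker."""
    return f"@1 {command}"


def section_listed(show_peers_output: str, section: str) -> bool:
    """Assumed format: the section header line carries an exact "id=<section>" token."""
    return any(f"id={section}" in line.split() for line in show_peers_output.splitlines())


def reload_until_bad_state(
    socket_path: str = MASTER_SOCKET,
    section: str = "lbs",
    attempts: int = 50,
    settle: float = 3.0,
) -> Optional[int]:
    """Reload repeatedly; return the attempt on which the section vanished, else None."""
    for attempt in range(1, attempts + 1):
        master_cli(socket_path, "reload")
        time.sleep(settle)  # rough guess at how long the new worker needs to start
        reply = master_cli(socket_path, to_current_worker("show peers"))
        if not section_listed(reply, section):
            return attempt
    return None
```

A number back from `reload_until_bad_state()` would give a reproducible trigger to report; `None` means the state did not show up within that many reloads.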
I’ve tried to reproduce this with a more minimal configuration, but so far I haven’t been successful. In fact, even with my full configuration I can’t get it into this state again. But it has happened on multiple servers, and I’m worried it might happen again.
Does anyone have any ideas about what might have caused this, or how to prevent it from happening again?