Weird state with peers not reloading

I have a cluster of load balancers, with a peers section that looks like:

peers  lbs

and in one of the backends a stick table is defined like:

stick-table type string len 36 size 10k expire 6m peers lbs srvkey addr

Usually this works fine. But I recently ran into an issue where the stick table didn’t seem to be respected across the peers. While troubleshooting, I ran the show peers command on the stats socket CLI, and it returned an empty string. When I ran show peers lbs, it told me that the peer group didn’t exist. I will henceforth refer to this state as the “bad state”.

I tried doing a reload, both by sending SIGUSR2 to the master process (in master-worker mode) and by sending the reload command to the master over the master CLI, but that didn’t change anything. However, if I completely stopped and then restarted the haproxy process (master and worker), it got back into a healthy state, with the peers listed and the stick tables working. Once in this state, if I make changes to the peers section and reload, those changes apply.

I’ve tried to reproduce this with a more minimal configuration, but so far have been unsuccessful. In fact, even with my full configuration I can’t get it into this state again. But it has happened to me on multiple servers, and I’m worried it might happen again.

Does anyone have any ideas on what might have caused this, or how to prevent it from happening again?

I don’t entirely understand what was happening here, but I did discover that the script that starts haproxy was initially passing the wrong IP (which we also use as the peer name) in the -L option. It was using the IP of the host that the VM image was created from. After startup, the script was updated to use the correct IP, but there was a race condition where that update didn’t necessarily happen before haproxy was started.

So I think this might happen if you pass the -L option a peer name that doesn’t match any peer declared in the peers section.
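
If that theory is right, one way to guard against it would be to have the start script check, before launching haproxy, that the name it is about to pass to -L is actually declared in the peers section. Below is a rough Python sketch of such a check. It is not anything haproxy ships: the function names, the example config, and the addresses are made up for illustration, the list of section keywords is not exhaustive, and the parsing is deliberately naive (it ignores quoting and included files).

```python
# Keywords that start a new top-level section in an haproxy config.
# Assumption: this list is not exhaustive, it only needs to cover the
# sections that appear in the config being checked.
SECTION_KEYWORDS = {
    "global", "defaults", "frontend", "backend", "listen", "peers",
    "resolvers", "userlist", "mailers", "cache", "program", "ring",
    "http-errors",
}


def peer_names(config_text, section):
    """Return the peer names declared inside `peers <section>`."""
    names = []
    in_section = False
    for raw in config_text.splitlines():
        # Drop comments and split on whitespace (naive: ignores quoting).
        words = raw.split("#", 1)[0].split()
        if not words:
            continue
        if words[0] in SECTION_KEYWORDS:
            # Any section keyword ends the previous section.
            in_section = words[0] == "peers" and words[1:2] == [section]
        elif in_section and words[0] in ("peer", "server") and len(words) > 1:
            # Peers are declared with "peer <name> ..." or "server <name> ...".
            names.append(words[1])
    return names


def local_peer_is_declared(config_text, section, local_name):
    """True if the name intended for -L is a peer in the given section."""
    return local_name in peer_names(config_text, section)


# Hypothetical config, using the IP as the peer name as in my setup.
EXAMPLE_CONFIG = """
peers lbs
    peer 10.0.0.1 10.0.0.1:10000
    peer 10.0.0.2 10.0.0.2:10000

backend app
    stick-table type string len 36 size 10k expire 6m peers lbs srvkey addr
    server s1 10.0.1.1:80
"""
```

With this, the start script could call local_peer_is_declared(config, "lbs", ip) and refuse to start (or wait and retry) when it returns False, rather than starting haproxy with a stale IP from the image.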