Source IP from haproxy > server wrong

I have a small Haproxy server set up with 2 NICs. The OS is CentOS 7 and I have configured both NICs on the same subnet per CentOS documentation.

192.168.0.1 and 192.168.0.2, both on 192.168.0.0/24.
192.168.0.1 is used for management and the web gui, 192.168.0.2 is used for the LB traffic.

Traffic coming into the LB hits the 192.168.0.2 address, but seems to be egressing from the 192.168.0.1 address to the backend servers. I’ve tried specifying “source IP” in the config to no avail.

Version is 1.8.16.

If I do the telnet from the LB CLI inside Linux, the routing seems to work correctly, so it seems the LB application itself isn’t going out on the correct interface/IP.

It’s a bad idea to configure both NICs in the same subnet, because the egress NIC and source IP are not deterministically chosen by the kernel in this case.

Your expectation that 192.168.0.2 is always used for egress traffic is something that your OS/kernel doesn’t know about.
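This is easy to demonstrate with a plain socket. Here is a minimal sketch using loopback addresses instead of the 192.168.0.x addresses from this thread: when the client does not bind() before connect(), the kernel alone decides which local address becomes the source.

```python
import socket

# A listener on an ephemeral loopback port so the connect() below succeeds.
srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(("127.0.0.1", 0))
srv.listen(1)
port = srv.getsockname()[1]

# No bind() before connect(): the source address is kernel-chosen,
# which is exactly the situation when haproxy has no "source" directive.
cli = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
cli.connect(("127.0.0.1", port))
src = cli.getsockname()[0]
print(src)  # whatever source the kernel picked

cli.close()
srv.close()
```

With two NICs in the same subnet, the address the kernel picks here is simply not guaranteed to be the one you expect.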

Now specifying the source IP in haproxy should theoretically alleviate that. Can you provide the actual, complete haproxy configuration you used for this? Also provide the output of running haproxy through strace -tt -p<PID> while using the source IP option (and running traffic through it).

I strongly suggest you rethink your configuration though, and put your management IP into a different subnet.

The configuration is non-deterministic, so the behavior of the sockets is basically random.

Haproxy config:

global
    daemon
    stats socket /var/run/haproxy.sock mode 777 level admin
    maxconn 4096
    maxcompcpuusage 100
    maxcomprate 0
    nbproc 1
    ssl-server-verify required

defaults
    mode tcp
    option http-server-close
    option redispatch
    retries 3
    timeout connect 5000
    timeout server 50000
    timeout client 50000
    timeout check 50000
    timeout http-keep-alive 50000
    timeout http-request 50000

listen LBPROXY
    bind 192.168.0.2:999
    balance source
    maxconn 9999
    mode tcp
    source 192.168.0.2
    option http-server-close
    server server1 192.168.0.3:999 check fall 3 rise 5 inter 2000 weight 10

I agree, except I’ve created routing tables and rules that should’ve curbed this behavior. (How to connect two network interfaces on the same subnet? - Red Hat Customer Portal)

I tried running the strace; I don’t see any reference to the actual IPs or outbound interfaces being used. Is there a certain line or reference you’re looking for in that output?

Ok, but you have to realize that this has nothing to do with haproxy. Haproxy creates a socket, and uses that socket. If you configure haproxy with a specific source IP, haproxy will tell the kernel to use that source IP. All the complexity of source IP selection is in the kernel.
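The mechanism haproxy uses for `source` is the standard bind-before-connect pattern, which a few lines of Python can sketch. This uses 127.0.0.2 as a stand-in for 192.168.0.2 from this thread; on Linux the whole 127.0.0.0/8 range is local to the loopback interface, so it works without extra configuration.

```python
import socket

# Listener standing in for the backend server.
srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(("127.0.0.1", 0))
srv.listen(1)
port = srv.getsockname()[1]

cli = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# bind() with port 0 before connect(): this is what haproxy does for
# "source 192.168.0.2" - it tells the kernel which source IP to use.
cli.bind(("127.0.0.2", 0))
cli.connect(("127.0.0.1", port))
src = cli.getsockname()[0]
print(src)  # the bound address, not a kernel-chosen one

cli.close()
srv.close()
```

This is the same bind() call you can see in the strace output further down: once the application binds, the kernel has no source address to choose anymore.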

If you configured different routing tables, please provide the full output of those.

The strace output I have with your config is the following:

18:32:35.577932 socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 14
18:32:35.578074 fcntl(14, F_SETFL, O_RDONLY|O_NONBLOCK) = 0
18:32:35.578188 setsockopt(14, SOL_TCP, TCP_NODELAY, [1], 4) = 0
18:32:35.578326 setsockopt(14, SOL_IP, IP_BIND_ADDRESS_NO_PORT, [1], 4) = 0
18:32:35.578474 setsockopt(14, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
18:32:35.578603 bind(14, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("192.168.0.2")}, 16) = 0
18:32:35.578726 connect(14, {sa_family=AF_INET, sin_port=htons(999), sin_addr=inet_addr("192.168.0.3")}, 16) = -1 EINPROGRESS (Operation now in progress)
18:32:35.579033 sendto(14, "GET / HTTP/1.1\r\nHost: 192.168.0."..., 79, MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = -1 ECONNREFUSED (Connection refused)
18:32:35.579237 setsockopt(14, SOL_SOCKET, SO_LINGER, {onoff=1, linger=0}, 8) = 0
18:32:35.579414 close(14)               = 0

You can see the bind call where the socket is bound to the source IP 192.168.0.2 (and the syscall returns success).

The process with PID 6154 from the strace is not running the configuration you provided. Not only does it not set the source IP, it is also setting other options that you didn’t even configure (you did not configure option tcp-smart-connect, but haproxy is setting TCP_QUICKACK on the outgoing socket).

I suggest you triple check what configuration you are running and make sure you don’t have any old haproxy processes running in the background. It’s best to shut down haproxy, confirm that nothing is responding anymore on port 999, and only then restart haproxy.
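A quick way to confirm the port is really free is a plain TCP connect attempt. This is a sketch to run on the LB host; 127.0.0.1 is a placeholder for the listener address, and port 999 is the bind port from this thread.

```python
import socket

def port_in_use(host: str, port: int) -> bool:
    """Return True if something is accepting TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(1.0)
        # connect_ex returns 0 on success, an errno (e.g. ECONNREFUSED)
        # when nothing is listening.
        return s.connect_ex((host, port)) == 0

print(port_in_use("127.0.0.1", 999))
```

If this still reports True after you stopped haproxy, an old process (or something else) is still holding the port.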

I don’t see how the IP route configuration is supposed to solve the problem; in fact, it requires the local IP address to already be known, since it picks the outgoing interface based on that IP, and selecting that IP is the actual problem. But let’s focus on haproxy.

I triple checked the config and, as suspected, an old instance was running and causing the issues. Cleaned and rebooted, and things are working as expected. Thanks for the time; updating in the hope this helps another rookie down the road, should someone stumble upon it.
