`server-template` is a great feature for HAProxy users on Kubernetes, but we are running into a problem with HAProxy 2.0.14 (the only version we have tried) when using `server-template` for backend server discovery on Kubernetes.
The background: we use HAProxy as a load balancer that accepts incoming requests from clients and distributes them across our backend app servers, which run in different pods in the same Kubernetes namespace. We also run several similar deployments in other Kubernetes namespaces.
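For reference, our discovery configuration is along the lines of the sketch below (the resolver address, server count, and server parameters here are illustrative placeholders, not our exact values):

```
resolvers kube-dns
    # placeholder address for the cluster DNS service
    nameserver dns1 10.96.0.10:53
    resolve_retries 3
    timeout resolve 1s
    timeout retry   1s
    hold valid     10s
    # our understanding: "hold obsolete" controls how long a record that
    # disappeared from the DNS response is kept (default 0s)
    hold obsolete   0s

backend app-perf-us-south-01
    balance roundrobin
    # discover backend pods from the SRV record of the service in our namespace
    server-template server- 10 _https._tcp.app.app-perf.svc.cluster.local resolvers kube-dns resolve-prefer ipv4 check
```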
`server-template` still keeps the out-of-date backend pod's IP after the server is marked down. This happens when we scale down, reducing the number of app pods in the backend with `kubectl scale`.
The problem with this behavior is that the deleted pod's IP is reclaimed and recycled by Kubernetes, and after some time it may be assigned to a newly created pod. In some not-so-rare cases, the new pod is a similar app pod running in a different Kubernetes namespace. That violates how we expect
`server-template` to work: it should only care about what the SRV record says about the endpoints (pod IPs) of the backend app service in the current namespace.
Here is what we are seeing from `show servers state`:

```
$ echo "show servers state" | socat ./admin-1.sock
# be_id be_name srv_id srv_name srv_addr srv_op_state srv_admin_state srv_uweight srv_iweight srv_time_since_last_change srv_check_status srv_check_result srv_check_health srv_check_state srv_agent_state bk_f_forced_id srv_f_forced_id srv_fqdn srv_port srvrecord
8 app-perf-us-south-01 1 server-1 172.30.254.86 2 64 1 1 83648 15 3 4 6 0 0 0 172-30-254-86.app.app-perf.svc.cluster.local 5983 _https._tcp.app.app-perf.svc.cluster.local
8 app-perf-us-south-01 2 server-2 172.30.233.39 2 64 1 1 83643 15 3 4 6 0 0 0 172-30-233-39.app.app-perf.svc.cluster.local 5983 _https._tcp.app.app-perf.svc.cluster.local
8 app-perf-us-south-01 3 server-3 172.30.47.190 0 64 1 1 34795 7 2 0 6 0 0 0 - 5983 _https._tcp.app.app-perf.svc.cluster.local
```
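To spot these stale entries quickly, the dump can be parsed and filtered for rows whose `srv_fqdn` is `-`. A small sketch (the column list is taken from the header of the dump above):

```python
# Flag rows in "show servers state" output whose srv_fqdn is "-",
# i.e. servers that no longer have a DNS record backing them.

HEADER = ("be_id be_name srv_id srv_name srv_addr srv_op_state srv_admin_state "
          "srv_uweight srv_iweight srv_time_since_last_change srv_check_status "
          "srv_check_result srv_check_health srv_check_state srv_agent_state "
          "bk_f_forced_id srv_f_forced_id srv_fqdn srv_port srvrecord").split()

def stale_servers(dump: str):
    """Return (srv_name, srv_addr) pairs for rows with an empty ('-') srv_fqdn."""
    stale = []
    for line in dump.splitlines():
        fields = line.split()
        # skip comments, blank lines, and anything not matching the column count
        if not fields or line.startswith("#") or len(fields) != len(HEADER):
            continue
        row = dict(zip(HEADER, fields))
        if row["srv_fqdn"] == "-":
            stale.append((row["srv_name"], row["srv_addr"]))
    return stale

if __name__ == "__main__":
    dump = ("8 app-perf-us-south-01 3 server-3 172.30.47.190 0 64 1 1 34795 "
            "7 2 0 6 0 0 0 - 5983 _https._tcp.app.app-perf.svc.cluster.local")
    print(stale_servers(dump))  # [('server-3', '172.30.47.190')]
```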
So we can see that server-3, with IP `172.30.47.190`, is still listed, with an empty (`-`) `srv_fqdn`, while a dig against the SRV record at the same time shows no entry resolving to this IP:
```
$ dig -t SRV _https._tcp.app.app-perf.svc.cluster.local +short
0 4 5983 172-30-130-236.app.app-perf.svc.cluster.local.
0 4 5983 172-30-139-184.app.app-perf.svc.cluster.local.
```

Only two entries are returned, and neither corresponds to `172.30.47.190`.
`172.30.47.190` was later used by another pod, and HAProxy considered the server back up and started distributing traffic to it, regardless of where the new pod lives (another namespace, or possibly even a different kind of pod).
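As a stop-gap, a server in this state can be forced into maintenance through the runtime API so it stops receiving traffic once the IP is reused (a manual workaround, not a fix; backend and server names as in the dump above):

```
$ echo "set server app-perf-us-south-01/server-3 state maint" | socat stdio ./admin-1.sock
```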
So this problem is blocking us from using `server-template` in Kubernetes. I am wondering whether I missed any configuration needed to make `server-template` behave as we expect: always keep the server list in sync with what the SRV record resolves to.
Here is the `haproxy -vv` output:
```
HA-Proxy version 2.0.14 2020/04/02 - https://haproxy.org/
Build options :
  TARGET  = linux-glibc
  CPU     = generic
  CC      = gcc
  CFLAGS  = -m64 -march=x86-64 -O2 -g -fno-strict-aliasing -Wdeclaration-after-statement -fwrapv -Wno-unused-label -Wno-sign-compare -Wno-unused-parameter -Wno-old-style-declaration -Wno-ignored-qualifiers -Wno-clobbered -Wno-missing-field-initializers -Wtype-limits
  OPTIONS = USE_PCRE2=1 USE_PCRE2_JIT=1 USE_THREAD=1 USE_PTHREAD_PSHARED=1 USE_REGPARM=1 USE_STATIC_PCRE2=1 USE_OPENSSL=1 USE_LUA=1 USE_SLZ=1 USE_TFO=1 USE_SYSTEMD=1

Feature list : +EPOLL -KQUEUE -MY_EPOLL -MY_SPLICE +NETFILTER -PCRE -PCRE_JIT +PCRE2 +PCRE2_JIT +POLL -PRIVATE_CACHE +THREAD +PTHREAD_PSHARED +REGPARM -STATIC_PCRE +STATIC_PCRE2 +TPROXY +LINUX_TPROXY +LINUX_SPLICE +LIBCRYPT +CRYPT_H -VSYSCALL +GETADDRINFO +OPENSSL +LUA +FUTEX +ACCEPT4 -MY_ACCEPT4 -ZLIB +SLZ +CPU_AFFINITY +TFO +NS +DL +RT -DEVICEATLAS -51DEGREES -WURFL +SYSTEMD -OBSOLETE_LINKER +PRCTL +THREAD_DUMP -EVPORTS

Default settings :
  bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Built with multi-threading support (MAX_THREADS=64, default=32).
Built with OpenSSL version : OpenSSL 1.1.1g  21 Apr 2020
Running on OpenSSL version : OpenSSL 1.1.1g  21 Apr 2020
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports : TLSv1.0 TLSv1.1 TLSv1.2 TLSv1.3
Built with Lua version : Lua 5.3.4
Built with network namespace support.
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND
Built with libslz for stateless compression.
Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with PCRE2 version : 10.30 2017-08-14
PCRE2 library supports JIT : yes
Encrypted password support via crypt(3): yes

Available polling systems :
      epoll : pref=300,  test result OK
       poll : pref=200,  test result OK
     select : pref=150,  test result OK
Total: 3 (3 usable), will use epoll.

Available multiplexer protocols :
(protocols marked as <default> cannot be specified using 'proto' keyword)
              h2 : mode=HTX        side=FE|BE     mux=H2
              h2 : mode=HTTP       side=FE        mux=H2
       <default> : mode=HTX        side=FE|BE     mux=H1
       <default> : mode=TCP|HTTP   side=FE|BE     mux=PASS

Available services : none

Available filters :
        [SPOE] spoe
        [COMP] compression
        [CACHE] cache
        [TRACE] trace
```
It would be great to get some help here. Thank you very much.