Hello community,
We’re using HAProxy in Kubernetes as a sticky load balancer in front of a deployment of five total pods (real HAProxy, not the ingress controller version of it). Here’s our config:
global
    daemon
    maxconn 10000
    stats socket /usr/local/etc/haproxy/admin.sock mode 600 level admin
    log /dev/log local0

defaults
    mode http
    timeout connect 5000ms
    timeout client 30000ms
    timeout server 30000ms

resolvers kubernetes
    nameserver skydns kube-dns.kube-system:53
    resolve_retries 10
    timeout retry 2s
    hold valid 5s

frontend http-in
    bind *:80
    log /dev/log local0
    option httplog
    default_backend servers

backend servers
    balance roundrobin
    stick-table type string size 100m
    option httpchk GET /health
    http-check expect status 200
    option tcp-check
    stick on path,word(3,/)
    server-template pod 5 pod.namespace.svc.cluster.local:8080 check resolvers kubernetes inter 500
As you can see, we’re leveraging server-template and the Kubernetes DNS resolver to create the backend servers dynamically.
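To make the stickiness rule concrete: if we read the word converter right, the stick key is the third path segment, i.e. the sticky id from our URLs (the request line below is taken from the logs further down):

backend servers (excerpt)
    # our understanding of the word converter: for
    #   GET /v1/document/sticky_1556013348827_0fmbgvj50nmc/stepssince/983
    # path,word(3,/) should yield "sticky_1556013348827_0fmbgvj50nmc",
    # which becomes the key in the string stick-table
    stick on path,word(3,/)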
We’ve been pretty happy with the stick-table approach so far, but we run into issues when doing a rolling upgrade of the backend servers. Kubernetes starts terminating a pod, starts up a couple of new ones, and waits until the old ones are completely wound down.
During this rolling upgrade we’ve observed an inconsistency in the stick table: the HAProxy instance happily forwards all requests to what it thinks is pod3, but in the request logs of the backend we see them ending up on three different backend servers.
Here’s the HAProxy request log for that period:
[29/Apr/2019:10:39:46.149] http-in servers/pod3 0/0/0/12812/12813 200 1031 - - ---- 303/303/235/3/0 0/0 "GET /v1/document/sticky_1556013348827_0fmbgvj50nmc/stepssince/983?_no_ie_cache=1556527174123 HTT
[29/Apr/2019:10:39:47.194] http-in servers/pod3 0/0/0/30/30 200 338 - - ---- 300/300/234/3/0 0/0 "GET /v1/document/sticky_1556013348827_0fmbgvj50nmc/stepssince/1006?_no_ie_cache=1556527187165 HTTP/1.1"
[29/Apr/2019:10:39:47.197] http-in servers/pod3 0/0/0/31/31 200 309 - - ---- 300/300/233/2/0 0/0 "POST /v1/document/sticky_1556013348827_0fmbgvj50nmc/steps HTTP/1.1"
[29/Apr/2019:10:39:47.277] http-in servers/pod3 0/0/0/34/34 200 338 - - ---- 300/300/233/2/0 0/0 "GET /v1/document/sticky_1556013348827_0fmbgvj50nmc/stepssince/1006?_no_ie_cache=1556527187248 HTTP/1.1"
HAProxy still thinks it is sending everything to pod3.
Here’s what we receive on the backend, where we can see requests for the same sticky_id path fragment going to different pods at nearly the same time (request log entry -> pod identifier).
2019-04-29T10:39:46.962Z 'REQUEST-OK [GET] [/v1/document/sticky_1556013348827_0fmbgvj50nmc/stepssince/983]' -> pod-799496f69-dbp8s
2019-04-29T10:39:47.223Z 'REQUEST-OK [GET] [/v1/document/sticky_1556013348827_0fmbgvj50nmc/stepssince/1006]' -> pod-58fdfcc477-zjcrq
2019-04-29T10:39:47.227Z 'REQUEST-OK [POST] [/v1/document/sticky_1556013348827_0fmbgvj50nmc/steps]' -> pod-799496f69-dbp8s
2019-04-29T10:39:47.306Z 'REQUEST-START [GET] [/v1/document/sticky_1556013348827_0fmbgvj50nmc/stepssince/1006]' -> pod-58fdfcc477-zjcrq
Here are the proxy logs for that period, which show how the DNS resolver switches the IPs:
April 29th 2019, 10:39:19.000 [WARNING] 118/083919 (1) : Server servers/pod1 is going DOWN for maintenance (No IP for server ). 4 active and 0 backup servers left. 11 sessions active, 0 requeued, 0 remaining in queue.
April 29th 2019, 10:39:34.000 [WARNING] 118/083934 (1) : Server servers/pod1 ('pod.namespace.svc.cluster.local') is UP/READY (resolves again).
April 29th 2019, 10:39:34.000 [WARNING] 118/083934 (1) : Server servers/pod1 administratively READY thanks to valid DNS answer.
April 29th 2019, 10:39:34.000 [WARNING] 118/083934 (1) : Server servers/pod5 is going DOWN for maintenance (No IP for server ). 4 active and 0 backup servers left. 3 sessions active, 0 requeued, 0 remaining in queue.
April 29th 2019, 10:39:34.000 [WARNING] 118/083934 (1) : servers/pod1 changed its IP from 172.20.5.223 to 172.20.4.207 by kubernetes/skydns.
April 29th 2019, 10:39:34.000 [WARNING] 118/083934 (1) : servers/pod2 changed its IP from 172.20.4.248 to 172.20.4.245 by DNS cache.
April 29th 2019, 10:39:39.000 [WARNING] 118/083939 (1) : servers/pod3 changed its IP from 172.20.5.50 to 172.20.5.232 by DNS cache.
April 29th 2019, 10:39:49.000 [WARNING] 118/083949 (1) : Server servers/pod5 ('pod.namespace.svc.cluster.local') is UP/READY (resolves again).
April 29th 2019, 10:39:49.000 [WARNING] 118/083949 (1) : Server servers/pod5 administratively READY thanks to valid DNS answer.
April 29th 2019, 10:39:49.000 [WARNING] 118/083949 (1) : servers/pod4 changed its IP from 172.20.5.176 to 172.20.4.111 by DNS cache.
April 29th 2019, 10:39:49.000 [WARNING] 118/083949 (1) : servers/pod5 changed its IP from 172.20.5.98 to 172.20.5.164 by DNS cache.
HAProxy runs as a single pod, and it did not restart or do anything else that would wipe the stick table.
We suspect this might be due to connections being pooled and kept open while the server underneath changes its IP via the DNS resolver. Does HAProxy have draining support for such a scenario when using server-template? What else could cause this behaviour?
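In case it’s useful context, these are the two knobs we were planning to experiment with next. This is just a sketch; we haven’t verified yet that either of them helps when the address change comes from the DNS resolver rather than a failed health check:

backend servers (excerpt)
    # assumption: disable idle connection reuse towards the servers,
    # in case pooled connections keep pointing at the old pod IP
    http-reuse never
    # assumption: tear down established sessions as soon as a server is marked
    # DOWN (e.g. when its record disappears from DNS during the rollout)
    server-template pod 5 pod.namespace.svc.cluster.local:8080 check resolvers kubernetes inter 500 on-marked-down shutdown-sessions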
I understand that this is probably also fairly specific to Kubernetes, but any helpful pointers on what’s going on here are appreciated.
Thanks a ton,
Thomas