Hi @Baptiste,
I’m kind of having the same issue. I’ve also tried your suggestions from your last response but i’m still seeing my service (and other services) being impacted and showing up as unavailable during reloads. I can describe to you what i’m observing based on the logs.
I have a watch
command that continuously curls a service endpoint and often times on reloads, it returns an HTTP 503 Service Unavailable.
$ curl user-server
<html><body><h1>503 Service Unavailable</h1>
No server is available to handle this request.
</body></html>
From the logs, i see the following:
Jun 24 21:51:51 ip-1-2-3-206 haproxy[25775]: Proxy user-server started.
Jun 24 21:51:51 ip-1-2-3-206 haproxy[25775]: Proxy user-server started.
Jun 24 21:51:51 ip-1-2-3-206 haproxy[25779]: Stopping backend user-server in 0 ms.
Jun 24 21:51:51 ip-1-2-3-206 haproxy[25779]: Stopping backend user-server in 0 ms.
Jun 24 21:51:51 ip-1-2-3-206 haproxy[27147]: user-server/user-server1 changed its FQDN from (null) to test-ecs-i-xxx.node.eu-west-1.consul by 'SRV record'
Jun 24 21:51:52 ip-1-2-3-206 haproxy[27147]: Server user-server/user-server1 is DOWN, reason: Socket error, check duration: 0ms. 9 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
Jun 24 21:51:52 ip-1-2-3-206 haproxy[27147]: Server user-server/user-server1 is DOWN, reason: Socket error, check duration: 0ms. 9 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
Jun 24 21:51:52 ip-1-2-3-206 haproxy[27147]: Server user-server/user-server2 is DOWN, reason: Socket error, check duration: 0ms. 8 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
Jun 24 21:51:52 ip-1-2-3-206 haproxy[27147]: Server user-server/user-server2 is DOWN, reason: Socket error, check duration: 0ms. 8 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
Jun 24 21:51:52 ip-1-2-3-206 haproxy[27147]: Server user-server/user-server3 is DOWN, reason: Socket error, check duration: 0ms. 7 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
Jun 24 21:51:52 ip-1-2-3-206 haproxy[27147]: Server user-server/user-server3 is DOWN, reason: Socket error, check duration: 0ms. 7 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
Jun 24 21:51:52 ip-1-2-3-206 haproxy[27147]: Server user-server/user-server4 is DOWN, reason: Socket error, check duration: 0ms. 6 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
Jun 24 21:51:52 ip-1-2-3-206 haproxy[27147]: Server user-server/user-server4 is DOWN, reason: Socket error, check duration: 0ms. 6 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
Jun 24 21:51:52 ip-1-2-3-206 haproxy[27147]: Server user-server/user-server5 is DOWN, reason: Socket error, check duration: 0ms. 5 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
Jun 24 21:51:52 ip-1-2-3-206 haproxy[27147]: Server user-server/user-server5 is DOWN, reason: Socket error, check duration: 0ms. 5 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
Jun 24 21:51:52 ip-1-2-3-206 haproxy[27147]: Server user-server/user-server6 is DOWN, reason: Socket error, check duration: 0ms. 4 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
Jun 24 21:51:52 ip-1-2-3-206 haproxy[27147]: Server user-server/user-server6 is DOWN, reason: Socket error, check duration: 0ms. 4 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
Jun 24 21:51:52 ip-1-2-3-206 haproxy[27147]: Server user-server/user-server7 is DOWN, reason: Socket error, check duration: 0ms. 3 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
Jun 24 21:51:52 ip-1-2-3-206 haproxy[27147]: Server user-server/user-server7 is DOWN, reason: Socket error, check duration: 0ms. 3 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
Jun 24 21:51:52 ip-1-2-3-206 haproxy[27147]: Server user-server/user-server8 is DOWN, reason: Socket error, check duration: 0ms. 2 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
Jun 24 21:51:52 ip-1-2-3-206 haproxy[27147]: Server user-server/user-server8 is DOWN, reason: Socket error, check duration: 0ms. 2 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
Jun 24 21:51:52 ip-1-2-3-206 haproxy[27147]: Server user-server/user-server9 is DOWN, reason: Socket error, check duration: 0ms. 1 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
Jun 24 21:51:52 ip-1-2-3-206 haproxy[27147]: Server user-server/user-server9 is DOWN, reason: Socket error, check duration: 0ms. 1 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
Jun 24 21:51:52 ip-1-2-3-206 haproxy[27147]: Server user-server/user-server10 is DOWN, reason: Socket error, check duration: 0ms. 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
Jun 24 21:51:52 ip-1-2-3-206 haproxy[27147]: Server user-server/user-server10 is DOWN, reason: Socket error, check duration: 0ms. 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
Jun 24 21:51:52 ip-1-2-3-206 haproxy[27147]: backend user-server has no server available!
Jun 24 21:51:52 ip-1-2-3-206 haproxy[27147]: backend user-server has no server available!
Jun 24 21:51:52 ip-1-2-3-206 haproxy.requests[27147]: {"request_time":"24/Jun/2018:21:51:52.821","host":"ip-1-2-3-206","protocol":"http","http_status":503,"user_agent":"curl/7.47.0","unique_id":"","headers":"{curl/7.47.0||user-server}","endpoint":"/","backend":"user-server","backend_name":"user-server","http_method":"GET","upstream_response_time":-1,"upstream_connect_time":-1,"bytes_read":213,"upstream_addr":"-","source_addr":"-","retries":"0","bytes_uploaded":75,"session_duration":0,"termination_state":"SC","http_query_params":"","accept_time":0,"idle_time":0,"client_time":0,"wait_time":-1,"download_time":-1,"active_time":0}
Jun 24 21:52:01 ip-1-2-3-206 haproxy[27147]: user-server/user-server1 changed its IP from to x.x.x.177 by DNS cache.
Jun 24 21:52:01 ip-1-2-3-206 haproxy[27147]: user-server/user-server1 changed its IP from to x.x.x.177 by DNS cache.
Jun 24 21:52:01 ip-1-2-3-206 haproxy[27147]: user-server/user-server3 changed its IP from to x.x.x.206 by DNS cache.
Jun 24 21:52:01 ip-1-2-3-206 haproxy[27147]: user-server/user-server3 changed its IP from to x.x.x.206 by DNS cache.
Jun 24 21:52:01 ip-1-2-3-206 haproxy[27147]: user-server/user-server2 changed its IP from to x.x.x.174 by DNS cache.
Jun 24 21:52:01 ip-1-2-3-206 haproxy[27147]: user-server/user-server2 changed its IP from to x.x.x.174 by DNS cache.
Jun 24 21:52:03 ip-1-2-3-206 haproxy[27147]: Server user-server/user-server2 is UP, reason: Layer4 check passed, check duration: 0ms. 1 active and 0 backup servers online. 0 sessions requeued, 0 total in queue.
Jun 24 21:52:03 ip-1-2-3-206 haproxy[27147]: Server user-server/user-server2 is UP, reason: Layer4 check passed, check duration: 0ms. 1 active and 0 backup servers online. 0 sessions requeued, 0 total in queue.
Jun 24 21:52:03 ip-1-2-3-206 haproxy[27147]: Server user-server/user-server1 is UP, reason: Layer4 check passed, check duration: 0ms. 2 active and 0 backup servers online. 0 sessions requeued, 0 total in queue.
Jun 24 21:52:03 ip-1-2-3-206 haproxy[27147]: Server user-server/user-server1 is UP, reason: Layer4 check passed, check duration: 0ms. 2 active and 0 backup servers online. 0 sessions requeued, 0 total in queue.
Jun 24 21:52:03 ip-1-2-3-206 haproxy[27147]: Server user-server/user-server3 is UP, reason: Layer4 check passed, check duration: 0ms. 3 active and 0 backup servers online. 0 sessions requeued, 0 total in queue.
Jun 24 21:52:03 ip-1-2-3-206 haproxy[27147]: Server user-server/user-server3 is UP, reason: Layer4 check passed, check duration: 0ms. 3 active and 0 backup servers online. 0 sessions requeued, 0 total in queue.
Jun 24 21:52:04 ip-1-2-3-206 haproxy.requests[27147]: {"request_time":"24/Jun/2018:21:52:04.892","host":"ip-1-2-3-206","protocol":"http","http_status":401,"user_agent":"curl/7.47.0","unique_id":"","headers":"{curl/7.47.0||user-server}","endpoint":"/","backend":"user-server","backend_name":"user-server","http_method":"GET","upstream_response_time":2,"upstream_connect_time":0,"bytes_read":144,"upstream_addr":"x.x.x.174","source_addr":"x.x.x.206","retries":"0","bytes_uploaded":75,"session_duration":2,"termination_state":"--","http_query_params":"","accept_time":0,"idle_time":0,"client_time":0,"wait_time":0,"download_time":0,"active_time":2}
These happened the minute haproxy is reloaded. It’s happening on and off, not all the time.
From my observation, there are two things that i observe, which from my point of view shouldn’t happen.
1. The server IP being changed.
- I’m using Consul for service discovery and i think this happens because the set of nodes returned is randomized each time from Consul’s DNS. Could you confirm if this is expected behaviour?
2. The server being marked as DOWN even though the service is alive and running on the host with that IP and listening on that port.
- I don’t understand why this happens.
Here is my config:
global
log /dev/log len 65535 local0 info alert
log /dev/log len 65535 local1 notice alert
user haproxy
group haproxy
nbproc 1
nbthread 1
stats socket /var/run/haproxy.sock mode 660 level admin
server-state-file /var/lib/haproxy/server-state
stats timeout 2m
master-worker
defaults
log global
mode http
option httplog
timeout connect 5s
timeout client 30s
timeout server 30s
timeout http-request 30s
timeout http-keep-alive 60s
timeout queue 120s
timeout check 10s
retries 10
option redispatch
option forwardfor
maxconn 10000
load-server-state-from-file global
default-server init-addr none fastinter 1s rise 2 fall 2 on-error fastinter
no option http-server-close
option tcp-smart-connect
option tcp-smart-accept
option splice-auto
errorfile 400 /etc/haproxy/errors/400.http
errorfile 403 /etc/haproxy/errors/403.http
errorfile 408 /etc/haproxy/errors/408.http
errorfile 500 /etc/haproxy/errors/500.http
errorfile 502 /etc/haproxy/errors/502.http
errorfile 503 /etc/haproxy/errors/503.http
errorfile 504 /etc/haproxy/errors/504.http
frontend haproxy-ping
bind :7070
mode http
monitor-uri /ping
http-request set-log-level silent
errorfile 200 /etc/haproxy/ping.http
frontend stats
bind *:1936
mode http
option forceclose
stats enable
stats uri /
stats hide-version
stats show-legends
stats show-desc
stats show-node
stats realm Haproxy\ Statistics
resolvers consul
nameserver consul 127.0.0.1:8600
accepted_payload_size 8192
resolve_retries 3
timeout retry 1s
hold valid 10s
frontend http-in
bind *:80
log-tag haproxy.requests
capture request header User-Agent len 30
capture request header X-Request-ID len 36
capture request header Host len 32
log-format "{\"request_time\":\"%t\",\"host\":\"%H\",\"protocol\":\"http\",\"http_status\":%ST,\"user_agent\":%{+Q}[capture.req.hdr(0)],\"unique_id\":%{+Q}[capture.req.hdr(1)],\"headers\":\"%hr\",\"endpoint\":\"%HP\",\"backend\":\"%b\",\"backend_name\":%{+Q}[capture.req.hdr(2)],\"http_method\":\"%HM\",\"upstream_response_time\":%Tr,\"upstream_connect_time\":%Tc,\"bytes_read\":%B,\"upstream_addr\":\"%si\",\"source_addr\":\"%bi\",\"retries\":\"%rc\",\"bytes_uploaded\":%U,\"session_duration\":%Tt,\"termination_state\":\"%ts\",\"http_query_params\":\"%HQ\",\"accept_time\":%Th,\"idle_time\":%Ti,\"client_time\":%TR,\"wait_time\":%Tw,\"download_time\":%Td,\"active_time\":%Ta}"
use_backend user-server if { hdr(Host) -i user-server user-server.internal.example.com }
backend user-server
mode http
server-template user-server 10 _user-server._tcp.service.consul resolvers consul resolve-prefer ipv4 check
When i need to do the reload, i do it with the following command:
socat /var/run/haproxy.sock - <<< "show servers state" > /var/lib/haproxy/server-state && /bin/systemctl reload haproxy
And my haproxy is built with the following options:
$ haproxy -vv
HA-Proxy version 1.8.9-2b5ef62 2018/06/11
Copyright 2000-2018 Willy Tarreau <willy@haproxy.org>
Build options :
TARGET = linux2628
CPU = generic
CC = gcc
CFLAGS = -g -O2 -fPIE -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2
OPTIONS = USE_GETADDRINFO=1 USE_ZLIB=1 USE_REGPARM=1 USE_THREAD=1 USE_OPENSSL=1 USE_SYSTEMD=1 USE_PCRE=1 USE_PCRE_JIT=1 USE_TFO=1 USE_NS=1
Default settings :
maxconn = 2000, bufsize = 16384, maxrewrite = 1024, maxpollevents = 200
Built with OpenSSL version : OpenSSL 1.0.2g 1 Mar 2016
Running on OpenSSL version : OpenSSL 1.0.2g 1 Mar 2016
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports : TLSv1.0 TLSv1.1 TLSv1.2
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND
Encrypted password support via crypt(3): yes
Built with multi-threading support.
Built with PCRE version : 8.38 2015-11-23
Running on PCRE version : 8.38 2015-11-23
PCRE library supports JIT : yes
Built with zlib version : 1.2.8
Running on zlib version : 1.2.8
Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with network namespace support.
Available polling systems :
epoll : pref=300, test result OK
poll : pref=200, test result OK
select : pref=150, test result OK
Total: 3 (3 usable), will use epoll.
Available filters :
[SPOE] spoe
[COMP] compression
[TRACE] trace
I’ve tried tweaking this quite a lot and i’ve been continuously reading up on articles around this from other posts and your responses to them, but i’m still not progressing forward in minimizing downtime of my service on reloads. Any help and suggestion would be greatly appreciated.
If you need help with testing from my end, please don’t hesitate to ask.
Really looking forward to nailing this once and for all.
Thanks!