503 errors after backends go down for a while


#1

Hi All,

I have an issue with my HAProxy configuration. I would like it to keep checking whether any of the configured backends has become available again. What seems to happen, however, is that when both backends go away for maintenance for several hours, HAProxy continues to serve a 503 error after they come back online. When I restart HAProxy (service haproxy restart), it starts working again.

My config is pasted below. The frontend where I see this problem is port 443. Thanks in advance for pointing out anything that I could change to prevent this behaviour:

global
    daemon
    maxconn 2048
    tune.ssl.default-dh-param 2048
    log 127.0.0.1 local0
    log 127.0.0.1 local0 notice

defaults
    mode http
    option forwardfor
    option http-server-close
    timeout connect 5000ms
    timeout client 50000ms
    timeout server 50000ms
    stats enable
    stats uri /stats
    stats realm Haproxy\ Statistics
    stats auth myuser:mypass
    log global
    option httpclose
    option httplog
    option  dontlognull

frontend www-http
   bind *:80
   reqadd X-Forwarded-Proto:\ http
   default_backend www-backend

frontend www-https
   bind *:443 ssl crt /etc/haproxy/ssl/generic.pem
   reqadd X-Forwarded-Proto:\ https
   acl use_local_backend path_reg ^/app/[0-9a-zA-Z]*.app.js$
   acl use_jira_backend hdr(host) -i support-customer.mycompany.com
   acl use_atcui_backend hdr(host) -i atcui-customer.mycompany.com
   acl use_sguix_backend hdr(host) -i sguix-customer.mycompany.com
   acl is_root_path path -i /
   redirect code 301 location https://support-customer.mycompany.com/servicedesk/customer/portal/3 if use_jira_backend is_root_path
   use_backend nginx-localhost-backend if use_local_backend
   use_backend jira-backend if use_jira_backend
   use_backend atcui_backend if use_atcui_backend
   use_backend sguix_backend if use_sguix_backend
   default_backend www-backend

backend www-backend
   redirect scheme https if !{ ssl_fc }
   server www-0 10.1.0.1:443 check ssl verify none
   server www-1 10.1.1.1:443 check ssl verify none
   option httplog
   no option http-server-close
   option http-keep-alive

backend nginx-localhost-backend
   server nginx-local localhost:8000 check
   option httplog
   option httpclose

backend jira-backend
   server jira1 10.1.3.83:443 check ssl verify none

backend atcui_backend
   server atcui1 10.200.1.118:443 check ssl verify none
   server atcui2 10.200.12.32:443 check ssl verify none

backend sguix_backend
   server sguix1 10.22.3.95:80 check

Cheers.
—Marc


#2

Try using shorter check intervals. Also specify a check URL in the backend, like:
   option httpchk HEAD /
   http-check expect status 200
   server atcui1 10.200.1.118:443 check inter 60000 fastinter 1000 fall 3 rise 5 observe layer7 ssl verify none
   server atcui2 10.200.12.32:443 check inter 60000 fastinter 1000 fall 3 rise 5 observe layer7 ssl verify none

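This polls / every minute (inter). As soon as a check returns anything other than 200, the server is treated as being in a transitional state and the check interval drops to one second (fastinter). It takes three consecutive failed checks to take the server offline (fall) and five consecutive successful ones to bring it back online (rise).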
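
Since the 443 frontend in the original post falls through to www-backend, the same idea applied there might look roughly like the sketch below. It is only a sketch and makes two assumptions that are not confirmed anywhere in the thread: that the www servers answer HEAD / with a plain 200 (a redirect on / would make the check fail, because "http-check expect status 200" is stricter than the default of accepting any 2xx or 3xx), and that putting the timing on a default-server line is preferred over repeating it on every server line. The "observe layer7" option from the example above is left out to keep it short.

```haproxy
backend www-backend
   redirect scheme https if !{ ssl_fc }
   # Layer 7 health check: send HEAD / and require a 200 response
   # (assumes / on these servers really does return 200)
   option httpchk HEAD /
   http-check expect status 200
   # Shared check timing for the server lines that follow:
   # every 60s when stable, every 1s while a server is changing state,
   # 3 failures to mark it down, 5 successes to mark it up again
   default-server inter 60000 fastinter 1000 fall 3 rise 5
   server www-0 10.1.0.1:443 check ssl verify none
   server www-1 10.1.1.1:443 check ssl verify none
   option httplog
   no option http-server-close
   option http-keep-alive
```

The default-server line has to come before the server lines it should apply to. While testing, the stats page already enabled at /stats shows each server's check status and how long it has been in that state, which should make it easier to see whether the servers are ever marked up again after the maintenance window.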