Redispatch ERROR 503

Hi All,

Some 503 returns still occur, even if Redispatch is configured.
HAproxy and Hashicorp Consul on the same server.

Environment:

  • Consul v1.10.0 - Revision 27de64da7

      rogerio@cloud:/lblog$ sudo haproxy -vv
      HAProxy version 2.4.2-1~bpo10+1 2021/07/07 - https://haproxy.org/
      Status: long-term supported branch - will stop receiving fixes around Q2 2026.
      Known bugs: http://www.haproxy.org/bugs/bugs-2.4.2.html
      Running on: Linux 4.19.0-16-cloud-amd64 #1 SMP Debian 4.19.181-1 (2021-03-19) x86_64
      Build options :
        TARGET  = linux-glibc
        CPU     = generic
        CC      = cc
        CFLAGS  = -O2 -g -O2 -fdebug-prefix-map=/build/haproxy-2.4.2=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -Wall -Wextra -Wdeclaration-after-statement -fwrapv -Wno-unused-label -Wno-sign-compare -Wno-unused-parameter -Wno-clobbered -Wno-missing-field-initializers -Wno-cast-function-type -Wtype-limits -Wshift-negative-value -Wshift-overflow=2 -Wduplicated-cond -Wnull-dereference
        OPTIONS = USE_PCRE2=1 USE_PCRE2_JIT=1 USE_OPENSSL=1 USE_LUA=1 USE_SLZ=1 USE_SYSTEMD=1 USE_PROMEX=1
        DEBUG   =
    
      Feature list : +EPOLL -KQUEUE +NETFILTER -PCRE -PCRE_JIT +PCRE2 +PCRE2_JIT +POLL -PRIVATE_CACHE +THREAD -PTHREAD_PSHARED +BACKTRACE -STATIC_PCRE -STATIC_PCRE2 +TPROXY +LINUX_TPROXY +LINUX_SPLICE +LIBCRYPT +CRYPT_H +GETADDRINFO +OPENSSL +LUA +FUTEX +ACCEPT4 -CLOSEFROM -ZLIB +SLZ +CPU_AFFINITY +TFO +NS +DL +RT -DEVICEATLAS -51DEGREES -WURFL +SYSTEMD -OBSOLETE_LINKER +PRCTL +THREAD_DUMP -EVPORTS -OT -QUIC +PROMEX -MEMORY_PROFILING
    
      Default settings :
        bufsize = 16384, maxrewrite = 1024, maxpollevents = 200
    
      Built with multi-threading support (MAX_THREADS=64, default=2).
      Built with OpenSSL version : OpenSSL 1.1.1d  10 Sep 2019
      Running on OpenSSL version : OpenSSL 1.1.1d  10 Sep 2019
      OpenSSL library supports TLS extensions : yes
      OpenSSL library supports SNI : yes
      OpenSSL library supports : TLSv1.0 TLSv1.1 TLSv1.2 TLSv1.3
      Built with Lua version : Lua 5.3.3
      Built with the Prometheus exporter as a service
      Built with network namespace support.
      Built with libslz for stateless compression.
      Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
      Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND
      Built with PCRE2 version : 10.32 2018-09-10
      PCRE2 library supports JIT : yes
      Encrypted password support via crypt(3): yes
      Built with gcc compiler version 8.3.0
    
      Available polling systems :
      	  epoll : pref=300,  test result OK
      	   poll : pref=200,  test result OK
      	 select : pref=150,  test result OK
      Total: 3 (3 usable), will use epoll.
    
      Available multiplexer protocols :
      (protocols marked as <default> cannot be specified using 'proto' keyword)
      			  h2 : mode=HTTP       side=FE|BE     mux=H2       flags=HTX|CLEAN_ABRT|HOL_RISK|NO_UPG
      			fcgi : mode=HTTP       side=BE        mux=FCGI     flags=HTX|HOL_RISK|NO_UPG
      	   <default> : mode=HTTP       side=FE|BE     mux=H1       flags=HTX
      			  h1 : mode=HTTP       side=FE|BE     mux=H1       flags=HTX|NO_UPG
      	   <default> : mode=TCP        side=FE|BE     mux=PASS     flags=
      			none : mode=TCP        side=FE|BE     mux=PASS     flags=NO_UPG
    
      Available services : prometheus-exporter
      Available filters :
      		[SPOE] spoe
      		[CACHE] cache
      		[FCGI] fcgi-app
      		[COMP] compression
      		[TRACE] trace
    

Settings:

global
	log /dev/log    local0
	log /dev/log    local1 notice
	chroot /var/lib/haproxy
	stats socket /run/haproxy/admin.sock mode 660 level admin expose-fd listeners
	stats timeout 30s
	user haproxy
	group haproxy
	daemon
	maxconn 30000
	ca-base /etc/ssl/certs
	crt-base /etc/ssl/private

defaults
	log     global         
	mode    http           
	option  httplog        
	option  dontlognull    
	timeout connect 5s     
	timeout client  6m     
	timeout server  6m     
	errorfile 400 /etc/haproxy/errors/400.http
	errorfile 403 /etc/haproxy/errors/403.http
	errorfile 408 /etc/haproxy/errors/408.http
	errorfile 500 /etc/haproxy/errors/500.http
	errorfile 502 /etc/haproxy/errors/502.http
	errorfile 503 /etc/haproxy/errors/503.http
	errorfile 504 /etc/haproxy/errors/504.http

frontend  cloud
    bind        :443 ssl crt /etc/ssl/private/cloud.com.br.pem alpn h2,http/1.1
    bind        :80  

   http-response set-header Strict-Transport-Security "max-age=31536000;"

   http-response del-header Server
   http-response del-header x-powered-by
   http-response del-header x-aspnet-version
   http-response del-header x-aspnetmvc-version

   http-request capture req.hdr(X-Forwarded-For) len 64

   acl VHOST_principal req.hdr(Host) -i -m dom cloud.com.br
   use_backend         mvc                         if VHOST_principal
 
backend mvc
    balance     leastconn
    option      httpchk
    option      http-server-close
    http-check  send meth GET uri / ver HTTP/1.1 hdr Host cloud.com.br
    
   option      redispatch 1
   retry-on    503 conn-failure empty-response
   retries     5

   server-template mvc 51 _mvc._tcp.service.consul check fall 3 rise 2 resolvers proddns init-addr none

resolvers proddns
    nameserver dns1 127.0.0.1:8600 # Hashicorp Consul
    accepted_payload_size 8192

LOG:

Jul 30 09:39:04 cloud haproxy[12403]: ***.**.223.130:15250 [30/Jul/2021:09:39:04.847] cloud~ mvc/mvc45 0/2/1/0/3 503 493 - - ---- 9502/9502/144/0/+5 0/0 {******:3cb0:510:5d10} "GET /Home/Bloquear?__cnv=aKXd9&tenant=org000290 HTTP/1.1"
Jul 30 09:39:04 cloud haproxy[12403]: ***.**.24.206:47048 [30/Jul/2021:09:39:04.851] cloud~ mvc/mvc45 0/2/0/1/3 503 493 - - ---- 9502/9502/144/0/+5 0/0 {****.250} "GET /Home/ObterQuantidade?__cnv=myYip&tenant=org000194 HTTP/1.1"
Jul 30 09:39:04 cloud haproxy[12403]: ***.**.25.159:28712 [30/Jul/2021:09:39:04.854] cloud~ mvc/mvc45 0/2/0/0/2 503 493 - - ---- 9502/9502/145/0/+5 0/0 {****.191} "POST /Documento/ObterModelos?__cnv=hkOE8&idEmpresa=6&tenant=org000035 HTTP/1.1"
Jul 30 09:39:04 cloud haproxy[12403]: ***.**.26.194:19872 [30/Jul/2021:09:39:04.855] cloud~ mvc/mvc3 0/1/1/0/2 503 493 - - ---- 9502/9502/144/0/+5 0/0 {****.245} "POST /Integracao/ObterMercadorias?__cnv=mB2Zs&idEmpresa=1&tenant=org000211 HTTP/1.1"
Jul 30 09:39:04 cloud haproxy[12403]: ***.**.24.202:15432 [30/Jul/2021:09:39:04.865] cloud mvc/mvc3 0/2/0/0/2 503 493 - - ---- 9502/9502/144/0/+5 0/0 {*****.158.0} "GET / HTTP/1.1"
Jul 30 09:39:06 cloud haproxy[12403]: ***.**.24.196:12198 [30/Jul/2021:09:39:06.260] cloud~ mvc/mvc3 0/0/0/0/0 503 493 - - ---- 9502/9502/163/0/0 0/0 {****.255} "POST /NotaFiscal/Inserir?idRequisicao=9048&origemEmissao=NotaFiscalDeRequisicao&idFinanceiroConta=3&idEmpresa=3&idTipoItem=1&idNaturezaOperacao=3&valor=27&caixaRapido=True&__cnv=ru2Hd HTTP/1.1"
Jul 30 09:39:10 cloud haproxy[12403]: ***.**.24.254:20100 [30/Jul/2021:09:39:10.685] cloud~ mvc/mvc11 0/0/0/1/1 503 493 - - ---- 9508/9508/182/1/0 0/0 {****.189} "POST /NotaFiscal/Inserir?origemEmissao=7&idRequisicao=48340&idempresa=3&origemSPA=true&__cnv=qmPaU HTTP/1.1"
Jul 30 09:39:11 cloud haproxy[12403]: ***.**.25.159:53872 [30/Jul/2021:09:39:11.201] cloud~ mvc/mvc11 0/0/0/1/1 503 493 - - ---- 9516/9516/199/1/0 0/0 {****.29} "POST /DocumentoEntrada/Inserir?IdEmpresa=18&__cnv=jShsL HTTP/1.1"
Jul 30 09:39:11 cloud haproxy[12403]: ***.**.24.202:37318 [30/Jul/2021:09:39:11.599] cloud~ mvc/mvc11 0/0/0/0/0 503 493 - - ---- 9505/9505/211/0/0 0/0 {****.53} "POST /NotaFiscal/Inserir?idRequisicao=251481&origemEmissao=NotaFiscalDeRequisicao&idFinanceiroConta=117&idEmpresa=24&idTipoItem=1&valor=31.5&caixaRapido=True&__cnv=hgAfR HTTP/1.1"

Other LOGS

Consul LOG
Jul 30 08:31:56 cloud consul[899]: 2021-07-30T08:31:56.569-0300 [INFO]  agent.server.serf.lan: serf: EventMemberJoin: ig-web-pro2-l1f9 10.100.100.96
Jul 30 08:31:56 cloud consul[899]: 2021-07-30T08:31:56.571-0300 [INFO]  agent.server: member joined, marking health alive: member=ig-web-pro2-l1f9

HAProxy LOG
Jul 30 08:32:50 cloud haproxy[12400]: [WARNING]  (12403) : mvc/mvc24 changed its IP from (none) to 10.100.100.96 by proddns/dns1.
Jul 30 08:32:50 cloud haproxy[12403]: mvc/mvc24 changed its FQDN from (null) to ig-web-pro2-l1f9.node.dc-gcp.consul by 'SRV record'
Jul 30 08:32:50 cloud haproxy[12403]: mvc/mvc24 changed its FQDN from (null) to ig-web-pro2-l1f9.node.dc-gcp.consul by 'SRV record'
Jul 30 08:32:50 cloud haproxy[12403]: mvc/mvc24 changed its IP from (none) to 10.100.100.96 by proddns/dns1.
Jul 30 08:32:50 cloud haproxy[12403]: mvc/mvc24 changed its IP from (none) to 10.100.100.96 by proddns/dns1.
Jul 30 08:32:50 cloud haproxy[12403]: Server mvc/mvc24 ('ig-web-pro2-l1f9.node.dc-gcp.consul') is UP/READY (resolves again).
Jul 30 08:32:50 cloud haproxy[12403]: Server mvc/mvc24 ('ig-web-pro2-l1f9.node.dc-gcp.consul') is UP/READY (resolves again).
Jul 30 08:32:50 cloud haproxy[12403]: Server mvc/mvc24 administratively READY thanks to valid DNS answer.
Jul 30 08:32:50 cloud haproxy[12400]: [WARNING]  (12403) : Server mvc/mvc24 ('ig-web-pro2-l1f9.node.dc-gcp.consul') is UP/READY (resolves again).
Jul 30 08:32:50 cloud haproxy[12403]: Server mvc/mvc24 administratively READY thanks to valid DNS answer.
Jul 30 08:32:50 cloud haproxy[12400]: [WARNING]  (12403) : Server mvc/mvc24 administratively READY thanks to valid DNS answer.
Jul 30 08:32:52 cloud haproxy[12403]: Server mvc/mvc24 is DOWN, reason: Layer7 timeout, check duration: 2000ms. 21 active and 0 backup servers left. 14 sessions active, 0 requeued, 0 remaining in queue.
Jul 30 08:32:52 cloud haproxy[12403]: Server mvc/mvc24 is DOWN, reason: Layer7 timeout, check duration: 2000ms. 21 active and 0 backup servers left. 14 sessions active, 0 requeued, 0 remaining in queue.
Jul 30 08:32:52 cloud haproxy[12400]: [WARNING]  (12403) : Server mvc/mvc24 is DOWN, reason: Layer7 timeout, check duration: 2000ms. 21 active and 0 backup servers left. 14 sessions active, 0 requeued, 0 remaining in queue.
...
Jul 30 08:32:53 cloud haproxy[12403]: ***.***.24.16:60494 [30/Jul/2021:08:32:50.096] cloud~ mvc/mvc24 0/0/1/2907/2908 200 385 - - ---- 10038/10038/233/13/0 0/0 {***.***.238.162} "GET /api/reports/clients/426667f63cd/instances/8d34bbc9b08/documents/eaf1914c75d11d1d886bcb/info HTTP/1.1"
...
Jul 30 08:32:58 cloud haproxy[12400]: [WARNING]  (12403) : Server mvc/mvc24 is UP, reason: Layer7 check passed, code: 302, check duration: 3ms. 23 active and 0 backup servers online. 0 sessions requeued, 0 total in queue.
Jul 30 08:32:58 cloud haproxy[12403]: Server mvc/mvc24 is UP, reason: Layer7 check passed, code: 302, check duration: 3ms. 23 active and 0 backup servers online. 0 sessions requeued, 0 total in queue.
Jul 30 08:32:58 cloud haproxy[12403]: Server mvc/mvc24 is UP, reason: Layer7 check passed, code: 302, check duration: 3ms. 23 active and 0 backup servers online. 0 sessions requeued, 0 total in queue.
...
Jul 30 08:43:24 cloud haproxy[12403]: ***.***.24.254:47246 [30/Jul/2021:08:43:22.766] cloud~ mvc/mvc24 0/0/0/1290/1290 200 88834 - - ---- 9771/9771/153/5/0 0/0 {***.**.***.111} "POST /VeiculoEstoque/Obter/428?idRotina=57&leiaute=False&tipoFiltro=2&titulo=Estoque&__cnv=Ehh74&idEmpresa=2&tenant=org000251 HTTP/1.1"
Jul 30 08:43:25 cloud haproxy[12403]: ***.***.26.194:10698 [30/Jul/2021:08:43:25.205] cloud~ mvc/mvc24 0/0/0/1/1 503 493 - - ---- 9769/9769/195/2/0 0/0 {***.87.***.155} "POST /NotaFiscal/Inserir?idOSNotaFiscal=325843&origemEmissao=NotaFiscalDeOrdemDeServico&idFinanceiroConta=51&idEmpresa=13&idTipoItem=2&idNaturezaOperacao=1111&valor=254.82&caixaRapido=True&__cnv=8c6np HTTP/1.1"
Jul 30 08:43:25 cloud haproxy[12403]: ***.***.224.162:36718 [30/Jul/2021:08:43:25.759] cloud~ mvc/mvc24 0/0/0/20/20 503 493 - - ---- 9766/9766/213/1/0 0/0 {**:**:7a:**:562:***:5b84:f3a1} "POST /NotaFiscal/Inserir?idRequisicao=19920&origemEmissao=NotaFiscalDeRequisicao&idFinanceiroConta=21&idEmpresa=4&idTipoItem=1&idNaturezaOperacao=395&valor=63&caixaRapido=True&__cnv=XGuF4 HTTP/1.1"
...
Jul 30 08:43:30 cloud haproxy[12400]: [WARNING]  (12403) : Server mvc/mvc24 is DOWN, reason: Layer7 wrong status, code: 503, info: "Service Unavailable", check duration: 0ms. 25 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
Jul 30 08:43:30 cloud haproxy[12403]: Server mvc/mvc24 is DOWN, reason: Layer7 wrong status, code: 503, info: "Service Unavailable", check duration: 0ms. 25 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
Jul 30 08:43:30 cloud haproxy[12403]: Server mvc/mvc24 is DOWN, reason: Layer7 wrong status, code: 503, info: "Service Unavailable", check duration: 0ms. 25 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
Jul 30 08:43:34 cloud haproxy[12400]: [WARNING]  (12403) : Server mvc/mvc24 was DOWN and now enters maintenance (entry removed from SRV record).
Jul 30 08:43:34 cloud haproxy[12403]: Server mvc/mvc24 was DOWN and now enters maintenance (entry removed from SRV record).
Jul 30 08:43:34 cloud haproxy[12403]: Server mvc/mvc24 was DOWN and now enters maintenance (entry removed from SRV record).

Consul LOG
Jul 30 08:43:44 cloud consul[899]: 2021-07-30T08:43:44.784-0300 [INFO]  agent.server.serf.lan: serf: EventMemberLeave: ig-web-pro2-l1f9 10.100.100.96
Jul 30 08:43:44 cloud consul[899]: 2021-07-30T08:43:44.784-0300 [INFO]  agent.server: deregistering member: member=ig-web-pro2-l1f9 reason=left

Redispatch is NOT a magic bullet for server problems, and all those 503 are server generated.

https://cbonte.github.io/haproxy-dconv/2.2/configuration.html#4.2-retry-on

Requests not fitting in a single buffer will never be
retried (see the global tune.bufsize setting).

Many of those requests are POST requests and probably don’t fit the buffer size.

Hi @lukastribus ,

Thanks a lot!

Changing the falls and buffers settings, has improved a lot (99%) the error 503.

global
   **tune.bufsize 131072**

backend mvc
   balance     leastconn
   option      httpchk
   option      http-server-close
   http-check  send meth GET uri / ver HTTP/1.1 hdr Host cloud.com.br
   
   option      redispatch 1
   retry-on    503 conn-failure empty-response
   retries     7
   server-template mvc 51 _mvc._tcp.service.consul check **fall 1** rise 2 resolvers proddns init-addr none

Errors still occur in some large POST methods. Any notes to increase tune.bufsize more?

You can run out of memory. You need to consider what global maxconn values you have, what the worst-case memory usage is, and how much RAM you have.

You don’t want to haproxy use so much memory that it get’s OOM killed.