We are running HAProxy 2.2.6 with a basic configuration for our backend, hec-backend, which load-balances a REST API-based service:
backend hec-backend
mode http
option httpchk
http-check send meth GET uri /services/collector/health
server hec_192.168.10.5 192.168.10.5:8088 maxconn 16 weight 10 check port 8088
server hec_192.168.10.6 192.168.10.6:8088 maxconn 16 weight 10 check port 8088
server hec_192.168.10.7 192.168.10.7:8088 maxconn 16 weight 10 check port 8088
Recently we have been seeing intermittent 503 errors from some of the backend machines due to high load. The permanent solution is to upgrade the hardware, but that will take some time, so in the meantime we are trying to redispatch such requests away from the heavily loaded machines to mitigate the issue.
We tried adding the following to this backend:
retries 3
retry-on all-retryable-errors 502 503 504
option redispatch 1
The intention was to redispatch a request to another backend server whenever we see a failure, primarily 5xx errors. When we tested this with curl at a low request rate, it worked as expected. At the high request rates seen under real traffic, however, most of the requests still end up with a 503. It is not clear why HAProxy is not redispatching these requests.
Please share any thoughts on why this may be happening.
Example of an error (with redispatch still enabled):
<30>Jan 3 11:14:16 splunk_haproxy[2638905]: 192.168.114.15:52148 [03/Jan/2025:11:14:16.907] http-in~ hec-backend/hec_192.168.10.5 0/0/0/0/0 503 66 - - ---- 4/2/0/0/0 0/0 "POST /services/collector/event HTTP/1.1"
Example of a successful redispatch (I think the +1 denotes a redispatch):
<30>Jan 3 11:19:18 splunk_haproxy[2638905]: 192.168.114.15:57354 [03/Jan/2025:11:19:18.410] http-in~ hec-backend/hec_198.18.10.6 0/0/0/1/1 200 237 - - ---- 2/2/0/0/+1 0/0 "POST /services/collector/event HTTP/1.1"
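To see how often retries and redispatches actually happen under load, the retries counter can be pulled out of each log line. In HAProxy's HTTP log format, the last field of the actconn/feconn/beconn/srv_conn/retries group is the retry count, and a leading + on it does mean the request was redispatched to another server. Below is a minimal Python sketch that reads this field from the two sample lines above. The regex and the `parse_line` helper are my own illustration, not part of HAProxy, and they assume the log layout shown in these samples; a custom log-format would need a different pattern.

```python
import re

# Assumed layout (taken from the two sample lines in this post):
# ... [date] frontend backend/server timers status bytes - - term counters queues "request"
LOG_RE = re.compile(
    r'\] (?P<frontend>\S+) (?P<backend>[^/\s]+)/(?P<server>\S+) '
    r'(?P<timers>\S+) (?P<status>\d{3}) (?P<bytes>\S+) \S+ \S+ (?P<term>\S+) '
    r'\d+/\d+/\d+/\d+/(?P<plus>\+?)(?P<retries>\d+) '
)

def parse_line(line):
    """Hypothetical helper: return server, status, retries and redispatch flag."""
    m = LOG_RE.search(line)
    if m is None:
        return None  # line does not look like the assumed HTTP log format
    return {
        "server": m["server"],
        "status": int(m["status"]),
        "retries": int(m["retries"]),
        # A '+' in front of the retries counter marks a redispatch.
        "redispatched": m["plus"] == "+",
    }

ERROR_LINE = (
    '<30>Jan 3 11:14:16 splunk_haproxy[2638905]: 192.168.114.15:52148 '
    '[03/Jan/2025:11:14:16.907] http-in~ hec-backend/hec_192.168.10.5 '
    '0/0/0/0/0 503 66 - - ---- 4/2/0/0/0 0/0 '
    '"POST /services/collector/event HTTP/1.1"'
)
REDISPATCH_LINE = (
    '<30>Jan 3 11:19:18 splunk_haproxy[2638905]: 192.168.114.15:57354 '
    '[03/Jan/2025:11:19:18.410] http-in~ hec-backend/hec_198.18.10.6 '
    '0/0/0/1/1 200 237 - - ---- 2/2/0/0/+1 0/0 '
    '"POST /services/collector/event HTTP/1.1"'
)

for sample in (ERROR_LINE, REDISPATCH_LINE):
    print(parse_line(sample))
```

Run over the full log, this would show whether the failing 503 responses have a retries value of 0 (no retry was attempted at all) or a non-zero value (retries were attempted and also failed), which narrows down where to look.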