I’m having this strange issue where HAProxy randomly stops working for a specific URL/IP and I cannot figure out the reason. A restart of the service will resolve the issue but I need to determine a permanent fix for this rather than a workaround cronjob that restarts the service periodically.
I am proxying a request from a server to an API.
When HAProxy is working, my logs look like this (URL replaced with ‘site’ and IP changed for confidentiality):
/var/log/haproxy/access.log-20200117.gz:
haproxy[26037]: 10.10.50.50.:42386 [16/Jan/2020:13:39:14.934] site-uat_site.com site-uat_com/site-uat_com 0/153/272 1104 -- 1/1/0/1/0 0/0
Then at some random time, possibly after no request has been sent to this URL for a while, I get this in my logs:
/var/log/haproxy/access.log-20200118.gz:
haproxy[26037]:10.10.50.10:60657 [17/Jan/2020:12:49:26.065] site-uat_site.com site-uat_com/site-uat_com 0/-1/120008 212 sC 0/0/0/0/3 0/0
Note the 212 error code and the sC termination code. I looked up what the termination code means:
s: the server-side time-out expired first.
C: waiting for CONNECTION to establish on the server. The server might at most have noticed a connection attempt.
I checked the logs on the backend behind the proxy and I don’t see any traffic reaching it.
Any help or ideas would be greatly appreciated. I’m even a little stuck on how to start troubleshooting this.
Config:
frontend site-uat_site_com
    bind 10.10.50.50:6443
    mode tcp
    log global
    option tcplog
    option dontlognull
    timeout client 90s
    use_backend site-uat_com

backend site-uat_com
    mode http
    timeout connect 30s
    timeout server 30s
    balance roundrobin
    http-request set-header Host nonprod-site.com
    server site-uat_com nonprod-site.com:443 ssl verify none
It’s not a 212 error code; it’s 212 bytes read from the server when the connection was aborted.
Your configuration is very confusing, and I don’t understand what you are trying to achieve.
You have an HTTP (not HTTPS) request reaching port 6443 (why 6443 if it isn’t HTTPS?), but you don’t use mode http here, and then you go through a backend in http mode with SSL on the server.
Can you clarify what you are expecting HAProxy to do?
Right thank you! I had read that it wasn’t a 212 error code but kept hearing that from my team. Changed my subject line accordingly.
We use HAProxy to support TLS 1.2 between a very old in-house application and the API the traffic is sent out to, since this traffic goes over the Internet. I didn’t write the initial config, so I can’t speak to why they went in this direction. I do know the port could just as well be 8086 or something that makes more sense than 6443, but that wouldn’t change the behaviour.
However, I did restart HAProxy yesterday evening and once again by this morning the traffic has stopped reaching the API. I haven’t confirmed that the traffic isn’t leaving the server but I have confirmed it’s not reaching the API and I know if I restart HAProxy right now this issue will resolve.
Let me read up on the modes. I have no idea why the frontend is tcp; that does seem odd to me too if you’re saying it should probably be http.
Can you provide the complete configuration and the output of haproxy -vv?
I assume the cause is misconfigured timeouts.
Thanks for your assistance, Lukas. I’ve included the requested info; I left out the unrelated frontend/backend configs.
[root@server loc]# haproxy -vv
HA-Proxy version 1.7.8 2017/07/07
Copyright 2000-2017 Willy Tarreau willy@haproxy.org
Build options :
TARGET = linux2628
CPU = generic
CC = gcc
CFLAGS = -O2 -g -fno-strict-aliasing -Wdeclaration-after-statement -fwrapv -DTCP_USER_TIMEOUT=18
OPTIONS = USE_LINUX_TPROXY=1 USE_ZLIB=1 USE_REGPARM=1 USE_OPENSSL=1 USE_PCRE=1
Default settings :
maxconn = 2000, bufsize = 16384, maxrewrite = 1024, maxpollevents = 200
Encrypted password support via crypt(3): yes
Built with zlib version : 1.2.7
Running on zlib version : 1.2.7
Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with OpenSSL version : OpenSSL 1.0.1e-fips 11 Feb 2013
Running on OpenSSL version : OpenSSL 1.0.1e-fips 11 Feb 2013
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports prefer-server-ciphers : yes
Built with PCRE version : 8.32 2012-11-30
Running on PCRE version : 8.32 2012-11-30
PCRE library supports JIT : no (USE_PCRE_JIT not set)
Built without Lua support
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND
Available polling systems :
epoll : pref=300, test result OK
poll : pref=200, test result OK
select : pref=150, test result OK
Total: 3 (3 usable), will use epoll.
Available filters :
[COMP] compression
[TRACE] trace
[SPOE] spoe
Global/Default Configs:
global
maxconn 100000
stats socket /var/run/haproxy.stat mode 600 level admin
log 127.0.0.1 local2 debug
chroot /var/empty
ssl-default-bind-ciphers ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES256-GCM-SHA384:DHE-RSA-AES128-GCM-SHA256:DHE-DSS-AES128-GCM-SHA256:kEDH+AESGCM:ECDHE-RSA-AES128-SHA256:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA:ECDHE-ECDSA-AES128-SHA:ECDHE-RSA-AES256-SHA384:ECDHE-ECDSA-AES256-SHA384:ECDHE-RSA-AES256-SHA:ECDHE-ECDSA-AES256-SHA:DHE-RSA-AES128-SHA256:DHE-RSA-AES128-SHA:DHE-DSS-AES128-SHA256:DHE-RSA-AES256-SHA256:DHE-DSS-AES256-SHA:DHE-RSA-AES256-SHA:ECDHE-RSA-DES-CBC3-SHA:ECDHE-ECDSA-DES-CBC3-SHA:EDH-RSA-DES-CBC3-SHA:AES128-GCM-SHA256:AES256-GCM-SHA384:AES128-SHA256:AES256-SHA256:AES128-SHA:AES256-SHA:AES:DES-CBC3-SHA:HIGH:SEED:!aNULL:!eNULL:!EXPORT:!DES:!RC4:!MD5:!PSK:!RSAPSK:!aDH:!aECDH:!EDH-DSS-DES-CBC3-SHA:!KRB5-DES-CBC3-SHA:!SRP
ssl-default-bind-options no-tls-tickets
ssl-default-server-ciphers ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES256-GCM-SHA384:DHE-RSA-AES128-GCM-SHA256:DHE-DSS-AES128-GCM-SHA256:kEDH+AESGCM:ECDHE-RSA-AES128-SHA256:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA:ECDHE-ECDSA-AES128-SHA:ECDHE-RSA-AES256-SHA384:ECDHE-ECDSA-AES256-SHA384:ECDHE-RSA-AES256-SHA:ECDHE-ECDSA-AES256-SHA:DHE-RSA-AES128-SHA256:DHE-RSA-AES128-SHA:DHE-DSS-AES128-SHA256:DHE-RSA-AES256-SHA256:DHE-DSS-AES256-SHA:DHE-RSA-AES256-SHA:ECDHE-RSA-DES-CBC3-SHA:ECDHE-ECDSA-DES-CBC3-SHA:EDH-RSA-DES-CBC3-SHA:AES128-GCM-SHA256:AES256-GCM-SHA384:AES128-SHA256:AES256-SHA256:AES128-SHA:AES256-SHA:AES:DES-CBC3-SHA:HIGH:SEED:!aNULL:!eNULL:!EXPORT:!DES:!RC4:!MD5:!PSK:!RSAPSK:!aDH:!aECDH:!EDH-DSS-DES-CBC3-SHA:!KRB5-DES-CBC3-SHA:!SRP
ssl-default-server-options no-tls-tickets
tune.ssl.default-dh-param 1024
daemon
defaults
    timeout client 30s
    timeout connect 5s
    timeout server 5s

frontend dfc-uat_d_com
    bind 10.10.50.50:6443
    mode tcp
    log global
    option tcplog
    option dontlognull
    timeout client 90s
    use_backend dfc-uat_com

backend dfc-uat_com
    mode http
    timeout connect 30s
    timeout server 30s
    balance roundrobin
    http-request set-header Host nonprod-dfc.com
    #New PsPrint LZ URL
    server dfc-uat_com nonprod-dfc.com:443 ssl verify none
Are you sure you have given me the correct configuration?
As per your error log, the connection timed out after 120 seconds in “timeout connect”:
haproxy[26037]:10.10.50.10:60657 [17/Jan/2020:12:49:26.065] site-uat_site.com site-uat_com/site-uat_com 0/-1/120008 212 sC 0/0/0/0/3 0/0
But none of the configuration snippets you provided matches a 120 second timeout.
I suggest moving from tcp to http mode and switching the log format to httplog. This will give us a better picture in the log when you hit that issue.
So instead of:
mode tcp
log global
option tcplog
use:
mode http
log global
option httplog
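Applied to the frontend posted earlier in the thread, that change would look roughly like the sketch below. All names and values are copied from the posted config; only the mode and the log format differ, and nothing else has been tuned.

```haproxy
frontend dfc-uat_d_com
    bind 10.10.50.50:6443
    # was "mode tcp": http mode matches the backend, which is already in http mode
    mode http
    log global
    # was "option tcplog": httplog adds the request line, status code and HTTP timers
    option httplog
    option dontlognull
    timeout client 90s
    use_backend dfc-uat_com
```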
Also, when you are in this situation, please check with netstat whether you have stale sockets.
I determined that the problem is with the backend hostname:
server dfc-uat_com nonprod-dfc.com:443 ssl verify none
There are two dynamic IPs associated with nonprod-dfc.com, and HAProxy requires a reload/restart to update the IP in its cache for this hostname.
Could anyone confirm that I could resolve this with:
resolvers dns
    nameserver public-0 xx.xx.xx.xx:53
    hold valid 1s

frontend http
    bind *:8000
    default_backend site-backend

backend site-backend
    balance leastconn
    server site sub.example.com:80 resolvers dns check inter 1000
Ah ok, that makes sense.
I confirm that DNS resolution as you configured it should fix the issue, although in this configuration you make a DNS request every second. Consider using 10 or 30 seconds for the hold valid timeout.
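For illustration, here is how that could look on the backend from this thread. This is only a sketch under a few assumptions: the nameserver address is the placeholder from the question, the 10s hold is just the lower of the two suggested values, and the check interval of 10s is an arbitrary choice to avoid probing a third-party API every second. The check keyword is kept from the question's example because, as far as I know, 1.7 drives runtime resolution from health checks; please verify that against the documentation for your version.

```haproxy
resolvers dns
    # placeholder address carried over from the question; use a real resolver here
    nameserver public-0 xx.xx.xx.xx:53
    # keep a valid answer for 10s instead of 1s
    hold valid 10s

backend dfc-uat_com
    mode http
    timeout connect 30s
    timeout server 30s
    balance roundrobin
    http-request set-header Host nonprod-dfc.com
    # "resolvers dns" lets HAProxy follow IP changes of nonprod-dfc.com at runtime;
    # "check inter 10s" is an assumed interval, not a value from the thread
    server dfc-uat_com nonprod-dfc.com:443 ssl verify none resolvers dns check inter 10s
```

With this in place, a change of the IP behind nonprod-dfc.com should be picked up without a reload or restart.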