HAProxy stops working after a period of time

I’m having this strange issue where HAProxy randomly stops working for a specific URL/IP and I cannot figure out the reason. A restart of the service will resolve the issue but I need to determine a permanent fix for this rather than a workaround cronjob that restarts the service periodically.

I am proxying a request from a server to an API.

When HAProxy is working my logs will look like below (URL replaced with ‘site’ and IP changed for confidentiality):

/var/log/haproxy/access.log-20200117.gz:
haproxy[26037]: 10.10.50.50.:42386 [16/Jan/2020:13:39:14.934] site-uat_site.com site-uat_com/site-uat_com 0/153/272 1104 -- 1/1/0/1/0 0/0

Then at some random time, possibly after no request has been sent to this URL for a while, I get the following in my logs:

/var/log/haproxy/access.log-20200118.gz:
haproxy[26037]:10.10.50.10:60657 [17/Jan/2020:12:49:26.065] site-uat_site.com site-uat_com/site-uat_com 0/-1/120008 212 sC 0/0/0/0/3 0/0

Note the 212 error code and sC termination code. I’ve tried doing some research into this:
s: the server-side time-out expired first.
C: waiting for CONNECTION to establish on the server. The server might at most have noticed a connection attempt.

I check the log on the backend of the proxy and I don’t see any traffic reaching it.

Any help or ideas would be greatly appreciated. I’m even a little stuck on how to start troubleshooting this.

Config:

frontend site-uat_site_com
           bind            10.10.50.50:6443
           mode            tcp
           log             global
           option          tcplog
           option          dontlognull
           timeout client  90s
           use_backend site-uat_com

backend site-uat_com
            mode            http
            timeout connect 30s
            timeout server  30s
            balance         roundrobin
            http-request set-header Host nonprod-site.com
            server          site-uat_com nonprod-site.com:443 ssl verify none

It’s not a 212 error code; 212 is the number of bytes read from the server when the session was aborted.

Your configuration is very confusing; I don’t understand what you are trying to achieve.

You have an HTTP (not HTTPS) request reaching port 6443 (why 6443 if it isn’t HTTPS?), but you don’t use mode http there, and then you go through a backend in http mode with SSL on the server.

Can you clarify what it is that you are expecting haproxy to do?

Right thank you! I had read that it wasn’t a 212 error code but kept hearing that from my team. Changed my subject line accordingly.

We use HAProxy to be able to support TLS1.2 between a very old in-house application and the API the traffic is sent out to, since this traffic goes over the Internet. I also didn’t do the initial config, so I can’t speak to why they went in this direction, but I do know that 6443 could just as well be 8086 or something that makes more sense, and that wouldn’t change the behaviour.

However, I did restart HAProxy yesterday evening and once again by this morning the traffic has stopped reaching the API. I haven’t confirmed that the traffic isn’t leaving the server but I have confirmed it’s not reaching the API and I know if I restart HAProxy right now this issue will resolve.

Let me read more into the modes. I have no idea why the frontend is tcp; that does seem odd to me too if you’re saying it should probably be http.

Can you provide the complete configuration and the output of haproxy -vv?

I assume the cause is misconfigured timeouts.

Thanks for your assistance Lukas. I’ve included the requested info, I left out the unrelated frontend/backend configs.

[root@server loc]# haproxy -vv
HA-Proxy version 1.7.8 2017/07/07
Copyright 2000-2017 Willy Tarreau willy@haproxy.org

Build options :
TARGET = linux2628
CPU = generic
CC = gcc
CFLAGS = -O2 -g -fno-strict-aliasing -Wdeclaration-after-statement -fwrapv -DTCP_USER_TIMEOUT=18
OPTIONS = USE_LINUX_TPROXY=1 USE_ZLIB=1 USE_REGPARM=1 USE_OPENSSL=1 USE_PCRE=1

Default settings :
maxconn = 2000, bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Encrypted password support via crypt(3): yes
Built with zlib version : 1.2.7
Running on zlib version : 1.2.7
Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with OpenSSL version : OpenSSL 1.0.1e-fips 11 Feb 2013
Running on OpenSSL version : OpenSSL 1.0.1e-fips 11 Feb 2013
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports prefer-server-ciphers : yes
Built with PCRE version : 8.32 2012-11-30
Running on PCRE version : 8.32 2012-11-30
PCRE library supports JIT : no (USE_PCRE_JIT not set)
Built without Lua support
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND

Available polling systems :
epoll : pref=300, test result OK
poll : pref=200, test result OK
select : pref=150, test result OK
Total: 3 (3 usable), will use epoll.

Available filters :
[COMP] compression
[TRACE] trace
[SPOE] spoe

Global/Default Configs:

global
            maxconn         100000
            stats socket    /var/run/haproxy.stat mode 600 level admin
            log             127.0.0.1 local2 debug
            chroot          /var/empty
            ssl-default-bind-ciphers ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES256-GCM-SHA384:DHE-RSA-AES128-GCM-SHA256:DHE-DSS-AES128-GCM-SHA256:kEDH+AESGCM:ECDHE-RSA-AES128-SHA256:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA:ECDHE-ECDSA-AES128-SHA:ECDHE-RSA-AES256-SHA384:ECDHE-ECDSA-AES256-SHA384:ECDHE-RSA-AES256-SHA:ECDHE-ECDSA-AES256-SHA:DHE-RSA-AES128-SHA256:DHE-RSA-AES128-SHA:DHE-DSS-AES128-SHA256:DHE-RSA-AES256-SHA256:DHE-DSS-AES256-SHA:DHE-RSA-AES256-SHA:ECDHE-RSA-DES-CBC3-SHA:ECDHE-ECDSA-DES-CBC3-SHA:EDH-RSA-DES-CBC3-SHA:AES128-GCM-SHA256:AES256-GCM-SHA384:AES128-SHA256:AES256-SHA256:AES128-SHA:AES256-SHA:AES:DES-CBC3-SHA:HIGH:SEED:!aNULL:!eNULL:!EXPORT:!DES:!RC4:!MD5:!PSK:!RSAPSK:!aDH:!aECDH:!EDH-DSS-DES-CBC3-SHA:!KRB5-DES-CBC3-SHA:!SRP
            ssl-default-bind-options  no-tls-tickets
            ssl-default-server-ciphers ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES256-GCM-SHA384:DHE-RSA-AES128-GCM-SHA256:DHE-DSS-AES128-GCM-SHA256:kEDH+AESGCM:ECDHE-RSA-AES128-SHA256:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA:ECDHE-ECDSA-AES128-SHA:ECDHE-RSA-AES256-SHA384:ECDHE-ECDSA-AES256-SHA384:ECDHE-RSA-AES256-SHA:ECDHE-ECDSA-AES256-SHA:DHE-RSA-AES128-SHA256:DHE-RSA-AES128-SHA:DHE-DSS-AES128-SHA256:DHE-RSA-AES256-SHA256:DHE-DSS-AES256-SHA:DHE-RSA-AES256-SHA:ECDHE-RSA-DES-CBC3-SHA:ECDHE-ECDSA-DES-CBC3-SHA:EDH-RSA-DES-CBC3-SHA:AES128-GCM-SHA256:AES256-GCM-SHA384:AES128-SHA256:AES256-SHA256:AES128-SHA:AES256-SHA:AES:DES-CBC3-SHA:HIGH:SEED:!aNULL:!eNULL:!EXPORT:!DES:!RC4:!MD5:!PSK:!RSAPSK:!aDH:!aECDH:!EDH-DSS-DES-CBC3-SHA:!KRB5-DES-CBC3-SHA:!SRP
            ssl-default-server-options  no-tls-tickets
            tune.ssl.default-dh-param 1024
            daemon

defaults
        timeout client 30s
        timeout connect 5s
        timeout server  5s

frontend dfc-uat_d_com
        bind            10.10.50.50:6443
        mode            tcp
        log             global
        option          tcplog
        option          dontlognull
        timeout client  90s
        use_backend dfc-uat_com

backend dfc-uat_com
        mode            http
        timeout connect 30s
        timeout server  30s
        balance         roundrobin
        http-request set-header Host nonprod-dfc.com

        #New PsPrint LZ URL
        server          dfc-uat_com nonprod-dfc.com:443 ssl verify none

Are you sure you have given me the correct configuration?

According to your error log, the connection timed out after 120 seconds while still waiting for the connection to the server (the “timeout connect” phase):

haproxy[26037]:10.10.50.10:60657 [17/Jan/2020:12:49:26.065] site-uat_site.com site-uat_com/site-uat_com 0/-1/120008 212 sC 0/0/0/0/3 0/0

But none of the configuration snippets you provided matches a 120 second timeout.

I suggest moving from tcp to http mode and switching the logging to the HTTP format. This will give us a better picture in the log when you hit that issue.

So instead of:

       mode            tcp
       log             global
       option          tcplog

use:

       mode            http
       log             global
       option          httplog
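
For completeness, here is a rough sketch of what the whole frontend from the posted configuration might look like with that change applied. Only the mode and log option lines differ; everything else is copied from the config above, and I have not tested it against this setup.

```haproxy
frontend dfc-uat_d_com
        bind            10.10.50.50:6443
        # http mode, so the frontend matches the http-mode backend
        mode            http
        log             global
        # httplog replaces tcplog and records the HTTP status and request line
        option          httplog
        option          dontlognull
        timeout client  90s
        use_backend     dfc-uat_com
```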

Also, when you are in this situation, please check with netstat whether you have stale sockets.

I determined that the problem is with the backend hostname:

server dfc-uat_com nonprod-dfc.com:443 ssl verify none

There are two dynamic IPs associated with nonprod-dfc.com, and HAProxy requires a reload/restart to update the IP in its cache for this hostname.

Could anyone confirm that I could resolve this with:

resolvers dns
  nameserver public-0  xx.xx.xx.xx:53
  hold valid 1s

frontend http
  bind *:8000
  default_backend site-backend

backend site-backend
  balance leastconn
  server site sub.example.com:80 resolvers dns check inter 1000

Ah ok, that makes sense.

I confirm that DNS resolution as you configured should fix the issue, although in this configuration you make a DNS request every second. Consider using 10 or 30 seconds for the hold valid timeout.
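
As a sketch only, this is roughly how it could look when applied to the backend you posted, with a longer hold time. The resolvers section name (dns) and nameserver label (public-0) are taken from your snippet, the nameserver address is still a placeholder that you need to fill in, and the 10s check interval is my assumption. Without a resolvers section, haproxy resolves the server hostname once at startup and keeps that address. On 1.7 the run-time resolution is tied to health checks, so keep "check" on the server line.

```haproxy
resolvers dns
        # placeholder address, replace with your own DNS server
        nameserver public-0  xx.xx.xx.xx:53
        # keep a valid answer for 10s before asking again
        hold valid 10s

backend dfc-uat_com
        mode            http
        timeout connect 30s
        timeout server  30s
        balance         roundrobin
        http-request set-header Host nonprod-dfc.com
        # "resolvers dns" enables run-time resolution for this server;
        # "check inter 10s" is an assumed interval, tune it to your needs
        server          dfc-uat_com nonprod-dfc.com:443 ssl verify none resolvers dns check inter 10s
```

After a reload you can watch the log for server address change messages to see whether the new IP is being picked up without a restart.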