100% CPU usage with Nginx and HAProxy


#1

I have a somewhat complicated setup with 2 instances of HAProxy running on a single server, fronted by Nginx on another server.

Even though there are two instances of haproxy running on the same server, only one of them is ever in use at a time (selected via an iptables rule).

       nginx
         |                    nginx server (10.0.0.1)
---------------------------------------------
         80
         |                    haproxy server (10.0.0.2)
   [ IPTABLES]
       /
      /
  HAproxy_A      HAproxy_B

This setup has worked for me for a long time without any issues. When I want to make a config change for haproxy, I will update the instance which is not being used, restart it, and update iptables, for example:

       nginx
         |                    nginx server (10.0.0.1)
---------------------------------------------
         80
         |                    haproxy server (10.0.0.2)
      [ IPTABLES]
                \
                 \
  HAproxy_A      HAproxy_B
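
The post does not show the iptables rule itself. As a rough sketch only, assuming a single NAT REDIRECT rule at position 1 of the PREROUTING chain and assuming the second instance listens on port 8660 (both are guesses, not taken from the real setup), the switch could look something like the following. The helper switch_cmd is hypothetical and only prints the command, since actually running it requires root:

```shell
# Hypothetical sketch: the real rule is not shown in the post.
# Assumes one NAT REDIRECT rule at position 1 of PREROUTING that sends
# port 80 to whichever haproxy instance is currently active.

switch_cmd() {
    # $1 = port of the haproxy instance that should receive traffic
    printf 'iptables -t nat -R PREROUTING 1 -p tcp --dport 80 -j REDIRECT --to-ports %s\n' "$1"
}

# Dry run: print the command instead of executing it.
switch_cmd 8660
```

Keep in mind that NAT rules are only evaluated for the first packet of a connection, so connections that are already established keep going to the previous port after the rule is replaced.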

I recently discovered that I’m able to get haproxy to hang and use 100% CPU when using the keepalive option in nginx, and the http-reuse option in haproxy. This only happens when REMOVING backend servers in haproxy and using ApacheBench to simulate load on the servers.

For example:

Nginx config routing to haproxy:

upstream myapp {
    server 10.0.0.2;
    keepalive 64;
}

server {
    listen 80;
    server_name mynginx;

    location / {
        proxy_pass         http://myapp$request_uri;
        proxy_http_version 1.1;
        proxy_set_header   Connection "";
    }
}

Haproxy config routing to a single backend app on 10.134.8.221:

 backend backend_myapp
     balance roundrobin
     http-reuse safe
     server  10.134.8.221_31403 10.134.8.221:31403 maxconn 128  weight 100  check

When adding a backend server, everything works as expected:

  1. haproxy_a is accepting requests

  2. Add another backend server to haproxy_b (the backend app is running on the same server, just on a different port), for example:

    backend backend_myapp
    balance roundrobin
    http-reuse safe
    server 10.134.8.221_31403 10.134.8.221:31403 maxconn 128 weight 100 check
    server 10.134.8.221_31404 10.134.8.221:31404 maxconn 128 weight 100 check

  3. Restart haproxy_b with
    /usr/sbin/haproxy_b -p /tmp/haproxy_b.pid -f /etc/haproxy/haproxy_b.cfg -sf <old haproxy_b pid>

  4. Update iptables to haproxy_b port

  5. Run ab test via nginx. Everything works as expected. Haproxy_b is being used and requests are getting routed to backend servers appropriately.

However, the problem appears when removing a backend server. For example:

  1. haproxy_b is accepting requests
  2. Restart haproxy_a (even though the config hasn’t changed and only contains the one backend server) with
    /usr/sbin/haproxy_a -p /tmp/haproxy_a.pid -f /etc/haproxy/haproxy_a.cfg -sf <old haproxy_a pid>
  3. Update iptables to point to haproxy_a port
  4. Run ab test against nginx. Haproxy_b starts using 100% CPU, even though requests should only be coming to haproxy_a. Note that the CPU doesn’t spike to 100% until the requests start coming through nginx to haproxy.

This only happens when using the keepalive option in nginx AND the http-reuse option in haproxy. If I remove either one of these options, I am unable to reproduce the issue.

Is it possible that nginx is keeping the connections alive to the inactive instance of haproxy, and then the next set of requests is coming to the old (non running) instance?
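
One way to test this hypothesis would be to count the established connections on the inactive instance's port right after the switch. The sketch below assumes the inactive instance listens on port 8660 (the post does not say which instance uses which port), and count_conns is a hypothetical helper that just counts lines of ss output:

```shell
# Sketch: count established TCP connections from `ss` output read on stdin.
# count_conns is a hypothetical helper, not part of ss or haproxy.
count_conns() {
    # Skip the header line, then count the remaining non-empty lines.
    awk 'NR > 1 && NF > 0 { n++ } END { print n + 0 }'
}

# On the haproxy server, assuming the inactive instance listens on 8660:
#   ss -tn state established '( sport = :8660 )' | count_conns
```

A count that stays above zero while ab is running would suggest that nginx is still reusing keepalive connections to the instance that should be idle.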

Any ideas on why this would be happening? I realize it’s a complicated setup, so would be happy to provide any details I may have forgotten.

Thanks for the help in advance.


#2

Can you share a more complete configuration of the 2 haproxy instances, particularly the defaults, global and frontend sections? The 2 instances are listening on different ports, I imagine?

Also can you share the following outputs:

  • uname -a
  • haproxy -vv
  • nginx -V

Also, if you can, attach strace to the haproxy process that is spinning at 100% and share the output (note that it will contain IP addresses and other private data). Something like this should do the job:
strace -tt -p <PID> -o haproxy-strace-output.txt
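
If it is not obvious which PID is the one spinning, a small filter over ps output can pick it out. This is only a sketch: busiest_haproxy_pid is a hypothetical helper, it assumes the process names start with "haproxy", and the 30 second timeout is an arbitrary choice to keep the trace file small:

```shell
# Sketch: pick the haproxy process with the highest CPU usage from
# `ps -eo pid,pcpu,comm` output read on stdin.
# busiest_haproxy_pid is a hypothetical helper name.
busiest_haproxy_pid() {
    awk '$3 ~ /^haproxy/ && $2 + 0 > max { max = $2 + 0; pid = $1 }
         END { if (pid != "") print pid }'
}

# Usage (strace needs root or the same user as the haproxy process):
#   pid=$(ps -eo pid,pcpu,comm | busiest_haproxy_pid)
#   timeout 30 strace -tt -p "$pid" -o haproxy-strace-output.txt
```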


#3

Hello @lukastribus, thanks for the reply.

Here is the additional information you requested.

  • uname -a
Linux 4.4.0-57-generic #78-Ubuntu SMP Fri Dec 9 23:50:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
  • haproxy -vv
HA-Proxy version 1.6.3 2015/12/25
Copyright 2000-2015 Willy Tarreau <willy@haproxy.org>

Build options :
  TARGET  = linux2628
  CPU     = generic
  CC      = gcc
  CFLAGS  = -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2
  OPTIONS = USE_ZLIB=1 USE_REGPARM=1 USE_OPENSSL=1 USE_LUA=1 USE_PCRE=1

Default settings :
  maxconn = 2000, bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Encrypted password support via crypt(3): yes
Built with zlib version : 1.2.8
Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with OpenSSL version : OpenSSL 1.0.2g-fips  1 Mar 2016
Running on OpenSSL version : OpenSSL 1.0.2g  1 Mar 2016
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports prefer-server-ciphers : yes
Built with PCRE version : 8.38 2015-11-23
PCRE library supports JIT : no (USE_PCRE_JIT not set)
Built with Lua version : Lua 5.3.1
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND

Available polling systems :
      epoll : pref=300,  test result OK
       poll : pref=200,  test result OK
     select : pref=150,  test result OK
Total: 3 (3 usable), will use epoll.
  • nginx -V
nginx version: nginx/1.11.8
built by gcc 4.9.2 (Debian 4.9.2-10)
built with OpenSSL 1.0.1t  3 May 2016
TLS SNI support enabled
configure arguments: --prefix=/etc/nginx --sbin-path=/usr/sbin/nginx --modules-path=/usr/lib/nginx/modules --conf-path=/etc/nginx/nginx.conf --error-log-path=/var/log/nginx/error.log --http-log-path=/var/log/nginx/access.log --pid-path=/var/run/nginx.pid --lock-path=/var/run/nginx.lock --http-client-body-temp-path=/var/cache/nginx/client_temp --http-proxy-temp-path=/var/cache/nginx/proxy_temp --http-fastcgi-temp-path=/var/cache/nginx/fastcgi_temp --http-uwsgi-temp-path=/var/cache/nginx/uwsgi_temp --http-scgi-temp-path=/var/cache/nginx/scgi_temp --user=nginx --group=nginx --with-compat --with-file-aio --with-threads --with-http_addition_module --with-http_auth_request_module --with-http_dav_module --with-http_flv_module --with-http_gunzip_module --with-http_gzip_static_module --with-http_mp4_module --with-http_random_index_module --with-http_realip_module --with-http_secure_link_module --with-http_slice_module --with-http_ssl_module --with-http_stub_status_module --with-http_sub_module --with-http_v2_module --with-mail --with-mail_ssl_module --with-stream --with-stream_realip_module --with-stream_ssl_module --with-stream_ssl_preread_module --with-cc-opt='-g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2' --with-ld-opt='-Wl,-z,relro -Wl,-z,now -Wl,--as-needed'

Both instances of haproxy are running the same version with the same defaults and frontend config (below), with the exception of the bind ports: they listen on different ports (8550/8551 and 8660/8661).

global
    daemon
    maxconn 16384
    log 127.0.0.1 local0
    log 127.0.0.1 local1 notice
    log-send-hostname

defaults
    mode http
    log global
    unique-id-format %{+X}o\ %ci:%cp_%fi:%fp_%Ts_%rt:%pid
    unique-id-header X-Unique-ID
    timeout connect   5s
    timeout client   60s
    timeout server   60s
    timeout tunnel 3600s
    option dontlognull
    option http-keep-alive
    option redispatch
    errorfile 400 /etc/haproxy/errors/400.http
    errorfile 403 /etc/haproxy/errors/403.http
    errorfile 408 /etc/haproxy/errors/408.http
    errorfile 500 /etc/haproxy/errors/500.http
    errorfile 502 /etc/haproxy/errors/502.http
    errorfile 503 /etc/haproxy/errors/503.http
    errorfile 504 /etc/haproxy/errors/504.http

listen stats
    bind 10.129.20.97:8551
    mode http
    stats enable
    stats hide-version
    stats realm Haproxy\ Statistics
    stats uri /

frontend http-in
    bind 10.129.20.97:8550
    option httplog
    option forwardfor
    log-format %ci:%cp\ [%t]\ %ft\ %b/%s\ %Tq/%Tw/%Tc/%Tr/%Tt\ %ST\ %B\ %CC\ %CS\ %tsc\ %ac/%fc/%bc/%sc/%rc\ %sq/%bq\ %hr\ %hs\ %{+Q}r\ %ID


    acl acl_myapp hdr(host) -i myapp.service.consul
    use_backend backend_myapp if acl_myapp

backend backend_myapp
    http-reuse safe
    balance roundrobin
    server 10.129.8.221_31403 10.129.8.221:31403 maxconn 128 weight 100 check

I managed to capture 2 straces - one as the process was climbing to 100% CPU, and another (smaller one) where the CPU was already running 100%:

Let me know if you’d like any additional information. Thanks for the time.


#4

Please upgrade to the latest 1.6 release (1.6.11 at this time).

Your release is from December 2015 and contains multiple bugs related to this exact symptom (spinning at 100% CPU load), which have been fixed.
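
After upgrading, it may be worth confirming that each binary really reports at least 1.6.11. The following is a minimal sketch, assuming GNU sort (for the -V option) and the "HA-Proxy version ..." banner format shown earlier in the thread; parse_haproxy_version and version_ge are hypothetical helper names:

```shell
# Sketch: extract the version from `haproxy -v` output read on stdin.
# Assumes the 1.6-style banner line "HA-Proxy version X.Y.Z YYYY/MM/DD".
parse_haproxy_version() {
    awk '/^HA-Proxy version/ { print $3; exit }'
}

# True if version $1 is greater than or equal to version $2.
# Relies on GNU sort -V for version ordering.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Usage, with the binary paths used in this thread:
#   v=$(/usr/sbin/haproxy_a -v | parse_haproxy_version)
#   version_ge "$v" 1.6.11 && echo "haproxy_a is recent enough"
```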


#5

Hi @lukastribus:

Version 1.6.11 seems to have fixed it! I’ll do some more testing, just to be sure.

Thanks for the help!!