Random 502 Bad Gateway errors


#1

Hi All,

I’m new to HAProxy and I’m trying to use it as a load balancer for a couple of IIS 10 web servers. My setup is simple:

global
    log /dev/log local0
    log /dev/log local1 notice
    chroot /var/lib/haproxy
    stats socket /run/haproxy/admin.sock mode 660 level admin
    stats timeout 30s
    user haproxy
    group haproxy
    daemon

    # Default SSL material locations
    ca-base /etc/ssl/certs
    crt-base /etc/ssl/private

    # Default ciphers to use on SSL-enabled listening sockets.
    # For more information, see ciphers(1SSL). This list is from:
    #  https://hynek.me/articles/hardening-your-web-servers-ssl-ciphers/
    ssl-default-bind-ciphers ECDH+AESGCM:DH+AESGCM:ECDH+AES256:DH+AES256:ECDH+AES128:DH+AES:ECDH+3DES:DH+3DES:RSA+AESGCM:RSA+AES:RSA+3DES:!aNULL:!MD5:!DSS
    ssl-default-bind-options no-sslv3

defaults
    log global
    mode http
    option httplog
    option dontlognull
    timeout connect 5000
    timeout client 50000
    timeout server 50000
    errorfile 400 /etc/haproxy/errors/400.http
    errorfile 403 /etc/haproxy/errors/403.http
    errorfile 408 /etc/haproxy/errors/408.http
    errorfile 500 /etc/haproxy/errors/500.http
    errorfile 502 /etc/haproxy/errors/502.http
    errorfile 503 /etc/haproxy/errors/503.http
    errorfile 504 /etc/haproxy/errors/504.http

frontend www-http
    bind 10.50.1.40:80

    default_backend www-backend

frontend www-https
    bind 10.50.1.40:443 ssl crt /etc/haproxy/sslcert.pem
    reqadd X-Forwarded-Proto:\ https
    #http-response set-header Strict-Transport-Security max-age=31536000;\ includeSubdomains;\ preload
    default_backend www-backend

backend www-backend
    mode http
    balance roundrobin

    cookie SERVERID insert indirect nocache
    redirect scheme https if !{ ssl_fc }
    server web01 10.50.1.30:80 check inter 10s fall 3 rise 2 cookie s1
    server web02 10.50.1.31:80 check inter 10s fall 3 rise 2 cookie s2

The log file shows the following error:

Nov 9 07:09:01 localhost haproxy[74752]: 11.66.82.20:28835 [09/Nov/2017:07:08:59.423] www-https~ www-backend/web01 0/0/0/-1/1753 502 12493 - - PHVN 1/1/0/0/0 0/0 "GET /Contacts/Default.aspx HTTP/1.1"
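To read that line: with option httplog, the five slash-separated numbers are the Tq/Tw/Tc/Tr/Tt timers in milliseconds, and PHVN is the four-character termination state. The minimal Python parser below handles just this one layout, assuming an IPv4 client and no captured headers; the regex, the function name parse_httplog and the abridged code tables (paraphrased from the HAProxy logging documentation, not complete) are illustrative. For this line it shows Tr = -1 (no complete response headers were received from web01) and a P/H termination, meaning the proxy itself aborted the session while waiting for valid response headers, which is the case where HAProxy answers the client with a 502.

```python
import re
from typing import Dict, List

# Default "option httplog" layout (HAProxy 1.6/1.7). Assumes an IPv4 client
# and no captured headers/cookies in {...} blocks.
HTTPLOG_RE = re.compile(
    r'(?P<client>[\d.]+):(?P<port>\d+) '
    r'\[(?P<accept_date>[^\]]+)\] '
    r'(?P<frontend>\S+) '
    r'(?P<backend>[^/\s]+)/(?P<server>\S+) '
    r'(?P<tq>-?\d+)/(?P<tw>-?\d+)/(?P<tc>-?\d+)/(?P<tr>-?\d+)/(?P<tt>\+?\d+) '
    r'(?P<status>\d{3}) (?P<bytes_read>\+?\d+) '
    r'(?P<req_cookie>\S+) (?P<resp_cookie>\S+) '
    r'(?P<term>\S{4}) '
    r'(?P<conns>\S+) (?P<queues>\S+) '
    r'"(?P<request>[^"]*)"'
)

# Abridged termination-state codes, paraphrased from the HAProxy logging docs.
CAUSE = {"-": "normal", "C": "client aborted", "S": "server aborted/refused",
         "P": "aborted by the proxy (limit, filter or security check)",
         "c": "client timeout", "s": "server timeout"}
PHASE = {"-": "normal", "R": "waiting for the request", "Q": "queued",
         "C": "connecting to server", "H": "waiting for valid response headers",
         "D": "data transfer", "L": "last data"}
REQ_COOKIE = {"N": "no cookie", "I": "invalid cookie",
              "V": "valid cookie, sent to its server"}
RESP_COOKIE = {"N": "no cookie set", "I": "cookie inserted",
               "P": "cookie set by server"}


def parse_httplog(line: str) -> Dict[str, object]:
    """Extract server, status, timers and decoded termination state."""
    m = HTTPLOG_RE.search(line)
    if m is None:
        raise ValueError("not an HAProxy httplog line")
    g = m.groupdict()
    term = g["term"]
    decoded: List[str] = [
        CAUSE.get(term[0], "?"), PHASE.get(term[1], "?"),
        REQ_COOKIE.get(term[2], "?"), RESP_COOKIE.get(term[3], "?"),
    ]
    return {
        "server": g["server"],
        "status": int(g["status"]),
        "timers_ms": {k: int(g[k]) for k in ("tq", "tw", "tc", "tr", "tt")},
        "termination": decoded,
        "request": g["request"],
    }


SAMPLE = ('Nov 9 07:09:01 localhost haproxy[74752]: 11.66.82.20:28835 '
          '[09/Nov/2017:07:08:59.423] www-https~ www-backend/web01 '
          '0/0/0/-1/1753 502 12493 - - PHVN 1/1/0/0/0 0/0 '
          '"GET /Contacts/Default.aspx HTTP/1.1"')

info = parse_httplog(SAMPLE)
print(info["status"], info["timers_ms"]["tr"], info["termination"][:2])
```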

I ran this command right after I got the error:
echo "show errors" | sudo socat stdio unix-connect:/run/haproxy/admin.sock
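The same query can also be sent without socat. Below is a minimal Python sketch, assuming the stats socket path from the global section above and a user allowed to open it (the socket is mode 660). In its default non-interactive mode the HAProxy CLI answers one command and then closes the connection, so the helper (haproxy_cmd is a hypothetical name) simply reads until EOF.

```python
import socket


def haproxy_cmd(sock_path: str, command: str, timeout: float = 5.0) -> str:
    """Send a single CLI command to an HAProxy stats socket and return the reply."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        sock.connect(sock_path)
        # One newline-terminated command per connection (non-interactive mode).
        sock.sendall(command.encode("ascii") + b"\n")
        chunks = []
        while True:
            data = sock.recv(65536)
            if not data:  # HAProxy closes the connection after answering
                break
            chunks.append(data)
    # Decode as latin-1 so that no byte value can raise a decoding error.
    return b"".join(chunks).decode("latin-1")


# Example (needs access to the socket):
# print(haproxy_cmd("/run/haproxy/admin.sock", "show errors"))
```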

The result is rather large; I have pasted the first page below:

Total events captured on [09/Nov/2017:10:41:22.471] : 75

[09/Nov/2017:10:41:19.791] frontend www-http (#2): invalid request
backend (#-1), server (#-1), event #74
src 104.215.91.84:2881, session #187, session flags 0x00000080
HTTP msg state 26, msg flags 0x00000000, tx flags 0x00000000
HTTP chunk len 0 bytes, HTTP body len 0 bytes
buffer flags 0x00908002, out 0 bytes, total 186 bytes
pending 186 bytes, wrapping at 16392, error at position 0:

00000 \x16\x03\x03\x00\xB5\x01\x00\x00\xB1\x03\x03Z\x040\xCF\xD6TH\.@\xCD
00022+ \xF9;\xE6\x95\xB4h\xC1\xF5\xF4\xD6\xA1l|O-~aBGp\xE8\x00\x00"\xC0,\xC0+
00050+ \xC0$\xC0#\xC0(\xC0'\xC0\n
00060 \xC0\t\xC0\x14\xC0\x13\x00\x9D\x00\x9C\x00=\x00<\x005\x00/\x00\n
00080 \x01\x00\x00f\x00\x00\x00-\x00+\x00\x00(xxxxxx.xxxxxxxxxx.cloudapp
00123+ .azure.com\x00\n
00135 \x00\x06\x00\x04\x00\x18\x00\x17\x00\x0B\x00\x02\x01\x00\x00\r\x00\x14
00153+ \x00\x12\x06\x01\x06\x03\x04\x01\x05\x01\x02\x01\x04\x03\x05\x03\x02
00170+ \x03\x02\x02\x00#\x00\x00\x00\x17\x00\x00\xFF\x01\x00\x01\x00

[09/Nov/2017:10:39:21.824] backend www-backend (#4): invalid response
frontend www-https (#3), server webaue01 (#1), event #42
src 11.66.82.20:32140, session #150, session flags 0x002004cf
HTTP msg state 26, msg flags 0x00000000, tx flags 0xa8000060
HTTP chunk len 0 bytes, HTTP body len 0 bytes
buffer flags 0x80048002, out 0 bytes, total 15368 bytes
pending 15368 bytes, wrapping at 16392, error at position 0:

00000 \x124Vx\x9A\xBC\x00\r:\xD1H\\x08\x00E\x00\xE3\xE9\x02$@\x00\x80\x06
00024+ \x00\x00\n
00027 \xC8\x01\x1E\n
00031 \xC8\x01(\x00P\xA4\xD0\xDA\x82y\x7F\x08\xDE;\xDC\x80\x18\x04\x03\x17
00051+ \xDC\x00\x00\x01\x01\x08\n
00058 \x06'U\xF1\xDD\x8C\xF3XHTTP/1.1 200 OK\r\n
00083 Cache-Control: private\r\n
00107 Content-Type: text/html; charset=utf-8\r\n
00147 Content-Encoding: gzip\r\n
00171 Vary: Accept-Encoding\r\n
00194 Server: Microsoft-IIS/10.0\r\n
00222 X-AspNet-Version: 4.0.30319\r\n
00251 Date: Thu, 09 Nov 2017 10:39:22 GMT\r\n
00288 Content-Length: 58046\r\n
00311 \r\n
00313 \x1F\x8B\x08\x00\x00\x00\x00\x00\x04\x00\xED\xBD\x07\x1CI\x96%&/m\xCA 
00335+ {\x7FJ\xF5J\xD7\xE0t\xA1\x08\x80\x13$\xD8\x90@\x10\xEC\xC1\x88\xCD
00357+ \xE6\x92\xEC\x1DiG#)\xAB*\x81\xCAeVe]f\x16@\xCC\xED\x9D\xBC\xF7\xDE{
00383+ \xEF\xBD\xF7\xDE{\xEF\xBD\xF7\xBA;\x9DN'\xF7\xDF\xFF?\fd\x01l\xF6\xCE
00407+ J\xDA\xC9\x9E!\x80\xAA\xC8\x1F?~|\x1F?"~\xE3\xE4\xF1\xEF\xFA\xF4\xCB
00430+ \x937\xBF\xCF\xCB\xD3t\xDE.\xCA#\xFA\xC0\xFE\xCC\xB3YZ\xCC>\xFBh\xDA
00453+ \x96;;\xBF\xFF\xB7\xE9\xCF\xDD\x8F\x8E\x1E/\xF26K\xE7m\xBB\xDA\xCE\x7F
00475+ \xD1\xBA\xB8\xFC\xEC\xA3\x93j\xD9\xE6\xCBv\xFB\xCD\xF5*\xFF(\x9D\xCA_
00496+ \x9F}\xD4\xE6\xEF\xDA\xBB\x80u\x98N\xE7Y\xDD\xE4\xEDg\xEB\xF6|\xFB\xE0
00518+ \xA3\xF4n\x04\xC8\xEF\xBD\xFD\xD5\xF1\xF6I\xB5Xem1)}8g\xA7\x9F\xE5\xB3
00543+ \x0B\xFA\x84^k\x8B\xB6\xCC\t\xB7\x1F\xFB\x8D\x93\xD4{^\xAF\xEB\x8B,=
00565+ \xA1\xE6uV\xBA\xAFh\x0Cw\xE5\x8D\xC7e\xB1|\x9B\xD6y\xF9\xD9GM{]\xE6
00590+ \xCD<\xCF\xDB\x8F\xD2\x960VD\xA7M\xF3Q:\xAF\xF3\xF3\xCF>\xBA{\xF2\xFA
00614+ \xF5\xDDIU\xB5\r\x81[\x8D\x17\xC5rL\xDF\xFE\x1E\x97\x9F\xFD^?Xe\xCF
00638+ \xAA\xCB/\xF3\xF3O\x1F\xDC\x9B\xDF\xFF\xBD\xA7\xAF\xAF~\x11\xA3\xF5
00657+ \xF5\xA1o\xB7\xF3|\x91\x7F\xE3}\x94\xC5\xE4n[e\xD4\x85\xFE\xF8\xC6{
00679+ \xC0(\x98\xECwgY\x9B\xB7\xC5"\x15\xD3\xB7y\xFD\xB3\xD4I\x9B]\x9C-W
00704+ \xEB\xB6\xB9\xBB\xBC\xD8\xA6?\x9A\xED\x02\x7F\x8E-!\x7F\x16\x07H\xDDE
00725+ \xFB\xFEa\xF4\xF8&k\xDE\xBE1\x7F}\xE3\xBD\x1D//\xD6eV\xDF=\xC9\xCA|9
00751+ \xA3\n
00753 \xCE\xD7e9\xD5?n\xDF\xDB\xEB\xDB\xF5&c#\x19\x9FTO\xAAwc\xFE\xF3g
00778+ \xA9\x93\xCF\xEBb\xF6\xB3\xDA\xC1W\xAB\xB2\xCA~\xB6\xBAh\x00\xF5\xEE
00798+ \x17\xF9r\xFD\xBE\x1D\xDCr\xDE\x19\xEA\xDD\xD3Y\xD1V\xF5\xCFR\x17\xD3u
00820+ \xD3V\x8B\xF3"/g\xB7\x87}K\xFA\\xE5\x93UU\xB7Yy{\xC8\xB7\xC5:#\x03T\
00849+ \xFCl\xC9\xF5w\x8B\xE5\xAC\xBA\xFAY\xA28s\xCD7\x0F5\xCF\xEA\xE9\xFC\e
00872+ \x05K\xBF\xDF\x9D\xE5\xDF\xF0\xEC\xD1\xEFw\x97U[\x9C\x17S\xF2 \xAAo~
00895+ \x06\x99\xBC\xDBB\x8E\xED:o\xD6e\xFB\x8D\xF7\xC1\x8Cq\xF7\xA7\x7F\xD1:
00917+ \xAF\xAF\xB7\xD7\xC5\xCF\xA2\x85\xF9\x895Y\xEB\xD7<\x98'\xAF7\xF6\xC1
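The first event in that dump was captured on the plain-HTTP frontend www-http (port 80), yet its first bytes look like a TLS record rather than an HTTP request line: 0x16 is the TLS handshake content type, 0x03 0x03 the record version, 0x00B5 (181) the record length, and the next byte, 0x01, a ClientHello. A 5-byte record header plus 181 bytes gives exactly the 186 bytes captured, and the dump also contains a cloudapp.azure.com host name, which looks like SNI. So this particular event probably comes from a client attempting HTTPS on port 80 and may be unrelated to the 502s logged on www-https. The minimal Python check below (the function name is hypothetical) follows the record and handshake layout of RFC 5246.

```python
import struct
from typing import Optional

TLS_HANDSHAKE = 0x16     # record content type "handshake" (RFC 5246, 6.2.1)
TLS_CLIENT_HELLO = 0x01  # handshake message type "client_hello" (RFC 5246, 7.4)


def tls_client_hello_length(buf: bytes) -> Optional[int]:
    """Return the TLS record length if buf starts with a ClientHello record, else None."""
    if len(buf) < 6:
        return None
    content_type, major, minor, length = struct.unpack("!BBBH", buf[:5])
    if content_type != TLS_HANDSHAKE or major != 3 or minor > 4:
        return None
    if buf[5] != TLS_CLIENT_HELLO:
        return None
    return length


# First bytes of event #74 from the dump above.
event_74 = b"\x16\x03\x03\x00\xB5\x01\x00\x00\xB1\x03\x03"
length = tls_client_hello_length(event_74)
print(length, 5 + length)  # -> 181 186
print(tls_client_hello_length(b"GET / HTTP/1.1\r\n"))  # -> None
```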

The strange thing is that the same page works 8 out of 10 times. I can reproduce the error by browsing to the page, then to another page, then back to the same page, and so on, until I receive the 502 error.

I’m not sure what to do next with this setup.

Any guidance is much appreciated.


#2

I have tried option accept-invalid-http-response, but that didn’t make any difference.


#3

This looks like a completely corrupted request and response.

Maybe capture the request and response with tcpdump and check whether that is really what is on the wire, or whether the corruption happens within haproxy.

Can you also share the output of haproxy -vv?


#4

Thanks for your reply, lukastribus.

I have deployed HAProxy on multiple servers while trying to diagnose this issue; they all came back with the same result.

haproxy -vv output:

HA-Proxy version 1.6.3 2015/12/25
Copyright 2000-2015 Willy Tarreau willy@haproxy.org

Build options :
TARGET = linux2628
CPU = generic
CC = gcc
CFLAGS = -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2
OPTIONS = USE_ZLIB=1 USE_REGPARM=1 USE_OPENSSL=1 USE_LUA=1 USE_PCRE=1

Default settings :
maxconn = 2000, bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Encrypted password support via crypt(3): yes
Built with zlib version : 1.2.8
Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with OpenSSL version : OpenSSL 1.0.2g-fips 1 Mar 2016
Running on OpenSSL version : OpenSSL 1.0.2g 1 Mar 2016
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports prefer-server-ciphers : yes
Built with PCRE version : 8.38 2015-11-23
PCRE library supports JIT : no (USE_PCRE_JIT not set)
Built with Lua version : Lua 5.3.1
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND

Available polling systems :
epoll : pref=300, test result OK
poll : pref=200, test result OK
select : pref=150, test result OK
Total: 3 (3 usable), will use epoll.

Another server:

HA-Proxy version 1.7.9-1ppa1~xenial 2017/08/19
Copyright 2000-2017 Willy Tarreau willy@haproxy.org

Build options :
TARGET = linux2628
CPU = generic
CC = gcc
CFLAGS = -g -O2 -fPIE -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2
OPTIONS = USE_GETADDRINFO=1 USE_ZLIB=1 USE_REGPARM=1 USE_OPENSSL=1 USE_LUA=1 USE_PCRE=1 USE_NS=1

Default settings :
maxconn = 2000, bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Encrypted password support via crypt(3): yes
Built with zlib version : 1.2.8
Running on zlib version : 1.2.8
Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with OpenSSL version : OpenSSL 1.0.2g 1 Mar 2016
Running on OpenSSL version : OpenSSL 1.0.2g 1 Mar 2016
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports prefer-server-ciphers : yes
Built with PCRE version : 8.38 2015-11-23
Running on PCRE version : 8.38 2015-11-23
PCRE library supports JIT : no (USE_PCRE_JIT not set)
Built with Lua version : Lua 5.3.1
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND
Built with network namespace support

Available polling systems :
epoll : pref=300, test result OK
poll : pref=200, test result OK
select : pref=150, test result OK
Total: 3 (3 usable), will use epoll.

Available filters :
[COMP] compression
[TRACE] trace
[SPOE] spoe

I have tried disabling server compression; it didn’t make any difference.


#5

Actually, disabling Dynamic Compression on IIS 10 has resolved this issue… Any idea why HAProxy isn’t happy with dynamic compression?


#6

I’m not sure what your design looks like here, or why the dynamic compression feature would impact both the request and the response, but clearly the headers get completely corrupted in both directions.

Like I said, if you want more clarity, capture the packets and take a look at what happens on the wire.
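For what it's worth, the second show errors event in the first post (the invalid response from the IIS server) already hints at what was on the wire. The 66 bytes before HTTP/1.1 200 OK can be read as Ethernet + IPv4 + TCP headers: a packet from port 80 with TTL 128 (the Windows default), whose IPv4 total length (58345) equals 20 + 32 header bytes plus the 247-byte HTTP header block and the 58046-byte body. If that reading is right, the response buffer started with a frame's own headers, describing the whole response as one oversized, offload-style segment. The Python sketch below decodes those bytes; they were transcribed by hand from the dump, assuming the forum turned one apostrophe (0x27) into a curly quote and that \\ in the dump is a single backslash byte. It is a hypothesis check, not proof of where the corruption happened.

```python
import ipaddress
import struct

# The 66 bytes preceding "HTTP/1.1 200 OK" in event #42, transcribed by hand.
# Assumptions: "\\" in the dump is one 0x5C byte, and the curly quote at
# offset 59 was originally an apostrophe (0x27).
PREFIX = bytes.fromhex(
    "123456789abc" "000d3ad1485c" "0800"                               # Ethernet II
    "4500" "e3e9" "0224" "4000" "8006" "0000" "0ac8011e" "0ac80128"    # IPv4
    "0050" "a4d0" "da82797f" "08de3bdc" "8018" "0403" "17dc" "0000"    # TCP
    "0101080a" "062755f1" "dd8cf358"                                   # NOP, NOP, timestamps
)


def decode_headers(buf: bytes) -> dict:
    """Decode an Ethernet II + IPv4 + TCP header prefix (no VLAN tags, no IPv6)."""
    dst_mac, src_mac, ethertype = struct.unpack_from("!6s6sH", buf, 0)
    if ethertype != 0x0800:
        raise ValueError("not an IPv4 frame")
    (ver_ihl, _tos, total_len, _ident, _frag, ttl, proto,
     _csum, src, dst) = struct.unpack_from("!BBHHHBBH4s4s", buf, 14)
    ip_hlen = (ver_ihl & 0x0F) * 4
    sport, dport, _seq, _ack, off_flags = struct.unpack_from("!HHIIH", buf, 14 + ip_hlen)
    tcp_hlen = (off_flags >> 12) * 4
    return {
        "dst_mac": dst_mac.hex(":"),
        "src_mac": src_mac.hex(":"),
        "ip_version": ver_ihl >> 4,
        "ttl": ttl,
        "protocol": proto,  # 6 = TCP
        "src": str(ipaddress.IPv4Address(src)),
        "dst": str(ipaddress.IPv4Address(dst)),
        "sport": sport,
        "dport": dport,
        "tcp_flags": off_flags & 0x3F,  # 0x18 = PSH + ACK
        "header_bytes": 14 + ip_hlen + tcp_hlen,
        "tcp_payload_bytes": total_len - ip_hlen - tcp_hlen,
    }


info = decode_headers(PREFIX)
http_headers = 313 - 66   # dump offsets of the body start and of "HTTP/1.1"
content_length = 58046    # Content-Length header in the dump
print(info["src"], info["sport"], "->", info["dst"], info["dport"], "ttl", info["ttl"])
print(info["tcp_payload_bytes"] == http_headers + content_length)  # -> True
```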


#7

Unfortunately I had to drop HAProxy and use Azure Application Gateway instead due to time constraints in this project. I couldn’t find what was causing this issue. HAProxy works fine with the same application in a different Azure data center and in a Staging environment as well.
The issue has been resolved with Azure Application Gateway.
Thanks for your time Lukas.


#8

I believe that this may just mask the real problem, which could still be unfixed. At least keep in mind that this data center showed corruption when using haproxy.


#9

Thinking about it now, the new environment differs in one area: Azure Accelerated Networking… It is enabled on the new environment’s web servers, but not on the HAProxy servers (because they have 4 vCPUs each). HAProxy’s traffic passes through the virtual switch, while the web servers are connected directly to the physical switch via SR-IOV.

I will see if I have enough time to deploy HAProxy with Accelerated Networking and check whether it makes any difference. I might even try a different Linux distro.

The QA team has been hitting the new environment, and there hasn’t been a single issue so far.


#10

Well, according to Microsoft:

Accelerated Networking is GA for Windows and in a Public Preview for specific Linux distributions

So that may explain why we see bogus traffic in HAProxy/Linux.


#11

Yes, I wasn’t using Accelerated Networking with HAProxy… I thought mixing both in the same environment (Windows Server 2016 with Accelerated Networking and HAProxy without it) could cause this issue…
I will see if I can test and will reply for the benefit of other HAProxy on Azure users…


#12

Alright, I have some good news.

The QA Team were experiencing some weird issues even with Azure Application Gateway (random slowness, JS errors).
I have deployed HAProxy on Debian, same exact result, 502 errors.

I deleted one of the IIS Web Servers and redeployed it without Accelerated Networking, and HAProxy is back in the game.
No issues at all so far, and it is blazing fast.

It seems that Azure Accelerated Networking is the culprit here. I’m going to submit a support case with Azure support, and I will be redeploying the whole environment without AN.

If you have any weird issues with your web application and you are using AN, get rid of it.

Thanks again, I hope this helps other HAProxy and Azure users.