We recently upgraded from 1.7.10 to 1.7.11 and started to notice sporadic issues in some of the responses in our application. Specifically, some searches via Solr started to fail with what look to be somewhat garbled responses.
We were at a loss as to what was causing this until we took HAProxy out of the equation and the issue immediately disappeared. Reintroducing it and then switching off compression also sorted the issue.
So I was just wondering if there were any changes to the compression behavior in 1.7.11? We just use gzip compression and haven't changed anything on our end. Until this is sorted we'll revert to 1.7.10.
Hey @lukastribus. Here’s the outputs and config:
1.7.11:
HA-Proxy version 1.7.11 2018/04/30
Copyright 2000-2018 Willy Tarreau <willy@haproxy.org>
Build options :
TARGET = linux2628
CPU = generic
CC = gcc
CFLAGS = -O2 -g -fno-strict-aliasing -Wdeclaration-after-statement -fwrapv -fno-strict-overflow
OPTIONS = USE_LINUX_TPROXY=1 USE_ZLIB=1 USE_REGPARM=1 USE_OPENSSL=1 USE_LUA=1 USE_PCRE=1
Default settings :
maxconn = 2000, bufsize = 16384, maxrewrite = 1024, maxpollevents = 200
Encrypted password support via crypt(3): yes
Built with zlib version : 1.2.7
Running on zlib version : 1.2.7
Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with OpenSSL version : OpenSSL 1.0.2k-fips 26 Jan 2017
Running on OpenSSL version : OpenSSL 1.0.2k-fips 26 Jan 2017
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports prefer-server-ciphers : yes
Built with PCRE version : 8.32 2012-11-30
Running on PCRE version : 8.32 2012-11-30
PCRE library supports JIT : no (USE_PCRE_JIT not set)
Built with Lua version : Lua 5.3.4
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND
Available polling systems :
epoll : pref=300, test result OK
poll : pref=200, test result OK
select : pref=150, test result OK
Total: 3 (3 usable), will use epoll.
Available filters :
[COMP] compression
[TRACE] trace
[SPOE] spoe
1.7.10:
HA-Proxy version 1.7.10-a7dcc3b 2018/01/02
Copyright 2000-2018 Willy Tarreau <willy@haproxy.org>
Build options :
TARGET = linux2628
CPU = generic
CC = gcc
CFLAGS = -O2 -g -fno-strict-aliasing -Wdeclaration-after-statement -fwrapv
OPTIONS = USE_LINUX_TPROXY=1 USE_ZLIB=1 USE_REGPARM=1 USE_OPENSSL=1 USE_LUA=1 USE_PCRE=1
Default settings :
maxconn = 2000, bufsize = 16384, maxrewrite = 1024, maxpollevents = 200
Encrypted password support via crypt(3): yes
Built with zlib version : 1.2.7
Running on zlib version : 1.2.7
Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with OpenSSL version : OpenSSL 1.0.2k-fips 26 Jan 2017
Running on OpenSSL version : OpenSSL 1.0.2k-fips 26 Jan 2017
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports prefer-server-ciphers : yes
Built with PCRE version : 8.32 2012-11-30
Running on PCRE version : 8.32 2012-11-30
PCRE library supports JIT : no (USE_PCRE_JIT not set)
Built with Lua version : Lua 5.3.4
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND
Available polling systems :
epoll : pref=300, test result OK
poll : pref=200, test result OK
select : pref=150, test result OK
Total: 3 (3 usable), will use epoll.
Available filters :
[COMP] compression
[TRACE] trace
[SPOE] spoe
Sanitized Config:
global
log 127.0.0.1 local2 info
maxconn 2000
tune.ssl.default-dh-param 2048
tune.maxrewrite 16384
tune.bufsize 32768
user admin
group admin
stats socket /etc/haproxy/haproxysock level admin
defaults
log global
mode http
compression algo gzip
compression type text/html text/plain text/javascript application/javascript application/xml text/css image/png font/woff font/woff2
default-server inter 10s init-addr last,libc,none resolve-prefer ipv4
option httplog
option dontlognull
option redispatch
option forceclose
option forwardfor
retries 3
timeout connect 5000
timeout client 160000
timeout server 160000
resolvers serverdns
nameserver dnsmasq 127.0.0.1:53
frontend em
bind *:80
bind *:443 ssl crt /etc/haproxy/server.lab.local.pem
acl https ssl_fc
reqadd X-Forwarded-Proto:\ https if https
redirect scheme https if !https
default_backend server_backend
backend server_backend
mode http
cookie EXTERNALSERVERID insert indirect nocache
option httpchk /health
http-request set-header X-Forwarded-Port %[dst_port]
http-request set-header X-Forwarded-Proto https if { ssl_fc }
server server1 server.lab.local:8280 cookie server1 check resolvers serverdns ssl ca-file /etc/haproxy/chain.crt
listen stats
bind *:1936
mode http
balance
timeout client 160000
timeout connect 5000
timeout server 160000
stats uri /
stats realm HAProxy\ Statistics
stats admin if TRUE
Yeah this is a really weird one. This was the error we were seeing:
Caused by: org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at https://server.localdomain.local:443/solr/knowledge: parsing error
at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:553)
at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:240)
And an example of the offending response was as follows:
2018-05-25 10:01:36,994 TRACE << ")PUBLISHED[0xe0]1published_date_dt[0x9][0x0][0x0][0x1]c[0x94]][0xa5][0xf3][0xff][0xc]3UserED.80028_666_-1[0xff][0xf][0x81]?[0xa7][0x1]Our rights are extended and limited by the tools we use. The Internet has magnified our capability for free speech, but has pared down the reasonable expectation of privacy. And we, of course, have [0xff][0x10]#1.0[0xff][0x11](Authored[0xff][0x12]?[\r]0qWq+bj0RHU7ETE/JUD0AuhmHKeMgedfd2lhkgWw7OE=[0xff][0x13][0x8]@E|[0x91][0xa2][0xff][0x1]6iflkdlywww9p8etfi24wk8[0xff][0x2][0xc][0x83]a`[0x8]@=[0xe2]o[0x81][0xb][0xb3][0xff][0x3]6IFlKDlYWww9p8EtFI24wK8[0xff][0x6])KM1000114[0xff][0x7]%en-GB[0xff][0x5]2KnowledgeArticleED[0xff][0xe][0x81]/content_article[0xff][0x9][0x9][0x0][0x0][0x1]c[0x94]a[0x15][0x1b][0xff][0x4]6IFlKDlYWww9p8EtFI24wK8[0xff][0x8]"en[0xff][\r]?.TPG Buys TSL, Owner Of The TES Connec"
2018-05-25 10:01:36,994 TRACE << "t Education Network, For Around $600M458[0xff][0x14][0x83],product_ipod.topic_extended+region_east[0xff][0xb]3UserED.80028_666_-1[0xff][\n]"
From looking at the above, my thinking was that the response was coming back with new lines somehow introduced by HAProxy 1.7.11's compression, hence the two separate log messages when that response should have been a single message (notice how the last "t" of the string "Owner Of The TES Connect" appears at the beginning of the next message). The full response continued like the above over several more lines.
As for a recreation, I wasn't expecting others to reproduce this, as it involves my company's software and a Solr search backend that returns rich text articles along with metadata. When I turned gzip compression off the issue went away and responses were returned all on one line. So I was more looking for some understanding of the compression algorithm in 1.7.11, specifically whether there had been any changes.
As before, this works fine on 1.7.10 and it only occurred for certain articles.
I'll look to try the latest snapshot when I can, although I'm not sure when that will be. Maybe in the next week.
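For reference, the quickest way I found to show the corruption is to check whether a saved response body is still a valid gzip stream. A hedged sketch (the locally generated stream below stands in for a body captured from a failing query):

```shell
# A valid gzip stream passes an integrity test...
printf 'hello from solr' | gzip -c > body.gz
gunzip -t body.gz && echo "stream OK"

# ...while a truncated copy (simulating a garbled response) fails it.
head -c 10 body.gz > truncated.gz
gunzip -t truncated.gz 2>/dev/null || echo "corrupt stream detected"
```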
I did indeed try the latest 1.8 release at the time and it also didn't have the issue. However, with the 1.8.14 release coming soon (and that other issue fixed) I'll likely be switching to that once it hits Linux repos.
Leave it with me to try with the latest snapshot though, assuming that needs to be built from source.
Yes, if you can, please try the latest 1.7 snapshot. I would like to know if this is something that still needs to be fixed, or if it already is fixed. Both 1.6 and 1.7 are still fully supported, so if there is an unfixed bug in there, I'd like to know, confirm and fix it.
Apologies for the delay, but I had a try at this today and weirdly couldn't get the issue to occur on either the snapshot or 1.7.11. Our environments have gone through a lot of changes since it happened back in May though, so I'll need to check what the differences are and get back to you.
I've finally been able to replicate the environments we had before and installed the latest version given (shown below), and I can confirm the issue does still occur with our product.
Thanks, could you try the following patch (applying it to 1.7.11 or the snapshot - either way is fine):
https://pastebin.com/dl/zJvRs7sJ
It reverts commit 6917b734 ("BUG/MEDIUM: stream-int: Don't loss write's notifs when a stream is woken up"), which was backported to 1.7.11, and I think it could be the one (there are not that many changes in 1.7.11, especially ones with an impact on compression).
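For the apply step, something like this is all that's needed. A sketch: the toy tree and diff below stand in for the unpacked haproxy sources and the downloaded patch file.

```shell
# Set up a stand-in source tree and a stand-in diff file.
rm -rf tree && mkdir -p tree/src
printf 'old line\n' > tree/src/example.c
cat > revert.diff <<'EOF'
--- a/src/example.c
+++ b/src/example.c
@@ -1 +1 @@
-old line
+new line
EOF

# Apply from the tree root with -p1 to strip the a/ and b/ path prefixes.
cd tree && patch -p1 < ../revert.diff
cat src/example.c
```

After that, a `make clean` and your usual `make TARGET=...` rebuild picks up the change.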
Hey. I applied that patch's changes to the haproxy-ss-20180901 tarball and built it, but sadly it didn't remove the problem. Solr is still complaining about parsing errors.
Annoying, I was hoping it would've been the one. So weird, especially since 1.8 is totally fine.
I think it would be best to use git bisect on your side here; we would be able to pinpoint the exact commit easily, but it does require you to spend some time on a few compile/test cycles.
You'd simply clone the 1.7 repository, start a bisect, mark 1.7.10 good and 1.7.11 bad, and then recompile from the suggestions given by git:
git clone http://git.haproxy.org/git/haproxy-1.7.git/
cd haproxy-1.7
git bisect start
git bisect good v1.7.10
git bisect bad v1.7.11
Recompile haproxy and test (run make clean before your standard make call): make clean; make TARGET=[...]
Test it: if you hit the bug, call git bisect bad
or, if it worked fine: git bisect good
This will iterate through the changes between 1.7.10 and 1.7.11 and in the end give you a bad commit, which is the exact change that caused the issue.
Don’t forget to make clean before each call to make (even when you just applied a single patch).
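If the failure can be scripted, the whole loop can even be automated with git bisect run. Here is a self-contained demo on a throwaway repo, which stands in for the real haproxy-1.7 tree, with a fake "bug" introduced in commit 4:

```shell
# Build a throwaway repo with five commits; pretend the bug
# appears from commit 4 onwards.
rm -rf bisect-demo && git init -q bisect-demo && cd bisect-demo
git config user.email demo@example.com && git config user.name demo
for i in 1 2 3 4 5; do
    echo "$i" > state
    git add state && git commit -qm "commit $i"
done

# bad = HEAD (commit 5), good = HEAD~4 (commit 1). The run script
# exits 0 (good) while the "bug" is absent, non-zero (bad) otherwise;
# git checks out each candidate and runs it until the first bad commit
# is found.
git bisect start HEAD HEAD~4
git bisect run sh -c 'test "$(cat state)" -lt 4'
```

In the real case the run script would be your `make clean; make TARGET=...` plus a reproduction test against the built binary.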
8066ccd3984adb3ceaf63a4d80047f3da4511f13 is the first bad commit
commit 8066ccd3984adb3ceaf63a4d80047f3da4511f13
Author: Christopher Faulet <cfaulet@haproxy.com>
Date: Fri Feb 2 15:54:15 2018 +0100
BUG/MEDIUM: http: Switch the HTTP response in tunnel mode as earlier as possible
When the body length is undefined (no Content-Length or Transfer-Encoding
headers), The reponse remains in ending mode, waiting the request is done. So,
most of time this is not a problem because the resquest is done before the
response. But when a client sends data to a server that replies without waiting
all the data, it is really not desirable to wait the end of the request to
finish the response.
This bug was introduced when the tunneling of the request and the reponse was
refactored, in commit 4be980391 ("MINOR: http: Switch requests/responses in
TUNNEL mode only by checking txn flag").
This patch should be backported in 1.8 and 1.7.
(cherry picked from commit fd04fcf5edb0a24cd29ce8f4d4dc2aa3a0e2e82c)
Signed-off-by: Willy Tarreau <w@1wt.eu>
(cherry picked from commit 8a5949f2d74c3a3a6c6da25449992c312b183ef3)
Signed-off-by: Christopher Faulet <cfaulet@haproxy.com>
:040000 040000 1d1425ab79b4f083ef563601d95f56f46d90513e deab1b91af05135cfee186bbb0a22acbf1c2d886 M src
@Ryuzaki, could you provide a network capture of a bad response, both between HAProxy and your server and between the client and HAProxy? I'm a bit puzzled by the commit where the bug seems to be, because it concerns responses with an undefined body length, and for those responses there is no compression.
It is weird, I agree. I assumed it was compression, as the issue did go away when I switched that config off when first investigating a few months ago. I'll double-check today whether that did indeed fix the problem or whether I was chasing a red herring.
A network capture may take some time. This is a company product, so I'll need to check if I'm allowed to share a network capture.