Serious Performance Issue with ACL and URL Rewriting

We have been running a version 1.6 HAPROXY based load balancing system for the past 12 months.
We were processing around 40K requests per day for the first 6 months, then traffic started to increase to 80K, 160K, 300K requests per day. We have been handling around a million requests a day for the past 3 months without any trouble on the HAPROXY side of things. We updated the back end and changed the base path of the URL from /whatever/etc to /v2/whatever/etc, and decided to use HAPROXY to do the URL rewrite on the fly, allowing existing configurations to continue working with the new URL. This is when the trouble started. We were processing backend requests in about 400ms, but since the URL rewrite was added the performance has dropped to 1 request per 4 seconds at best. If we deactivate the URL rewriting, performance returns to 400ms, so it is clear that the ACL/URL rewriting is the cause.
My questions are very simple. Does anyone know anything about this? Would an upgrade to 1.7 or 1.8 change things?

That is strange. Please provide the output of haproxy -vv and the configuration.

Unfortunately I am not able to do so, because I cannot take any data from the site; it is a very sensitive operation. I have isolated it down to the collection of ACLs (acl name path_beg /whatever) which detect the paths to be rewritten and the http-request rewrite expressions.
Without them the performance is good.
With them the performance is seriously bad.
I was just wondering if anyone had ever experienced something similar.

I will not be onsite until tomorrow morning and do not have remote access. I will do what I can when I get there to provide you with further information.

Well, provide as much configuration as you can then. And don’t forget the output of haproxy -vv.

No, this is not normal at all, rewriting should be very cheap.

Yes, that’s what I thought; I was astonished to find out what the performance issue was down to. We have one front end dispatching by URL path to two different backends. One is feeding a group of three Apache / WSGI / Python servers, the other is feeding a group of three Java / Tomcat applications. The Apache servers communicate with their own Java Tomcat, and the Tomcats all feed into four LDAP backend storage servers. We had been experiencing a lot of error 504 incidents which were down to a serious performance issue in the back end between the Tomcats and the primary LDAP server. The bug in the Tomcat application was corrected and we were in good shape to take on load to reach our current state of 1 million requests per day, every day. All the ACLs and URL rewrites are in the single front end declaration and correct the old URL schemes before the use_backend switching block.

I hesitate to add that this has become classified as a ‘critical’ production incident. Fortunately we discovered the workaround on Friday evening, because a rollback is out of the question seeing how extensively the data image has been modified by the avalanche of customer requests since the activation on production. User Application Testing was unable to simulate the real production workload and so the production issue was not detected until now.

Good morning, here is the output from the “haproxy -vv” command:

HA-Proxy version 1.6.9 2016/08/30
Copyright 2000-2016 Willy Tarreau willy@haproxy.org

Build options:
TARGET = linux26
CPU = generic
CC = gcc
CFLAGS = USE_ZLIB=1 USE_OPENSSL=1

Default settings :
maxconn = 2000, bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Encrypted password support via crypt(3) : yes
Built with zlib version : 1.2.3
Compression algorithms supported : identity(“identity”), deflate(“deflate”), raw-deflate(“deflate”), gzip(“gzip”)
Built with OpenSSL version : OpenSSL 1.0.1e-fips 11 Feb 2013
Running with OpenSSL version : OpenSSL 1.0.1e-fips 11 Feb 2013
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports prefer-server-ciphers : yes
Built without PCRE support (using libc’s regex instead)
Built without Lua support
Built with transparent proxy support using IP_TRANSPARENT IP_FREEBIND

Available polling systems :
epoll : pref=300, test result OK
poll : pref=200, test result OK
select : pref=150, test result OK
Total: 3 (3 usable), will use epoll

There you are, I had to copy it all out by hand because I have no copy paste from the machine to here.

I need the relevant configuration, because I still have no clue what your rewrite looked like exactly, and whether you used regular expressions or other features in it.

From the output provided you did not enable PCRE, nor the linux2628 target, which I assume would have been appropriate for your OS.

I was not part of the team that built this version of HAPROXY, I caught this plane “in flight” so to speak.

We have a series of ACL/HTTP-REQUEST blocks such as :

acl p_some_name path_beg /old/path
http-request set-path %[path,regsub(old/path,new/path,g)] if p_some_name

There are 10 of them in all, involved in transforming the old version of the API on the fly.

These are preceded by 25 ACLs dedicated to API endpoint filtering, which are processed by:

http-request allow if one of the defined acl expressions
http-request deny

followed by a simple

use_backend backend-one if { path_beg /path/one }
default_backend backend-two

Again, I am sorry I can’t expose the exact configuration: for one, I can’t copy and paste it to this machine, and two, it contains sensitive naming tokens that would violate the company security policy if exposed.

Sincerely
Jamie Marshall
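
Pieced together, the fragments described in the post above would give a frontend roughly like the following. This is only a sketch under stated assumptions: the frontend name, the bind address, and the filter ACL name a_endpoint_1 with its path are hypothetical placeholders, and only one rewrite block and one filter ACL are shown out of the 10 and 25 mentioned. The rewrite rule is placed before the allow/deny pair on the assumption that this is the real order, because HAProxy stops evaluating http-request rules as soon as an allow or deny rule matches, so a set-path placed after an unconditional deny would never run.

```haproxy
# Sketch only: names, bind address and paths are placeholders, not the real config.
# "mode http" is assumed to come from a defaults section.
frontend fe_main
    bind :80

    # One of the 10 rewrite blocks quoted in the post
    acl p_some_name path_beg /old/path
    http-request set-path %[path,regsub(old/path,new/path,g)] if p_some_name

    # One of the 25 endpoint-filter ACLs (hypothetical name and path);
    # the remaining ACLs, not shown, would cover /path/one and the other endpoints
    acl a_endpoint_1 path_beg /new/path
    http-request allow if a_endpoint_1
    http-request deny

    # Backend switching as quoted in the post
    use_backend backend-one if { path_beg /path/one }
    default_backend backend-two
```

With this layout every request whose path begins with /old/path goes through one unanchored, global regex substitution before the filter ACLs are evaluated against the rewritten path.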

I noticed I skipped a line in the haproxy -vv output

“CFLAGS = -O2 -g -fno-strict-aliasing -Wdeclaration-after-statement
OPTIONS = USE_ZLIB=1 USE_OPENSSL=1”

J

You are using regular expressions without PCRE. I assume that’s why the performance is a nightmare. Can you provide OS/kernel/libc release please?

CentOS: 3.10.0-693.11.6.el7.x86_64 GNU/Linux

That would be CentOS 7.4. To check libc regex performance compared to PCRE when using a lot of those:

acl p_some_name path_beg /old/path
http-request set-path %[path,regsub(old/path,new/path,g)] if p_some_name

That doesn’t make sense. This kernel is from CentOS 7.4, but the openssl (1.0.1e) and zlib (1.2.3) packages are from CentOS 6.

How is it that you run a CentOS 7 kernel on a CentOS 6 system? What’s the libc here? What does haproxy link to?

There is probably some other factor at play. Even if PCRE is not used, the rewrite would probably still be fast enough.

The fact is that we know not even a tenth of the configuration, don’t have any logs, the build is pretty much suboptimal (no PCRE) and we won’t know what happened to this OS (and libc) - likely it has been half upgraded from 6 to 7.

While I could try to reproduce it, there is no point in doing so when it is unclear what the base OS looks like.

I understand that this makes it hard for you to understand the problem. I am an onsite contractor to a large corporate and this is the issue I have encountered. I have rolled back the production system to the previous supposedly stable state to get us some time to investigate it further. Yes, there is a big mix-up; it looks like some guys here pretend to know what they are doing. I am bound by law not to expose further details, unfortunately. Thank you very much for your help.

I understand that. You will have to reproduce this in a lab environment without any confidentiality issues then.
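
A lab reproduction does not need any of the sensitive names. A minimal sketch, assuming a throwaway HTTP server listening on 127.0.0.1:8000 and using placeholder names (fe_lab, be_lab), ports and paths, could be as small as this:

```haproxy
# Lab sketch only: addresses, ports, names and paths are all placeholders.
global
    maxconn 2000

defaults
    mode http
    timeout connect 5s
    timeout client  30s
    timeout server  30s

frontend fe_lab
    bind 127.0.0.1:8080

    # Comment out these two lines to measure the baseline without the rewrite
    acl p_old path_beg /old/path
    http-request set-path %[path,regsub(old/path,new/path,g)] if p_old

    default_backend be_lab

backend be_lab
    # Assumed test server; any simple HTTP server will do
    server s1 127.0.0.1:8000
```

Running the same load generator against port 8080 with the two rewrite lines present and then commented out would show whether the rewrite alone accounts for the slowdown. To get closer to the production setup, the acl/set-path pair can be repeated ten times with different placeholder paths.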

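Use regsub(^/old/path,/new/path,) in your set-path, i.e. anchor the expression at the start of the path (which begins with a slash) and drop the g flag.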
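
In configuration form, the anchored variant might look like the sketch below. It assumes the path fetch returns the path with its leading slash, so the pattern is anchored on /old/path, and the optional flags argument is left out so that only the first match at the start of the string is replaced. The second rule is a regex-free alternative that does not come from this thread: since the original post describes the change as adding a /v2 prefix, the new path can be built by plain concatenation. The names p_whatever and fe_main and the path /whatever are placeholders.

```haproxy
# Fragment only: bind lines and backends are omitted; names are placeholders.
frontend fe_main
    # Anchored rewrite: matches only at the start of the path, no "g" flag
    acl p_some_name path_beg /old/path
    http-request set-path %[path,regsub(^/old/path,/new/path)] if p_some_name

    # Regex-free alternative for a pure prefix change
    # (/whatever/... becomes /v2/whatever/...)
    acl p_whatever path_beg /whatever
    http-request set-path /v2%[path] if p_whatever
```

The second form avoids the regex engine entirely, which sidesteps the libc versus PCRE question for rules that only prepend a fixed prefix.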