My proxy accepts requests that carry a custom header used to authenticate users. For some reason, some requests are rejected by the proxy with a 400 error. The output of show errors on the stats socket makes me think the custom header is causing the error (I have replaced sensitive data):
# echo "show errors" | socat /var/lib/haproxy/stats stdio
Total events captured on [02/Nov/2018:11:59:39.817] : 35
[02/Nov/2018:11:57:54.645] frontend http (#4): invalid request
backend <NONE> (#-1), server <NONE> (#-1), event #34
src xx.xx.xx.xx:xxxx, session #8587, session flags 0x00000080
HTTP msg state MSG_RQMETH(2), msg flags 0x00000000, tx flags 0x84000000
HTTP chunk len 0 bytes, HTTP body len 0 bytes
buffer flags 0x00808002, out 0 bytes, total 25 bytes
pending 25 bytes, wrapping at 16384, error at position 8:
00000 MSP-CLID: xxxxxxxxxxxxxx\r\n
00023 \r\n
One thing I can’t explain: the custom header is X-MSP-CLID, but the error shows MSP-CLID. Is there a good reason for that?
A network capture shows the full header, as intended.
Most requests are forwarded correctly; only some are rejected.
Setting option accept-invalid-http-request does not help; the problem persists.
What reaches haproxy here is not HTTP, so haproxy cannot do anything with it.
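To illustrate why the dump says "error at position 8": the parser state in the dump is MSG_RQMETH, so haproxy is still expecting a request method at the start of the buffer. A method is an HTTP token, and ":" is not a valid token character. The following is a minimal Python sketch of that check, not haproxy's actual parser; the function name is made up for illustration.

```python
import string

# RFC 7230 "tchar": the characters allowed in an HTTP token such as the method.
TCHAR = set("!#$%&'*+-.^_`|~" + string.digits + string.ascii_letters)

def first_invalid_method_pos(buf: bytes):
    """Return the offset of the first byte that cannot be part of a request
    method, or None if the method is cleanly terminated by a space."""
    for pos, byte in enumerate(buf):
        ch = chr(byte)
        if ch == " ":
            # A space ends the method; at offset 0 the method would be empty.
            return None if pos > 0 else 0
        if ch not in TCHAR:
            return pos
    return None  # ran out of data without finding an invalid byte

# Buffer content as shown by "show errors" (value anonymized in the dump).
captured = b"MSP-CLID: xxxxxxxxxxxxxx\r\n\r\n"
print(first_invalid_method_pos(captured))                 # 8
print(first_invalid_method_pos(b"GET / HTTP/1.1\r\n"))    # None
```

Offsets 0 to 7 hold "MSP-CLID", which looks like a plausible method; the colon at offset 8 is where it stops being one.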
I assume your client is buggy and for a small percentage of requests emits bogus request data. If you take a look at the network capture, you need to find the exact request that haproxy rejected.
I don’t have enough information here to work with.
Please provide your configuration along with the output of haproxy -vv. Also, I’m not sure I can help you without seeing that capture. Would you send it to me privately (I can provide you with a link to upload the capture to my server)?
Oops, my bad, I forgot the haproxy -vv output. I’m using CentOS 7.
Capture was made directly on the server with ngrep.
HA-Proxy version 1.8.13 2018/07/30
Copyright 2000-2018 Willy Tarreau <willy@haproxy.org>
Build options :
TARGET = linux2628
CPU = generic
CC = gcc
CFLAGS = -O2 -g -fno-strict-aliasing -Wdeclaration-after-statement -fwrapv -fno-strict-overflow -Wno-unused-label
OPTIONS = USE_LINUX_TPROXY=1 USE_CRYPT_H=1 USE_GETADDRINFO=1 USE_ZLIB=1 USE_REGPARM=1 USE_OPENSSL=1 USE_LUA=1 USE_SYSTEMD=1 USE_PCRE=1
Default settings :
maxconn = 2000, bufsize = 16384, maxrewrite = 1024, maxpollevents = 200
Built with OpenSSL version : OpenSSL 1.0.2k-fips 26 Jan 2017
Running on OpenSSL version : OpenSSL 1.0.2k-fips 26 Jan 2017
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports : SSLv3 TLSv1.0 TLSv1.1 TLSv1.2
Built with Lua version : Lua 5.3.4
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND
Encrypted password support via crypt(3): yes
Built with multi-threading support.
Built with PCRE version : 8.32 2012-11-30
Running on PCRE version : 8.32 2012-11-30
PCRE library supports JIT : no (USE_PCRE_JIT not set)
Built with zlib version : 1.2.7
Running on zlib version : 1.2.7
Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with network namespace support.
Available polling systems :
epoll : pref=300, test result OK
poll : pref=200, test result OK
select : pref=150, test result OK
Total: 3 (3 usable), will use epoll.
Available filters :
[SPOE] spoe
[COMP] compression
[TRACE] trace
I just sent you a capture with the recommended parameters. Let me know if you need anything else.
For information, our clients mostly come from mobile networks. Could this issue be due to a bad or slow connection?
The same issue shows up in this capture as well: ACKs for unseen packets. That doesn’t make a lot of sense.
If we ignore the unseen ACK problem for now, I can see what appears to be a partial retransmission, and both wireshark and your TCP stack interpret it as one.
Because of that, your kernel delivers MSP_CLID: 21XXXX randomly to haproxy.
I did not have the time to properly investigate what happens at the TCP level; I can only see that both wireshark (using the Follow: TCP stream feature) and your TCP stack consider this a partial retransmission.
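To make the effect concrete, here is a small Python sketch of what an HTTP parser sees if the tail of a request ends up in the byte stream a second time. This is only a model under stated assumptions: the request path, the Host value, and the point where the duplicated data starts are invented, and nothing here reproduces what the kernel actually does with the segments.

```python
def split_requests(stream: bytes):
    """Naively split a keep-alive byte stream into header blocks, one per
    request (assumes requests without a body)."""
    blocks = []
    while stream:
        head, sep, stream = stream.partition(b"\r\n\r\n")
        blocks.append(head + sep)
    return blocks

# Hypothetical request; only the custom header name comes from the thread.
request = (
    b"GET /api HTTP/1.1\r\n"
    b"Host: example.invalid\r\n"
    b"X-MSP-CLID: 21XXXX\r\n"
    b"\r\n"
)

# Assumption: the end of the request is delivered twice, and the second copy
# starts two bytes into the custom header, so the "X-" prefix is missing.
cut = request.index(b"MSP-CLID")
stream = request + request[cut:]

blocks = split_requests(stream)
print(blocks[0] == request)   # True
print(blocks[1])              # b'MSP-CLID: 21XXXX\r\n\r\n'
```

The second block is what would land in the "show errors" buffer: a header line with no request line in front of it, and with the "X-" cut off because the duplicated data started mid-header.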
Is there anything unusual in your configuration: a patched OS/kernel, a specific NIC driver, anything non-standard on your end of the setup? If not, I guess it’s possible that mobile operators intercept port 80 traffic and do strange things with it.
I have heard from people that HTTP “accelerators” in mobile networks messed up their websites to the point that they were unusable.
I’d suggest you check whether this happens with one specific network operator only.
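One way to approach this is to collect the source addresses of the rejected requests and see whether they cluster. The Python sketch below assumes you save the "show errors" output periodically (as far as I know, the dump only holds the last captured error per proxy, so a single dump is not enough). The addresses and event numbers in the sample are invented documentation values, and mapping an address to its operator (for example via whois) is left out.

```python
import re
from collections import Counter

EVENT_RE = re.compile(r"event #(\d+)")
SRC_RE = re.compile(r"src\s+(\S+):\d+,")

def sources_by_event(dumps):
    """Map each captured event number to its source address. Keying on the
    event number avoids counting the same error twice across polls."""
    seen = {}
    for dump in dumps:
        event = EVENT_RE.search(dump)
        src = SRC_RE.search(dump)
        if event and src:
            seen[int(event.group(1))] = src.group(1)
    return seen

def count_sources(dumps):
    return Counter(sources_by_event(dumps).values())

# Hypothetical saved dumps, trimmed to the two relevant lines each.
dumps = [
    "backend <NONE> (#-1), server <NONE> (#-1), event #34\n"
    "src 192.0.2.10:40112, session #8587, session flags 0x00000080\n",
    # Same event seen again on the next poll.
    "backend <NONE> (#-1), server <NONE> (#-1), event #34\n"
    "src 192.0.2.10:40112, session #8587, session flags 0x00000080\n",
    "backend <NONE> (#-1), server <NONE> (#-1), event #35\n"
    "src 198.51.100.7:51820, session #8601, session flags 0x00000080\n",
    "backend <NONE> (#-1), server <NONE> (#-1), event #36\n"
    "src 192.0.2.10:40230, session #8630, session flags 0x00000080\n",
]

counts = count_sources(dumps)
print(counts.most_common())   # [('192.0.2.10', 2), ('198.51.100.7', 1)]
```

If most rejected requests come from address ranges belonging to a single operator, that points at something in that operator's network rather than at your server.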
Upgrading your website to HTTPS would definitely fix this problem, as TLS guarantees that traffic is not corrupted along the way. Whether the root cause is an accelerator in a mobile network or a TCP/NIC driver bug on your end is something I cannot tell at this point.
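The reason TLS helps is that every record is authenticated, so data inserted or altered in transit is detected instead of being passed on to the application. The Python sketch below shows the general idea with a plain HMAC; it is an analogy only, not the TLS record protocol, and the key handling and function names are made up for illustration.

```python
import hashlib
import hmac
import os

TAG_LEN = hashlib.sha256().digest_size  # 32 bytes

def protect(payload: bytes, key: bytes) -> bytes:
    """Append an authentication tag computed over the payload."""
    return payload + hmac.new(key, payload, hashlib.sha256).digest()

def verify(record: bytes, key: bytes) -> bool:
    """Recompute the tag and compare it with the one in the record."""
    payload, tag = record[:-TAG_LEN], record[-TAG_LEN:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    return hmac.compare_digest(tag, expected)

key = os.urandom(32)  # stands in for the keys negotiated in a handshake
record = protect(b"GET /api HTTP/1.1\r\nX-MSP-CLID: 21XXXX\r\n\r\n", key)

# Simulate a middlebox inserting a stray header fragment into the data.
tampered = record[:10] + b"MSP-CLID: 21XXXX\r\n" + record[10:]

print(verify(record, key))     # True
print(verify(tampered, key))   # False
```

With plain HTTP on port 80 there is no such check, so whatever arrives in the byte stream is handed to haproxy as-is.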
OK, in that case you will have to explain the problem to them. Show them the wireshark screenshot showing that the header is sometimes inserted at random.