400 Bad request

Hello,

I have an issue i can’t achieve to fix.

My proxy take request with a custom header to authenticate users. For reasons, some requests are rejected by the proxy with a 400 error. a show errors on the socket let me think that custom header is causing the error (i have replace sensitive data) :

 # echo "show errors" | socat /var/lib/haproxy/stats stdio
Total events captured on [02/Nov/2018:11:59:39.817] : 35
 
[02/Nov/2018:11:57:54.645] frontend http (#4): invalid request
  backend <NONE> (#-1), server <NONE> (#-1), event #34
  src xx.xx.xx.xx:xxxx, session #8587, session flags 0x00000080
  HTTP msg state MSG_RQMETH(2), msg flags 0x00000000, tx flags 0x84000000
  HTTP chunk len 0 bytes, HTTP body len 0 bytes
  buffer flags 0x00808002, out 0 bytes, total 25 bytes
  pending 25 bytes, wrapping at 16384, error at position 8:
 
  00000  MSP-CLID: xxxxxxxxxxxxxx\r\n
  00023  \r\n

One thing i can’t explain, custom header is X-MSP-CLID, but the error show MSP-CLID. Is there a good reason ?
A network capture show full header as intended.

Most of the requests are well forwarded, only some are rejected

If i set option accept-invalid-http-request the problem persists

Can someone help me to sort out this issue ?

What reaches haproxy here is not HTTP, so haproxy cannot do anything with it.

I assume your client is buggy and for a small percentage of requests emits bogus request data. If you take a look at the network capture, you need to find the exact request that haproxy rejected.

From what part of the error did you know it’s not http request ?

From the network capture, if i take a request and replay it, everything is fine and i get a OK 200

I get it because there is no HTTP request in there at all. It TCP payload begins with MSP-CLID and that is not HTTP.

If you got a 200 OK response in your capture, it means you are looking in the wrong place, because haproxy will emit a 400 Bad request here.

Find the 400 Bad request response in your capture data and then look at the corresponding request.

The strange part is that it seems the proxy forward the request as i can see a response in the capture, but with both return code 200 and 400 :

GET /xx/register HTTP/1.1
Host: xxxxxx
Connection: keep-alive
Cache-Control: max-age=0
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Linux; Android 8.0.0; SM-A520F Build/R16NW; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/70.0.3538.80 Mobile Safari/537.36
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8
Referer: http://xxx/xxx/landing?xxxx=xxxxxxx_xxxxxx_xxxxx
Accept-Encoding: gzip, deflate
Accept-Language: fr-FR,en-US;q=0.9
X-Requested-With: com.w_6847842
X-MSP-CLID: xxxxxxxxxxxx

HTTP/1.1 200 OK
Date: Fri, 02 Nov 2018 10:19:48 GMT
Server: Apache
X-Frame-Options: SAMEORIGIN
X-Powered-By: PHP/7.0.19
Cache-Control: no-cache, max-age=604800
Vary: Accept-Encoding,User-Agent
Content-Encoding: gzip
Expires: Fri, 09 Nov 2018 10:19:48 GMT
Content-Length: 13525
Content-Type: text/html; charset=UTF-8

<!DOCTYPE html>
<html>
[...]
</html>
HTTP/1.0 400 Bad request
Cache-Control: no-cache
Connection: close
Content-Type: text/html

<html><body><h1>400 Bad request</h1>
Your browser sent an invalid request.
</body></html>

I don’t have enough informations here to work with.

Please provide your configuration along with the output of haproxy -vv. Also, I’m not sure I can help you without seeing that capture. Would you send it to me privately (I can provide you with a link to upload the capture my server)?

I understand,

Here is a purged conf. Let me know if you need something else

global
    log         /log  local2

    chroot      /var/lib/haproxy
    pidfile     /var/run/haproxy.pid
    maxconn     4000
    user        haproxy
    group       haproxy
    daemon

    hard-stop-after 60s

    # turn on stats unix socket
    stats socket /var/lib/haproxy/stats level admin

defaults
    mode http
    log global
    balance leastconn
    option httplog
    option dontlognull
    option redispatch
    option http-buffer-request
    log-format %ci:%cp\ [%trl]\ %f\ %b/%s\ %Tq/%Tw/%Tc/%Tr/%Tt\ %ST\ %B\ %CC\ \ %CS\ %tsc\ %ac/%fc/%bc/%sc/%rc\ %sq/%bq\ %hr\ %hs\ %{+Q}r
    retries                 3
    timeout http-request 10s
    timeout queue 1m
    timeout connect 10s
    timeout client 1m
    timeout server 1m
    timeout http-keep-alive 10s
    timeout check 3s
    maxconn 3000


frontend http
    bind *:80 
    stick-table type string size 1m expire 5m store gpc0
    tcp-request inspect-delay 10s
    capture cookie PHPSESSID len 30
    declare capture request len 40000
    http-request capture req.body id 0
    http-request set-log-level silent if { path_sub -i media/cache }
    capture request header Host len 100
    capture request header Referer len 100
    acl host_mydomain hdr_sub(host) -i mydomain
    use_backend mydomain_cluster if host_mydomain



backend product
    option httpchk GET /httpchk HTTP/1.1\r\nHost:httpchk.mydomain.net
    http-check expect status 200
    http-check disable-on-404
    server web01 192.168.1.59:80 check
    server web02 192.168.1.58:80 check
    server web03 192.168.1.55:80 check
    server web04 192.168.1.54:80 check
    server web05 192.168.1.46:80 check
    server web06 192.168.1.42:80 check
    server web07 192.168.1.82:80 check
    server web08 192.168.1.83:80 check


backend mydomain_cluster
    option httpchk GET /httpchk HTTP/1.1\r\nHost:httpchk.mydomain.net
    http-check expect status 200
    http-check disable-on-404
    tcp-request inspect-delay 10s
    cookie PHPSESSID prefix nocache
    server product_web02 192.168.1.58:80 track product/web02
    server product_web03 192.168.1.55:80 track product/web03
    server product_web04 192.168.1.54:80 track product/web04 backup
    server product_web05 192.168.1.46:80 track product/web05
    server product_web06 192.168.1.42:80 track product/web06 backup

I just sent you a capture.

Where did capture this? From the haproxy instance, or at the client?

Also please provide haproxy -vv output and OS/kernel informations.

Oops, my bad, i forgot the haproxy -vv :grin: I’m using centos 7.
Capture was made directly on the server with ngrep.

HA-Proxy version 1.8.13 2018/07/30
Copyright 2000-2018 Willy Tarreau <willy@haproxy.org>

Build options :
  TARGET  = linux2628
  CPU     = generic
  CC      = gcc
  CFLAGS  = -O2 -g -fno-strict-aliasing -Wdeclaration-after-statement -fwrapv -fno-strict-overflow -Wno-unused-label
  OPTIONS = USE_LINUX_TPROXY=1 USE_CRYPT_H=1 USE_GETADDRINFO=1 USE_ZLIB=1 USE_REGPARM=1 USE_OPENSSL=1 USE_LUA=1 USE_SYSTEMD=1 USE_PCRE=1

Default settings :
  maxconn = 2000, bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Built with OpenSSL version : OpenSSL 1.0.2k-fips  26 Jan 2017
Running on OpenSSL version : OpenSSL 1.0.2k-fips  26 Jan 2017
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports : SSLv3 TLSv1.0 TLSv1.1 TLSv1.2
Built with Lua version : Lua 5.3.4
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND
Encrypted password support via crypt(3): yes
Built with multi-threading support.
Built with PCRE version : 8.32 2012-11-30
Running on PCRE version : 8.32 2012-11-30
PCRE library supports JIT : no (USE_PCRE_JIT not set)
Built with zlib version : 1.2.7
Running on zlib version : 1.2.7
Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with network namespace support.

Available polling systems :
      epoll : pref=300,  test result OK
       poll : pref=200,  test result OK
     select : pref=150,  test result OK
Total: 3 (3 usable), will use epoll.

Available filters :
	[SPOE] spoe
	[COMP] compression
	[TRACE] trace

Well there is something wrong with this capture, as it contains ACK of segments that are not seen (tcp.analysis.ack_lost_segment).

Please retry capturing it with tcpdump with bigger buffers, something like tcpdump -nps0 -B 10240 -w tcpdumpcapture.cap tcp port 80

Hi,

I just sent you a capture with the recommended parameters. Let me know if you need anything else.
For information, our clients mostly come from mobile network. Is it possible this issue beeing due to bad/slow connection ?

thanks for your help !

The same issue in this capture as well: ACKing unseen packages. That doesn’t make a lot of sense.

If we ignore the unseen ACK problem for now, I can see what appears to be partial retransmissioning happening which both wireshark as well as your TCP stack interprets as partial retransmission:

Because of that, your kernel delivers MSP_CLID: 21XXXX randomly to haproxy.

I did not have the time to properly investigate what happens at TCP level, I can only see that both wireshark (using the Follow: TCP stream feature) and your TCP stack considers this as a partial retransmission.

Are there any strange configuration in your side: a patched OS/kernel, a specific NIC driver, anything non-standard on your end of the setup? If not, I guess it’s possible that mobile operatores intercept port 80 traffic and do strange things with it.

I have heard from people that HTTP “accelerators” in mobile networks messed their websitesm to the point that they where unusable.

I’d suggest you check whether this happens with one specific network operator only.

Upgrading your website to HTTPS is something that definitely would fix this problem, as TLS guarantees that traffic is not corrupted along the way. Whether the root cause is a accelerator in a mobile network or a TCP/NIC driver bug on your end is something that I can not exactly tell at this point.

Thank tou very much for your help.

No special setting for kernel, we are running centos 7 vm in KVM.

The the X-MSP-CLID header is added by the mobile operator to identify his customers. For this reason, we can’t use SSL …

I will ask our partner to check on their side.

Thank you again for your help :slight_smile:

Ok, in that case you will have to explain them the problem. Show them the wireshark screenshot that shows that the header is randomly inserted sometimes.