Invalid requests

Hi all,

I’m receiving a few bad requests. They seem to be coming from Facebook’s crawler, but I’m not sure whether the issue is on my side or theirs.

Here is an example from the “show errors” command, run via socat:

[15/Oct/2018:20:43:12.275] frontend mysite (#2): invalid request
  backend mysite (#2), server <NONE> (#-1), event #368
  src 66.220.149.10:40984, session #28221, session flags 0x00000080
  HTTP msg state MSG_RQBEFORE(0), msg flags 0x00000000, tx flags 0x00000000
  HTTP chunk len 0 bytes, HTTP body len 0 bytes
  buffer flags 0x00808002, out 0 bytes, total 517 bytes
  pending 517 bytes, wrapping at 32768, error at position 0:

  00000  \x16\x03\x01\x02\x00\x01\x00\x01\xFC\x03\x03N\xB3Q\x14\xDC\xADz\xF3
  00019+ \x85\xAC\x8E\xCE!\xAC\xFA;\xC4\x1Dv\xF4\x86\x04\xFB\xDC\x88*\x885\xCB
  00040+ \xB7m>\x00\x00\xAA\xC00\xC0,\xC0(\xC0$\xC0\x14\xC0\n
  00058  \x00\xA5\x00\xA3\x00\xA1\x00\x9F\x00k\x00j\x00i\x00h\x009\x008\x007
  00080+ \x006\xCC\xA9\xCC\xA8\xCC\x14\xCC\x13\xCC\xAA\xCC\x15\x00\x88\x00\x87
  00098+ \x00\x86\x00\x85\xC02\xC0.\xC0*\xC0&\xC0\x0F\xC0\x05\x00\x9D\x00=\x005
  00120+ \x00\x84\xC0/\xC0+\xC0'\xC0#\xC0\x13\xC0\t\x00\xA4\x00\xA2\x00\xA0\x00
  00141+ \x9E\x00g\x00@\x00?\x00>\x003\x002\x001\x000\x00\x9A\x00\x99\x00\x98
  00164+ \x00\x97\x00E\x00D\x00C\x00B\xC01\xC0-\xC0)\xC0%\xC0\x0E\xC0\x04\x00
  00187+ \x9C\x00<\x00/\x00\x96\x00A\xC0\x12\xC0\x08\x00\x16\x00\x13\x00\x10
  00206+ \x00\r\xC0\r\xC0\x03\x00\n
  00214  \x00\xFF\x01\x00\x01)\x00\x00\x00\x14\x00\x12\x00\x00\x0Ffb.mysite.c
  00242+ om\x00\x0B\x00\x04\x03\x00\x01\x02\x00\n
  00254  \x00\x1C\x00\x1A\x00\x17\x00\x19\x00\x1C\x00\e\x00\x18\x00\x1A\x00\x16
  00272+ \x00\x0E\x00\r\x00\x0B\x00\x0C\x00\t\x00\n
  00284  \x00\r\x00 \x00\x1E\x06\x01\x06\x02\x06\x03\x05\x01\x05\x02\x05\x03
  00302+ \x04\x01\x04\x02\x04\x03\x03\x01\x03\x02\x03\x03\x02\x01\x02\x02\x02
  00319+ \x033t\x00\x00\x00\x10\x00\x0B\x00\t\x08http/1.1\x00\x15\x00\xAE\x00
  00344+ \x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
  00361+ \x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
  00378+ \x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
  00395+ \x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
  00412+ \x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
  00429+ \x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
  00446+ \x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
  00463+ \x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
  00480+ \x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
  00497+ \x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
  00514+ \x00\x00\x00

It looks like the request is in hex?

My logs are looking like this at the moment:

Oct 15 20:47:36 mysiteLBNYC haproxy[24611]: 31.13.115.13:49652 [15/Oct/2018:20:47:36.390] mysite mysite/<NOSRV> -1/-1/-1/-1/0 400 187 - - PRNN 354/354/0/0/5 0/0 “<BADREQ>”
Oct 15 20:47:36 mysiteLBNYC haproxy[24611]: 31.13.127.12:58420 [15/Oct/2018:20:47:36.774] mysite mysite/<NOSRV> -1/-1/-1/-1/0 400 187 - - PRNN 356/356/0/0/5 0/0 “<BADREQ>”
Oct 15 20:47:36 mysiteLBNYC haproxy[24611]: 31.13.115.13:52454 [15/Oct/2018:20:47:36.807] mysite mysite/<NOSRV> -1/-1/-1/-1/0 400 187 - - PRNN 355/355/0/0/5 0/0 “<BADREQ>”
Oct 15 20:47:37 mysiteLBNYC haproxy[24609]: 31.13.115.13:55376 [15/Oct/2018:20:47:37.238] mysite mysite/<NOSRV> -1/-1/-1/-1/0 400 187 - - PRNN 797/797/0/0/5 0/0 “<BADREQ>”
Oct 15 20:47:37 mysiteLBNYC haproxy[24611]: 173.252.87.11:60154 [15/Oct/2018:20:47:37.556] mysite mysite/<NOSRV> -1/-1/-1/-1/0 400 187 - - PRNN 348/348/0/0/5 0/0 “<BADREQ>”
Oct 15 20:47:37 mysiteLBNYC haproxy[24609]: 173.252.87.4:58662 [15/Oct/2018:20:47:37.779] mysite mysite/<NOSRV> -1/-1/-1/-1/0 400 187 - - PRNN 803/803/1/0/5 0/0 “<BADREQ>”
Oct 15 20:47:38 mysiteLBNYC haproxy[24609]: 173.252.87.9:52904 [15/Oct/2018:20:47:38.081] mysite mysite/<NOSRV> -1/-1/-1/-1/0 400 187 - - PRNN 791/791/0/0/5 0/0 “<BADREQ>”

The site seems to be working fine in general, and I am unable to produce any errors myself.

Any help on this would be much appreciated!

Although I did not take the time to decode the hex, the fact that we see your hostname in there as well as “http/1.1” makes me think this could be TLS (SNI and ALPN).

Sounds like an issue on their side, unless you have a particular configuration where port 443 can hit an HTTP parser without decrypting TLS first.

Can you share your config?

Of course!

global
    log 127.0.0.1 local0 notice
    stats socket /var/run/haproxy.stat
    maxconn 70000
    tune.maxrewrite 16384
    tune.bufsize 32768
    tune.ssl.cachesize 10000000
    user haproxy
    group haproxy
    nbproc 4
    cpu-map 1 0
    cpu-map 2 1
    cpu-map 3 2
    cpu-map 4 3

defaults
    log     global
    mode    http
    maxconn 70000
    option  httplog
    option  dontlognull
    option  forwardfor
    retries 5
    option redispatch
    timeout connect  25000
    timeout client  25000
    timeout server 25000

listen mysite
    option httplog
    option dontlog-normal
    option dontlognull
    option accept-invalid-http-request
    log /dev/log local0
    bind 0.0.0.0:80
    bind :::80 v6only
    bind *:443 ssl crt /etc/ssl/mysite.com/mysite.com.pem
    mode http
    maxconn 70000
    balance static-rr
    option http-keep-alive
    option forwardfor
    http-request set-header X-Forwarded-Proto HTTPS_ON if { ssl_fc }
    cookie SRVNAME insert
    timeout connect  10s
    timeout client  60s
    timeout server 60s
    reqidel ^X-Forwarded-For:.*

    redirect scheme https code 301 if !{ ssl_fc }

    acl fb-img-acl hdr_dom(host) -i fb.mysite.com
    use_backend varnish-backend if fb-img-acl

    acl thumb-img-acl hdr_dom(host) -i thumbs.mysite.com
    use_backend varnish-backend if thumb-img-acl

    acl letsencrypt-acl path_beg /.well-known/acme-challenge/
    use_backend letsencrypt-backend if letsencrypt-acl

    server mysite01 10.136.109.25:80 cookie MS01 check
    server mysite02 10.136.126.250:80 cookie MS02 check
    server mysite04 10.136.127.19:80 cookie MS04 check
    server mysite05 10.136.127.60:80 cookie MS05 check
    server mysite06 10.136.63.133:80 cookie MS06 check

backend letsencrypt-backend
    server letsencrypt 127.0.0.1:8888

backend varnish-backend
    server varnish 127.0.0.1:6081

Also note that the Varnish backend was only added recently and that this issue was occurring before that.

Config does not do anything exotic (like TLS in a tcp mode frontend, with some SNI dance and other complicated stuff).

Yes, the hex code is indeed from a TLS client_hello. This means that Facebook is speaking HTTPS to port 80, where HAProxy expects plain HTTP.

Could it be that someone just posted (or continues to post) a wrong URL on Facebook, where an HTTPS URL points to port 80:
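As a rough check of that reading, the first bytes of the dump can be decoded with a few lines of Python. This is only a sketch: the function name describe_tls_prefix is made up for illustration, and it looks at just the TLS record and handshake headers, not at the extensions (SNI, ALPN).

```python
import struct
from typing import Optional


def describe_tls_prefix(data: bytes) -> Optional[dict]:
    """Return header fields if data starts like a TLS ClientHello, else None."""
    if len(data) < 11:
        return None
    # Record header: 1-byte content type, 2-byte version, 2-byte length.
    content_type, record_version, record_length = struct.unpack("!BHH", data[:5])
    # Handshake header: 1-byte message type, 3-byte length, then client version.
    handshake_type = data[5]
    handshake_length = int.from_bytes(data[6:9], "big")
    client_version = struct.unpack("!H", data[9:11])[0]
    # 0x16 = handshake record, 0x01 = ClientHello, version major byte is 0x03.
    if content_type != 0x16 or handshake_type != 0x01 or record_version >> 8 != 0x03:
        return None
    return {
        "record_version": record_version,
        "record_length": record_length,
        "handshake_length": handshake_length,
        "client_version": client_version,
    }


# First 11 bytes copied from the "show errors" dump (offset 00000).
prefix = bytes.fromhex("1603010200010001fc0303")
print(describe_tls_prefix(prefix))
# A plain HTTP request does not match.
print(describe_tls_prefix(b"GET / HTTP/1.1\r\n"))
```

For the prefix above this reports a 512-byte handshake record. Together with the 5-byte record header, that matches the 517 bytes HAProxy shows as pending.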

https://fb.mysite.com:80/blabla

Do you also see valid requests from those Facebook crawlers, or just these bad requests?

I’d suggest you run a test, because after all, it is the users on Facebook who trigger those crawlers:

  • try posting a normal working http link from your site, while checking the haproxy logs
  • try posting a normal working https link from your site, while checking the haproxy logs
  • try posting an https link that erroneously points to port 80 (like in the example above)

A private message should suffice to trigger this, you don’t have to post something publicly.

You could also capture port 80 traffic from those Facebook IP ranges, or all port 80 traffic (after all, you redirect everything to HTTPS, so it’s probably not a high volume).
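To decide which source addresses are worth capturing, the <BADREQ> log lines posted earlier can be tallied by client IP. A minimal sketch, assuming the log format shown in this thread; the function name count_badreq_by_client and the regular expression are illustrative and not part of HAProxy:

```python
import re
from collections import Counter

# Matches "haproxy[pid]: <client_ip>:<client_port> [" as in the log lines above.
CLIENT_RE = re.compile(r"haproxy\[\d+\]: (?P<ip>[0-9a-fA-F.:]+):(?P<port>\d+) \[")


def count_badreq_by_client(lines):
    """Count <BADREQ> log entries per client IP address."""
    counts = Counter()
    for line in lines:
        if "<BADREQ>" not in line:
            continue
        match = CLIENT_RE.search(line)
        if match:
            counts[match.group("ip")] += 1
    return counts


# Three of the log lines from the original post.
sample = [
    "Oct 15 20:47:36 mysiteLBNYC haproxy[24611]: 31.13.115.13:49652 [15/Oct/2018:20:47:36.390] mysite mysite/<NOSRV> -1/-1/-1/-1/0 400 187 - - PRNN 354/354/0/0/5 0/0 “<BADREQ>”",
    "Oct 15 20:47:36 mysiteLBNYC haproxy[24611]: 31.13.127.12:58420 [15/Oct/2018:20:47:36.774] mysite mysite/<NOSRV> -1/-1/-1/-1/0 400 187 - - PRNN 356/356/0/0/5 0/0 “<BADREQ>”",
    "Oct 15 20:47:36 mysiteLBNYC haproxy[24611]: 31.13.115.13:52454 [15/Oct/2018:20:47:36.807] mysite mysite/<NOSRV> -1/-1/-1/-1/0 400 187 - - PRNN 355/355/0/0/5 0/0 “<BADREQ>”",
]

for ip, hits in count_badreq_by_client(sample).most_common():
    print(ip, hits)
```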

I’m pretty sure there is no issue with haproxy.

Hi Lukas,

Thank you for your help!

Yes, most of the requests seem to be valid. I have seen users sharing content from the site just fine and I was personally able to share links as well. My Varnish cache is also showing a healthy number of cache hits.

Could it be that someone just posted (or continues to post) a wrong URL in facebook, where a HTTPS url points to port 80.

I doubt that, especially considering the number of BADREQ entries that I am getting. I only switched to HTTPS in June of this year, and the site had been operating for a year and a half before that, i.e. all of the original shares before June 2018 would have pointed to the HTTP site.

The fb.mysite.com subdomain is specifically for images that have been specified in the og:image tag, which is:

The URL of the image that appears when someone shares the content to Facebook.

I used the Facebook Sharing debugger to test a non-HTTPS link and it responded saying that it had followed the 301 redirect to the HTTPS site. The scraped image also appeared just fine.

Perhaps this is an issue with the crawler accessing previously shared non-HTTPS content?

Maybe, but it must be a bug on their side. I don’t see why the crawler would either not follow the redirect for old content, or follow it while keeping port 80.

Either way, there doesn’t seem to be a lot you can do at this point.

Thank you for your help! I will log it as a bug in their developer support section.

Hi Lukas,

I was returning to this topic and I noticed your statement above. Is there a way for me to FORCE these port 80 requests to port 443 if it’s a particular host? i.e. fb.mysite.com?

Yes, but it requires an additional frontend/backend layer to distinguish between HTTP and HTTPS. Take a look at this thread: