Bizarre problem: HTTP~ -1 0 in HAProxy + 499 in NGINX

I have a bizarre problem, and I’m looking for a little help understanding the role that HAProxy is playing in it, if any.

Our setup is:

HAProxy → multiple web servers running NGINX/PHP-FPM → SQL/REDIS.

We have maxconn limits set in HAProxy, as well as rate limiting using stick tables. This works really well: we haven’t had a successful DDoS attack against us in a long time.
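As a rough illustration of the kind of rate limiting I mean (the table size and thresholds here are made up for this post, not our real values), the frontend has something like:

frontend HTTP
    # track each source IP in a stick table of request rates
    stick-table type ip size 100k expire 30s store http_req_rate(10s)
    http-request track-sc0 src
    # reject clients that exceed 100 requests in 10 seconds
    http-request deny deny_status 429 if { sc_http_req_rate(0) gt 100 }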

A few days ago one of the sites we host started going down. Each time it happened, our monitoring tool sent us warnings in the exact same order:

  1. That our REDIS server was saturating its 1 Gbps network port with data being sent to the web servers in the backend;
  2. That the web servers in the backend had run out of RAM;
  3. That our site was down

After about 30 seconds, the backend servers would recover and the site would come back.

I looked for obvious signs of an external DDoS attack and found nothing. What I did find was the same IP address (that of a logged-in user) repeated across 10 to 50 identical lines just before the site started to go down. At first I discounted this, because I couldn’t see how it could possibly be the culprit. But as the same thing kept happening, I kept noticing the same IP and the same pattern.

In HAProxy (which is the only public-facing server, and handles SSL; the rest are hidden within the LAN), the relevant entries in the log look like this (I’ve sanitized it):

Mar 25 16:46:23 proxy haproxy[11692]: [<OFFENDING IP>] [25/Mar/2023:16:46:23.138] HTTP~ -1 0 {Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36|https://domain.com/} GET https://domain/page/ [HTTP/2.0] backend/server

In NGINX, on the web servers behind HAProxy, a relevant entry in the log looks like this:

10.0.1.5 - - [25/Mar/2023:16:46:23 -0400] "GET /page/ HTTP/1.1" 499 0 "https://domain/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36" "<OFFENDING IP>"

(In this, 10.0.1.5 is the IP of the proxy and <OFFENDING IP> is the IP of the client out on the web.)

There are a few things I don’t understand here.

First off, HTTP~ -1 0 also seems to be what I see in the logs when HAProxy stops a request at the perimeter. If a client has made too many requests, for example, I see HTTP~ -1 0 in the lines logged after HAProxy has started blocking that client. But in this case, the requests are obviously getting through, because they’re creating equivalent 499 lines in the NGINX log. As such, my questions are:

  1. In what circumstances does HAProxy write an HTTP~ -1 0 line to the log? It’s confusing because the lines showing problematic requests being passed through to the backend are marked HTTP~ -1 0, and so are the lines logged after a client has been blocked by the rate limiting. The only difference is that the passed-through ones end in backend/server, whereas the blocked ones end in HTTP/<NOSRV>.
  2. What should I assume these requests are? At the HAProxy level, they’re logged with HTTP~ -1 0. In NGINX, they’re logged with 499, which means that the client closed the connection. But, in this context, the “client” is actually HAProxy, right? What could explain this behavior?
  3. If it’s obvious to anyone what this is, is there a straightforward way of stopping HAProxy from passing through these sorts of requests? I could be wrong, but it almost feels as if HAProxy is passing through requests that it knows are problematic. Why would it do that? If I were seeing 200 entries in the HAProxy log but 499 entries in the NGINX log, I’d assume there was an issue between HAProxy and NGINX. But the requests seem to arrive in a bad state.

The second question I have is whether it’s “normal” for a web server to receive these sorts of bursts. Here’s what they look like on HAProxy.

Mar 25 17:05:10 proxy haproxy[13550]: [<OFFENDING IP>] [25/Mar/2023:17:05:10.201] HTTP~ -1 0 {Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36|https://domain.com/referrer} GET https://domain.com/page [HTTP/2.0] ricochet.com-internal/planet-a
Mar 25 17:05:10 proxy haproxy[13550]: [<OFFENDING IP>] [25/Mar/2023:17:05:10.299] HTTP~ -1 0 {Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36|https://domain.com/referrer} GET https://domain.com/page [HTTP/2.0] ricochet.com-internal/planet-a
Mar 25 17:05:10 proxy haproxy[13550]: [<OFFENDING IP>] [25/Mar/2023:17:05:10.309] HTTP~ -1 0 {Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36|https://domain.com/referrer} GET https://domain.com/page [HTTP/2.0] ricochet.com-internal/planet-a
Mar 25 17:05:10 proxy haproxy[13550]: [<OFFENDING IP>] [25/Mar/2023:17:05:10.553] HTTP~ -1 0 {Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36|https://domain.com/referrer} GET https://domain.com/page [HTTP/2.0] ricochet.com-internal/planet-a
Mar 25 17:05:11 proxy haproxy[13550]: [<OFFENDING IP>] [25/Mar/2023:17:05:10.815] HTTP~ -1 0 {Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36|https://domain.com/referrer} GET https://domain.com/page [HTTP/2.0] ricochet.com-internal/planet-b
Mar 25 17:05:11 proxy haproxy[13550]: [<OFFENDING IP>] [25/Mar/2023:17:05:11.027] HTTP~ -1 0 {Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36|https://domain.com/referrer} GET https://domain.com/page [HTTP/2.0] ricochet.com-internal/planet-a
Mar 25 17:05:11 proxy haproxy[13550]: [<OFFENDING IP>] [25/Mar/2023:17:05:11.233] HTTP~ -1 0 {Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36|https://domain.com/referrer} GET https://domain.com/page [HTTP/2.0] ricochet.com-internal/planet-a
Mar 25 17:05:11 proxy haproxy[13550]: [<OFFENDING IP>] [25/Mar/2023:17:05:11.453] HTTP~ -1 0 {Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36|https://domain.com/referrer} GET https://domain.com/page [HTTP/2.0] ricochet.com-internal/planet-a
Mar 25 17:05:11 proxy haproxy[13550]: [<OFFENDING IP>] [25/Mar/2023:17:05:11.680] HTTP~ -1 0 {Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36|https://domain.com/referrer} GET https://domain.com/page [HTTP/2.0] ricochet.com-internal/planet-a
Mar 25 17:05:12 proxy haproxy[13550]: [<OFFENDING IP>] [25/Mar/2023:17:05:11.998] HTTP~ -1 0 {Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36|https://domain.com/referrer} GET https://domain.com/page [HTTP/2.0] ricochet.com-internal/planet-a
Mar 25 17:05:12 proxy haproxy[13550]: [<OFFENDING IP>] [25/Mar/2023:17:05:12.197] HTTP~ -1 0 {Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36|https://domain.com/referrer} GET https://domain.com/page [HTTP/2.0] ricochet.com-internal/planet-a
Mar 25 17:05:12 proxy haproxy[13550]: [<OFFENDING IP>] [25/Mar/2023:17:05:12.398] HTTP~ -1 0 {Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36|https://domain.com/referrer} GET https://domain.com/page [HTTP/2.0] ricochet.com-internal/planet-a
Mar 25 17:05:12 proxy haproxy[13550]: [<OFFENDING IP>] [25/Mar/2023:17:05:12.777] HTTP~ -1 0 {Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36|https://domain.com/referrer} GET https://domain.com/page [HTTP/2.0] ricochet.com-internal/planet-a
Mar 25 17:05:13 proxy haproxy[13550]: [<OFFENDING IP>] [25/Mar/2023:17:05:13.049] HTTP~ -1 0 {Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36|https://domain.com/referrer} GET https://domain.com/page [HTTP/2.0] ricochet.com-internal/planet-b
Mar 25 17:05:13 proxy haproxy[13550]: [<OFFENDING IP>] [25/Mar/2023:17:05:13.246] HTTP~ -1 0 {Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36|https://domain.com/referrer} GET https://domain.com/page [HTTP/2.0] ricochet.com-internal/planet-a
Mar 25 17:05:13 proxy haproxy[13550]: [<OFFENDING IP>] [25/Mar/2023:17:05:13.449] HTTP~ -1 0 {Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36|https://domain.com/referrer} GET https://domain.com/page [HTTP/2.0] HTTP/<NOSRV>
Mar 25 17:05:13 proxy haproxy[13550]: [<OFFENDING IP>] [25/Mar/2023:17:05:13.710] HTTP~ -1 0 {Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36|https://domain.com/referrer} GET https://domain.com/page [HTTP/2.0] HTTP/<NOSRV>

The odd thing is that, if I look at the other behavior from that IP (which, again, belongs to a logged-in user, not some random person out on the internet), it seems pretty normal. Should I conclude that this person has malware on their computer that attacks whatever site is open in their browser? Or is this more likely the product of some misconfiguration on our site? Or both? Looking through the logs, I can see a few other logged-in IPs that generate a few HTTP~ -1 0 lines in HAProxy and 499 lines in NGINX, but nothing on the scale of this one.

Any advice anyone could give me would be very highly appreciated.

Thank you.

Without seeing your config, I have to make some assumptions:

  1. HTTP is the name of the frontend.
  2. Your frontend is in HTTP mode.
  3. You may have modified the log format somewhat.

Nginx 499 means that the client (in this case, HAProxy) closed the connection before the server could answer the request. This hints at a timeout set too short.

HTTP~ -1 0: This one is a bit harder to decipher, especially if your log format has been modified. I assume the -1 sits where one of the TR/Tw/Tc/Tr timers would normally be found. A -1 has a different meaning depending on which timer is reporting it:

  • TR: It can be "-1" if the connection was aborted before a complete request could be received or a bad request was received.
  • Tw: It can be "-1" if the connection was aborted before reaching the queue.
  • Tc: It can be "-1" if the request was aborted before a connection could be established.
  • Tr: It can be "-1" if the request was aborted before a complete response could be received.

Source for these: HAProxy Logging Formats
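For reference, with the stock option httplog format those four timers show up as a single slash-separated field, followed by the status code, byte count, and the termination state. A made-up example line (not from your logs; names and values are illustrative) would look like:

haproxy[1234]: 203.0.113.7:51234 [25/Mar/2023:16:46:23.138] HTTP~ backend/server 0/0/1/12/13 200 2750 - - ---- 1/1/0/0/0 0/0 "GET /page/ HTTP/1.1"

Here 0/0/1/12/13 is TR/Tw/Tc/Tr/Ta in milliseconds, 200 is the status code, and ---- is the session state at disconnection.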

It’s also unusual not to see a Session state at disconnection in your logs. In your scenario I would expect to see something like CH-- there, which would give a lot more insight into what HAProxy is receiving from the client and from the server.

Recommendations:

  1. If you have a custom log format, comment it out and replace it with option httpslog, then compare with the Session state at disconnection part of the config manual.
  2. Check that your timeout settings are consistent between HAProxy and Nginx. If HAProxy is less patient than Nginx, HAProxy will abort upstream requests and Nginx will log 499s.
  3. I worked briefly on a dev team. When we would have a problem with just one user, we would watch over their shoulder or screen share while the problem was happening. It didn’t always help, but sometimes we’d see a condition that we would not have otherwise accounted for without looking. Perhaps, if possible, you could do the same with this user.
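To sketch recommendation 2 (the values below are placeholders to show the relationship, not tuned numbers): HAProxy acts as the client toward Nginx, so HAProxy’s timeout server needs to be at least as long as anything Nginx/PHP-FPM might legitimately take:

# HAProxy: how long to wait for the backend to produce a response
timeout server 60s

# Nginx: keep upstream timeouts at or below HAProxy's timeout server
fastcgi_read_timeout 60s;

If HAProxy gives up first, Nginx sees its downstream client vanish mid-request and logs a 499.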

Thank you. I appreciate the time you took with this. A few responses.

  1. Correct.
  2. Correct.
  3. Correct. I adapted it to feed GoAccess.

Here are my settings in HAProxy:

option abortonclose
timeout connect 2s
timeout server 300s
timeout client 3s
timeout queue 60s
timeout http-request 3s
timeout http-keep-alive 2s
timeout client-fin 1s
timeout server-fin 1s
retries 3

The timeout server value is long in this case because, if it’s not, the connection gets closed while WordPress is processing images.

Here is the log format in the .cfg file:

log-format "[%ci] [%tr] %ft %ST %B %hr %hs %HM %HU [%HV] %b/%s"

This was set up like this a while ago when I was playing with GoAccess. I probably need to revisit it.

I’ll do so.

Do you have a recommended set of settings for NGINX that would correspond to my HAProxy variables? The only customized setting I can find in NGINX is:

keepalive_timeout 20s;

I assume the others are default.

Thank you!

Honestly, I’m not as good with Nginx as I am with HAProxy, but I would say timeout server 300s is probably good enough… maybe adjust timeout http-keep-alive to 20s to match Nginx’s keepalive_timeout.

What’s more important here is to understand what HAProxy did. If it just hit a timeout, you can play with those settings to lessen or even eliminate the errors. If HAProxy just responded appropriately to a weird client request, then adjusting the timeouts may not help at all.
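For the keep-alive point specifically, the alignment I have in mind is simply (assuming your other Nginx timeouts are at their defaults):

# HAProxy: how long to wait for a new request on an idle keep-alive connection
timeout http-keep-alive 20s

# Nginx
keepalive_timeout 20s;

Whether this matters much depends on how HAProxy reuses connections to the backend, but mismatched idle timeouts are a cheap thing to rule out.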

I’ve not seen GoAccess before. Pretty cool looking! I’m intrigued and will have to try it out.
If at all possible, capture the response code and termination state in this logging solution; I highly recommend it. That info has been invaluable to me in troubleshooting, especially when trying to determine: did the web server do it, or did HAProxy?

If you just want to add the status code and termination state, add %ST %tsc wherever you’d like it.

  • %ST will give the status code.
  • %tsc will give the session state at disconnection, including how HAProxy handled any persistent cookies.

If HAProxy was not interrupted or did not intervene in any way, %tsc will be ----. If you see something like CC--, you can check Session state at disconnection for details. Be sure to scroll down to the two-letter pairings and what they usually mean, like:

CC : The client aborted before the connection could be established to the server. This can happen when HAProxy tries to connect to a recently dead (or unchecked) server, and the client aborts while HAProxy is waiting for the server to respond or for "timeout connect" to expire.
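Applied to the custom log-format quoted earlier in this thread, which already carries %ST, it might be enough to slot %tsc in after %B, mirroring the field order of the default format:

log-format "[%ci] [%tr] %ft %ST %B %tsc %hr %hs %HM %HU [%HV] %b/%s"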

Thank you. Very helpful.

FYI, to get this working with GoAccess without losing any of the information I want in HAProxy, I just used a very slightly adapted version of HAProxy’s default https logging pattern, which is:

log-format "%ci:%cp [%tr] %ft %b/%s %TR/%Tw/%Tc/%Tr/%Ta %ST %B %CC %CS %tsc %ac/%fc/%bc/%sc/%rc %sq/%bq %hr %hs %{+Q}r %[fc_err]/%[ssl_fc_err,hex]/%[ssl_c_err]/%[ssl_c_ca_err]/%[ssl_fc_is_resumed] %[ssl_fc_sni]/%sslv/%sslc"

And made one change, which was to remove the port (%cp) after %ci.

Then I used this pattern in GoAccess:

log-format %^  %^ %t %^ %^ %h [%d:%^] %^ %^ %^/%^/%^/%^/%D %s %b %^ %^ %^ %^ %^ {%u|%R} "%m %U %H" %^ %^/%K/%k

I’ll have a play with NGINX, and start looking through the logs for those status codes and termination states.