Haproxy 2.7.1 logging a ton of 0/400/408 errors with <BADREQ>

I have 3 frontend servers and 3 backend servers. The three frontends use haproxy, the three backends use openresty (nginx+lua). Two are with the same ISP in Europe, the third is with another ISP in North America. All use Debian 11.

The NA server is breaking. No POST requests are going through at all; they all time out with error 408 regardless of what the limit is. Whether it’s 5, 10, 30, 60, or 120 seconds, the request will time out. The other two servers (with identical config files) do not have this issue, even with a 5s timeout.

I noticed while looking for 408 errors that there were a ton of requests with <BADREQ> as the URL. Upon examination, I am receiving dozens of these per second, but only in NA. Since all three frontends are for a multicast, errors should be common to all three. This is not the case.

The logs look like this.

Jan  8 13:59:14 ca haproxy[73212]: xxx:34464 [08/Jan/2023:13:59:14.831] https-in~ https-in/<NOSRV> -1/-1/-1/-1/0 400 0 - - CR-- 3442/3435/0/0/0 0/0 "<BADREQ>"
Jan  8 13:59:20 ca haproxy[73212]: xxx:56834 [08/Jan/2023:13:58:50.043] mati-in~ mati-in/<NOSRV> -1/-1/-1/-1/30001 408 0 - - cR-- 3498/5/0/0/0 0/0 "<BADREQ>"
Jan  8 13:59:20 ca haproxy[73212]: xxx:51904 [08/Jan/2023:13:59:20.165] https-in~ https-in/<NOSRV> -1/-1/-1/-1/10 0 0 - - PR-- 3507/3501/0/0/0 0/0 "<BADREQ>"

A well-constructed 408, which reliably happens on any POST request, looks like this.

Jan  8 14:01:41 ca haproxy[73212]: xxx:43748 [08/Jan/2023:14:01:11.721] https-in~ https-in/<NOSRV> -1/-1/-1/-1/30001 408 304 - - cD-- 3703/3692/0/0/0 0/0 "POST https://foo.bar/threads/test.129102/add-reply HTTP/2.0"
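For anyone comparing entries like these across servers, here is a minimal Python sketch that pulls the status code, the timers, and the termination flags out of lines in the default HTTP log format shown above. The regex, the `parse_line` helper, and the small `TERMINATION` table are illustrative assumptions, not part of haproxy; the table covers only the four codes that appear in this thread, with meanings paraphrased from the "Session state at disconnection" section of the haproxy documentation.

```python
import re
from collections import Counter

# Matches the part of the line after the syslog prefix, in the default HTTP
# log format: client [date] frontend backend/server timers status bytes
# req_cookie resp_cookie termination conn_counters queues "request"
LOG_RE = re.compile(
    r'(?P<client>\S+) \[(?P<accept_date>[^\]]+)\] (?P<frontend>\S+) '
    r'(?P<backend>[^/\s]+)/(?P<server>\S+) '
    r'(?P<timers>-?\d+(?:/-?\d+){4}) (?P<status>-?\d+) (?P<bytes>\d+) '
    r'\S+ \S+ (?P<term>\S{4}) '
    r'\S+ \S+ "(?P<request>[^"]*)"'
)

# Only the first two flag characters are looked up; only codes seen in this
# thread are listed.
TERMINATION = {
    "CR": "client aborted before sending a full HTTP request",
    "cR": "'timeout http-request' expired before a full request arrived",
    "PR": "proxy rejected the request (for example invalid HTTP syntax)",
    "cD": "client-side timeout expired during the data phase",
}


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    m = LOG_RE.search(line)
    if m is None:
        return None
    fields = m.groupdict()
    fields["status"] = int(fields["status"])
    fields["timers"] = [int(t) for t in fields["timers"].split("/")]
    fields["meaning"] = TERMINATION.get(fields["term"][:2], "not in this small table")
    return fields


if __name__ == "__main__":
    # Two of the lines quoted in this thread.
    samples = [
        'Jan  8 13:59:14 ca haproxy[73212]: xxx:34464 [08/Jan/2023:13:59:14.831] '
        'https-in~ https-in/<NOSRV> -1/-1/-1/-1/0 400 0 - - CR-- '
        '3442/3435/0/0/0 0/0 "<BADREQ>"',
        'Jan  8 14:01:41 ca haproxy[73212]: xxx:43748 [08/Jan/2023:14:01:11.721] '
        'https-in~ https-in/<NOSRV> -1/-1/-1/-1/30001 408 304 - - cD-- '
        '3703/3692/0/0/0 0/0 '
        '"POST https://foo.bar/threads/test.129102/add-reply HTTP/2.0"',
    ]
    records = [r for r in map(parse_line, samples) if r is not None]
    # Tally (status, termination flags) pairs to see which failure dominates.
    for (status, term), n in Counter((r["status"], r["term"]) for r in records).items():
        print(status, term, n, TERMINATION.get(term[:2], "?"))
```

Running the same tally over the logs of all three frontends should make it easy to see whether the `cD--` timeouts on POST really are limited to the NA server.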

I have tried:

  • Ensuring TCP is permitted for ports 443 and 80.
  • Increasing and decreasing timeouts.
  • Adding and removing http-keep-alive and http-buffer-request.
  • errorfile 408 /dev/null
  • Changing SSL ciphers.
  • Enabling and disabling HTTP2.

The error is not browser dependent. I have tested Chromium forks and Firefox. It happens on mobile devices and desktop. It happens across ISPs. I have no clue what the issue is and I am running the latest stable version of haproxy.

After setting my ethernet device’s MTU to 1400, down from 1500, it works. This was done at the suggestion of my ISP and to be quite honest I have no idea why it works.

I see some mentions of MTUs in the config documentation, but I would not have thought of MTUs causing that issue. Perhaps the section on Timing Events might make some sense with your solution? Honestly, I thought 1500 was pretty universal outside of advanced networking stuff like BGP and some VPN solutions.

It is, and haproxy does not break on my other two servers with a 1500 MTU. I’m wondering what the third’s ISP is doing that creates an issue with MTUs. I think he has some sort of network filtering that might be adding data to the packet? I’m not sure, I’m not very familiar with low level networking.

I am also doing BGP on this server, by the way. haproxy is directing traffic for an entire /24 to multiple backends. However, so are the other two servers! It’s a very strange thing.

I’m not familiar with that level of networking either. I’ve seen VPNs have different values for MTUs. I’ve also seen BGP use values for MTUs like 9200.

HAProxy has a setting for Maximum Segment Size that you could play with on the bind line if you’re feeling experimental. Even the documentation mentions “that this relies on a kernel feature which is theoretically supported under Linux but was buggy in all versions prior to 2.6.28”, so proceed with caution.
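To pick a value to try there, the usual arithmetic is MSS = MTU minus the IP header minus the TCP header. Below is a minimal Python sketch of that calculation, assuming headers without options (20 bytes for IPv4, 40 for IPv6, 20 for TCP); the `mss_for_mtu` helper and its `extra_overhead` parameter are hypothetical names for illustration. One plausible reading of this thread, not confirmed by anyone here, is that something on the NA path cannot carry full 1500-byte packets, so small requests get through while full-size segments from a POST body are lost. Lowering the interface MTU makes the server advertise a smaller MSS, so clients send smaller segments.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
IPV6_HEADER = 40  # bytes, fixed IPv6 header only
TCP_HEADER = 20   # bytes, assuming no TCP options


def mss_for_mtu(mtu, ipv6=False, extra_overhead=0):
    """Largest TCP payload per packet for a given MTU.

    extra_overhead is a hypothetical allowance for any encapsulation on the
    path (tunnel headers, for example); it is 0 when nothing is known.
    """
    ip_header = IPV6_HEADER if ipv6 else IPV4_HEADER
    return mtu - extra_overhead - ip_header - TCP_HEADER


if __name__ == "__main__":
    for mtu in (1500, 1400):
        print(mtu, mss_for_mtu(mtu), mss_for_mtu(mtu, ipv6=True))
```

With the 1400 MTU that worked here, this gives 1360 for IPv4, which would be the value to try with the `mss` keyword on the bind line if you prefer to leave the interface MTU at 1500. That is untested in this thread.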