Haproxy 1.7.11 intermittently closes connection/sends empty response on POST requests

Hello community,
I’m seeing a very small number of POST requests (~0.0025%, i.e. 20-30 out of 800K-1000K requests daily) being shut down without any response from haproxy:

  • The HTTP client (Apache HttpClient 4.5.5) uses a pool of persistent connections.
  • The HTTP client is able to send the whole POST request to haproxy.
  • When the HTTP client tries to read the response, haproxy seems to send an empty response (or no response at all), which results in an org.apache.http.NoHttpResponseException.
  • Checking the backend access logs, I can see that the failed requests never reach the backends.

I wonder what the hiccup could be here. Initially I thought that configuring haproxy with a smaller timeout http-keep-alive would help to avoid a possible (imaginary) race condition where the backend closes the connection right when haproxy starts forwarding the request, but it didn’t seem to help.

Setup:

  • 2 backends running Apache Tomcat 8.0.33 (connectionTimeout = 20 sec, keepAliveTimeout = 20 sec)

The config:

backend somebackend
		description http:8080
		balance roundrobin
		option httpchk GET /status
		http-check expect status 200
		default-server inter 250 fall 3 weight 100
		timeout server 66000
		timeout http-keep-alive 15s
		server service1 {address1} check
		server service2 {address2} check

defaults
		log     global
		mode    http
		option  httplog
		option  dontlognull
		option  dontlog-normal
		timeout connect 5000
		timeout client  50000
		timeout server  50000
		timeout client-fin 600s
		timeout tunnel  1h

frontend main
		bind *:8080-8081
		option forwardfor
		maxconn 6000
		use_backend somebackend if { hdr_beg(host) -i old-name.local }
		use_backend somebackend if { hdr_beg(host) -i current-name.local }
		use_backend somebackend if { hdr_beg(host) -i a-very-old-name.local }

Edit: There’s a reason why I’ve specifically called out POST requests. Looking at the haproxy source code, I noticed some code that handles interop with older browsers:

/* POST requests may require to read extra CRLF sent by broken
 * browsers and which could cause an RST to be sent upon close
 * on some systems (eg: Linux). */
channel_auto_read(req);

Similar POST-specific tweaks appear in a few other places as well; I’m not sure whether this could affect things. I should also note that most of the HTTP requests sent to the backends via haproxy are POST requests. Could there be a rare race condition where two consecutive POST requests are read, but only the first one is handled?

What does haproxy log?

There is nothing in the haproxy log at the time it happens, which makes me think that the problem is somewhere in the HTTP client. But the client was able to send the POST request, which is weird.

Haproxy should always log. It may be that, due to a haproxy bug, the log that is emitted is incorrect, but not emitting a log at all would be a first.
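
One thing worth checking while you debug: your defaults include option dontlognull and option dontlog-normal, which suppress part of the logging. I wouldn’t expect them to hide this particular case, but temporarily relaxing them rules it out; something along these lines (illustrative only):

defaults
    log     global
    mode    http
    option  httplog
    # option  dontlognull
    # option  dontlog-normal
    option  log-separate-errors

option log-separate-errors raises the log level of sessions that ended with an error, which makes them easier to spot.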

I assume you cannot reproduce this problem?

Honestly, at this point I don’t think much can be done other than capturing the entire haproxy frontend traffic and checking the actual TCP sessions when this happens.
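
For example, something along these lines on the haproxy host would do (ports taken from your frontend bind; adjust the filter and add file rotation as needed, since the capture may have to run for days):

tcpdump -i any -s 0 -w haproxy-frontend.pcap 'port 8080 or port 8081'

You can then open the capture in wireshark and inspect the exact TCP session of a failing request.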

Hey @lukastribus! Thanks for your help, we’ve managed to figure out some additional things: there’s a client that sends small HTTP POST requests every 60 seconds, and the client/server timeouts are also set to 60 seconds. So we came up with the theory that the frontend/backend connection might be getting closed at the very moment the client request is forwarded to the backend, resulting in the behavior described above. To test this theory, we simply bumped the client/server timeouts up to 70 seconds, and the errors are gone now (there used to be around 6-10 errors per 24 hours; 0 errors for 3 straight days of testing the new config).
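
Roughly, the change we tested was just this (values illustrative; the point is that the idle timeout is now longer than the client’s 60-second request interval):

defaults
    timeout client  70s
    timeout server  70s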

I wonder if there’s some kind of race condition in haproxy? In theory, once the HTTP request has been sent successfully by the client, haproxy should be aware of it and not end up closing the connection.

There is no solution to this; this race condition is part of HTTP.

An HTTP client can retry idempotent methods automatically, but for POST requests the upper application layer needs to handle this case (i.e. decide whether it is safe to retry the same request when a network error occurs).
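
For illustration, with Apache HttpClient 4.5 that decision can be plugged in via a retry handler. A minimal sketch, assuming the application can tell which POSTs are safe to replay (the X-Idempotency-Key header and isSafeToReplay helper below are hypothetical, only there to show where that decision goes):

import java.io.IOException;
import org.apache.http.HttpRequest;
import org.apache.http.NoHttpResponseException;
import org.apache.http.client.HttpRequestRetryHandler;
import org.apache.http.client.protocol.HttpClientContext;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.protocol.HttpContext;

public class RetryExample {

    public static CloseableHttpClient build() {
        HttpRequestRetryHandler retryHandler = new HttpRequestRetryHandler() {
            @Override
            public boolean retryRequest(IOException exception, int executionCount, HttpContext context) {
                if (executionCount > 3) {
                    return false;
                }
                HttpRequest request = HttpClientContext.adapt(context).getRequest();
                String method = request.getRequestLine().getMethod();
                // Idempotent methods (GET, HEAD, ...) are safe to retry on a network error.
                if (!"POST".equalsIgnoreCase(method)) {
                    return true;
                }
                // For POST, only retry when no response was received at all
                // and the application says this particular request may be replayed.
                return exception instanceof NoHttpResponseException && isSafeToReplay(request);
            }
        };
        return HttpClients.custom()
                .setRetryHandler(retryHandler)
                .build();
    }

    // Application-specific decision, e.g. based on an idempotency key (hypothetical convention).
    private static boolean isSafeToReplay(HttpRequest request) {
        return request.containsHeader("X-Idempotency-Key");
    }
}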

You should definitely use timeout http-keep-alive for this, rather than leaving it up to the client/server timeouts, so that the keep-alive idle timeout is not bound to the client/server timeouts.
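
For example, something like this (values are illustrative, based on the 60-second client interval you described; the keep-alive timeout just needs to sit comfortably above it):

defaults
    timeout connect 5s
    timeout client  50s
    timeout server  50s
    timeout http-keep-alive 75s

That way the time haproxy waits for the next request on an idle keep-alive connection is controlled separately, and the client/server timeouts can stay tuned for actual request/response traffic.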

This is pure gold, thanks a lot!