Random SA-- errors with Haproxy 1.8.3

Hello,

I have strange errors occurring only on one specific HTTP endpoint and only with some backends.

I have a Haproxy in front of 4 nginx in front of one python application. Two nginx backends are using https (and are in docker containers, but that should not be relevant ^^) and the other two are using http (and not in docker containers).

Configuration:

global
log /dev/log local0
log /dev/log local1 notice
chroot /var/lib/haproxy
stats socket /run/haproxy/admin.sock mode 660 level admin
stats timeout 30s
user haproxy
group haproxy
daemon

# Default SSL material locations
ca-base /etc/ssl/certs
crt-base /etc/ssl/private

ssl-default-bind-ciphers ECDH+AESGCM:DH+AESGCM:ECDH+AES256:DH+AES256:ECDH+AES128:DH+AES:ECDH+3DES:DH+3DES:RSA+AESGCM:RSA+AES:RSA+3DES:!aNULL:!MD5:!DSS
ssl-default-bind-options no-sslv3

defaults
log global
mode http
option httplog
option dontlognull
option dontlog-normal
option http-ignore-probes
timeout connect 5000
timeout client 50000
timeout server 50000
errorfile 400 /etc/haproxy/errors/400.http
errorfile 403 /etc/haproxy/errors/403.http
errorfile 408 /etc/haproxy/errors/408.http
errorfile 500 /etc/haproxy/errors/500.http
errorfile 502 /etc/haproxy/errors/502.http
errorfile 503 /etc/haproxy/errors/503.http
errorfile 504 /etc/haproxy/errors/504.http

backend pc-backend
balance roundrobin

server -1 185.:5000 check ssl verify none weight 25
server -2 34.:5000 check ssl verify none weight 25
server -3 34.:80 check weight 25
server -4 54.:80 check weight 25

frontend http
bind *:80
mode http
option forwardfor
default_backend pc-backend

frontend https
bind *:443 ssl crt /etc/apache2/ssl/.key.pem
mode http
option forwardfor

default_backend pc-backend

listen stats
bind *:8999 ssl crt /etc/apache2/ssl/.key.pem
mode http
stats enable
stats realm Haproxy\ Statistics
stats uri /haproxy_stats

Some specific queries (on /v2/batch) are sometimes (not always) failing with the following logs, only on server -1 and -2 (with https). (Image #1)

On nginx side, requests are fine and there are no errors. (Image #2)

I didn’t manage to reproduce those queries (it’s in production and there are too many queries per second), but I was able to capture a working and a failing transaction on the haproxy machine and on the backend:

I put all images here since I’m limited as a new user: https://imgur.com/a/JyQsi

The only difference seems to be the [FIN, ACK] packet from the backend arriving after (working) or before (error) the [RST, ACK] sent from HAProxy, on HAProxy side.

I don’t really understand why queries are failing, this is the only difference I see. Does someone have any idea?

Thanks!
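An added example (not from the thread or the HAProxy project) of the triage done by hand above: parse HAProxy httplog records, keep only /v2/batch requests, and count abnormal termination states per backend server so the failing servers stand out. The log lines are constructed, since the thread's logs exist only as images, and the field order assumes the default `option httplog` format without captured cookies or headers configured.

```python
import re
from collections import defaultdict

# One "option httplog" record, field order per HAProxy 1.8 docs; syslog prefix tolerated.
HTTPLOG_RE = re.compile(
    r'(?P<client>\S+):(?P<cport>\d+) \[(?P<accept>[^\]]+)\] (?P<frontend>\S+) '
    r'(?P<backend>[^/ ]+)/(?P<server>\S+) '
    r'(?P<tq>-?\d+)/(?P<tw>-?\d+)/(?P<tc>-?\d+)/(?P<tr>-?\d+)/(?P<tt>\+?\d+) '
    r'(?P<status>-?\d+) (?P<bytes>\+?\d+) (?P<reqcookie>\S+) (?P<rescookie>\S+) '
    r'(?P<term>[A-Za-z-]{4}) (?P<conns>\S+) (?P<queues>\S+) '
    r'(?:\{[^}]*\} )*"(?P<request>[^"]*)"')  # optional {captured headers} skipped

# Constructed sample records (the thread's logs exist only as images); SSL frontends log as "https~".
SAMPLE_LOG = """\
Feb  6 12:14:14 lb haproxy[14389]: 203.0.113.10:51234 [06/Feb/2018:12:14:14.655] https~ pc-backend/-1 0/0/12/48/60 200 1843 - - ---- 3/3/1/1/0 0/0 "POST /v2/batch HTTP/1.1"
Feb  6 12:14:15 lb haproxy[14389]: 203.0.113.11:51240 [06/Feb/2018:12:14:15.102] https~ pc-backend/-2 0/0/9/-1/31 502 212 - - SA-- 3/3/1/1/0 0/0 "POST /v2/batch HTTP/1.1"
Feb  6 12:14:15 lb haproxy[14389]: 203.0.113.12:51250 [06/Feb/2018:12:14:15.400] https~ pc-backend/-3 0/0/8/40/50 200 1843 - - ---- 3/3/1/1/0 0/0 "POST /v2/batch HTTP/1.1"
Feb  6 12:14:16 lb haproxy[14389]: 203.0.113.13:51260 [06/Feb/2018:12:14:16.010] http pc-backend/-1 0/0/11/-1/33 502 212 - - SA-- 3/3/1/1/0 0/0 "POST /v2/batch HTTP/1.1"
Feb  6 12:14:16 lb haproxy[14389]: 203.0.113.14:51270 [06/Feb/2018:12:14:16.500] http pc-backend/-4 0/0/7/20/30 200 640 - - ---- 3/3/1/1/0 0/0 "GET /healthz HTTP/1.1"
"""

def parse_httplog(line):
    """Fields of one httplog record as a dict, or None for non-request lines."""
    m = HTTPLOG_RE.search(line)
    return m.groupdict() if m else None

def abnormal_by_server(lines, path_prefix=None):
    """Per backend/server: total records, abnormal ones and their termination states
    ('--' prefix = normal completion). Caveat: with 'option dontlog-normal' (set in the
    posted config) normal requests are never logged, so compare raw counts, not ratios."""
    stats = defaultdict(lambda: {"total": 0, "abnormal": 0, "states": defaultdict(int)})
    for line in lines:
        rec = parse_httplog(line)
        if rec is None:
            continue  # e.g. a "Proxy ... started." message
        if path_prefix is not None:
            parts = rec["request"].split()  # "<BADREQ>" has no path part
            if len(parts) < 2 or not parts[1].startswith(path_prefix):
                continue
        entry = stats[f'{rec["backend"]}/{rec["server"]}']
        entry["total"] += 1
        if not rec["term"].startswith("--"):
            entry["abnormal"] += 1
            entry["states"][rec["term"]] += 1
    return {k: {**v, "states": dict(v["states"])} for k, v in stats.items()}

def suspicious_servers(stats, min_ratio=0.5):
    """Servers whose share of abnormal terminations is at least min_ratio."""
    return sorted(k for k, v in stats.items()
                  if v["total"] and v["abnormal"] / v["total"] >= min_ratio)
```

On the sample above, `suspicious_servers(abnormal_by_server(SAMPLE_LOG.splitlines(), "/v2/batch"))` returns the two SSL servers, which is the pattern the poster reports; on real logs, feed it the output of `grep haproxy /var/log/syslog` or the local0 log file the posted config sends to.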

Are you seeing errors in the clients as well, or does the client see the correct message and “only” haproxy logs show this error?

Can you share haproxy and nginx logs of working requests/responses?

Hi,

I assume this is an SSL related bug because it happens only on servers -1 & -2. Could you try the following patch, sent by Olivier on the mailing-list?

https://www.mail-archive.com/haproxy@formilux.org/msg28970.html

The following patch was just committed:
http://git.haproxy.org/?p=haproxy.git;a=patch;h=7e2e505006feb8f3b4a7f9e0ac5e89b5a8c4895e

Hello,

I applied the patch on 1.8.4 and indeed the problem is gone :slight_smile:

Thanks!

Hello again,

There seems to be another issue with the patch however, when it’s applied the haproxy process quickly goes to 100% CPU usage after a few seconds.

1.8.4 without the patch doesn’t have the issue.

Hi,

I’m able to reproduce the bug. Could you try the following patch please?

https://pastebin.com/raw/edsacQPb

Hello,

Seems to be ok with your patch :slight_smile:

Thanks, looking forward to having this officially in 1.8.X now :stuck_out_tongue:

Thanks for your confirmation. It has been merged in upstream. It will be pushed in 1.8 very soon.