More oddities, looking for guidance

These may be bugs, but I’m more inclined to believe they are just some kind of configuration assumptions we’re making poorly.

We’ve had sporadic reports from customers using our file-sharing service (think large and small files, à la Dr*pbox) that downloads get “corrupted”; more specifically, the last few bytes or blocks don’t get transferred. This occurred when we had HTTP/2 enabled on HAProxy. Going direct to the Apache backend servers, the problem doesn’t present. OK: disable HTTP/2, move on.

Next, we got reports of some kind of (new) upload problem with HTTP/1.1 running on HAProxy. We haven’t changed our config other than disabling HTTP/2. Remove HAProxy (go direct to Apache), and the problem disappears.

So now I’m wondering if we have our timeouts, buffers, or something else set strangely on the HAProxy server(s) themselves. We are actively working on reproducing the issue reliably, so this is just a shot in the dark before that.
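For anyone who wants to try reproducing the truncated downloads, a check along these lines might help. It is only a minimal sketch: it streams a URL, counts the bytes actually received, and compares that with the advertised `Content-Length`. The function names (`consume`, `verdict`, `check_download`) are made up for illustration, and the URL is whatever test file you point it at; nothing here is specific to our service.

```python
import hashlib
import http.client
import sys
import urllib.request

CHUNK_SIZE = 1024 * 1024  # read 1 MiB at a time so large files never sit in memory


def consume(stream, chunk_size=CHUNK_SIZE):
    """Read a file-like object to EOF; return (byte_count, sha256_hex, error)."""
    digest = hashlib.sha256()
    total = 0
    error = None
    try:
        while True:
            block = stream.read(chunk_size)
            if not block:
                break
            total += len(block)
            digest.update(block)
    except (http.client.HTTPException, OSError) as exc:
        # Connection reset, timeout, or incomplete read: keep what we counted so far.
        error = repr(exc)
    return total, digest.hexdigest(), error


def verdict(advertised, received):
    """Compare the Content-Length header value (str or None) with bytes received."""
    if advertised is None:
        return "unknown"  # e.g. chunked response, nothing to compare against
    expected = int(advertised)
    if received == expected:
        return "ok"
    return "truncated" if received < expected else "overrun"


def check_download(url, timeout=60):
    """Download url once and report whether the body looks complete."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        advertised = resp.headers.get("Content-Length")
        received, sha256, error = consume(resp)
    return {
        "advertised": advertised,
        "received": received,
        "sha256": sha256,
        "error": error,
        "verdict": verdict(advertised, received),
    }


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("usage: check_download.py URL")
    else:
        print(check_download(sys.argv[1]))
```

Running the same URL once through the proxy and once direct to a backend, then comparing `received` and `sha256`, should show whether the tail of the file is really going missing.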

For context, transfers range from a few KB to 10+ GB in either direction, for several hundred users per server. Latencies range from 2 ms up to 300 ms or more, depending on where in the world the users are (or that plus cell phones or satellite).

In some cases people are transferring files over 100 GB, but those could be hitting the 12-hour window.
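To put a number on that, here is a rough back-of-the-envelope sketch of the average rate a transfer needs in order to finish inside a given window. It assumes decimal gigabytes and treats the window as a hard cap on total transfer time, which may not be how HAProxy applies it (`timeout client` and `timeout server` are documented as inactivity timeouts, and `hard-stop-after` only comes into play after a soft-stop or reload). The helper name `min_rate_mbit` is made up for illustration.

```python
def min_rate_mbit(size_gb, window_hours):
    """Average rate in Mbit/s needed to move size_gb (decimal GB) within window_hours."""
    size_bits = size_gb * 1e9 * 8
    window_seconds = window_hours * 3600
    return size_bits / window_seconds / 1e6


if __name__ == "__main__":
    # 100 GB against the 12-hour window, and against the frontend's 8h timeout client
    for hours in (12, 8):
        print(f"100 GB in {hours}h needs about {min_rate_mbit(100, hours):.1f} Mbit/s")
```

So a 100 GB file needs roughly 18.5 Mbit/s sustained to finish in 12 hours (about 27.8 Mbit/s for 8 hours), which users on slower or high-latency links may well not reach.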

Here is the config:

global
        maxconn         100000
        log /dev/log    local0
        chroot /var/lib/haproxy
        stats socket /run/haproxy/admin.sock mode 777 level admin expose-fd listeners
        stats timeout 30s
        user haproxy
        group haproxy
        daemon
        nbproc 1
        nbthread 8
        hard-stop-after 12h


        tune.bufsize 65536



        tune.h2.initial-window-size 4096000

defaults
        log     global
        mode    http
        option forwardfor
        option redispatch
        option log-separate-errors

        timeout client 12h
        timeout server 12h
        timeout connect 20s

        log-format "%ci:%cp [%tr] %ft %b/%s %TR/%Tw/%Tc/%Tr/%Ta %ST %B %CC %CS %tsc %ac/%fc/%bc/%sc/%rc %sq/%bq %hr %hs %{+Q}r %sslc"

        errorfile 400 /etc/haproxy/errors/400.http
        errorfile 403 /etc/haproxy/errors/403.http
        errorfile 408 /etc/haproxy/errors/408.http
        errorfile 500 /etc/haproxy/errors/500.http
        errorfile 502 /etc/haproxy/errors/502.http
        errorfile 503 /etc/haproxy/errors/503.http
        errorfile 504 /etc/haproxy/errors/504.http

listen statsend
        bind 127.0.0.1:9000
        mode http
        stats enable
        stats hide-version
        stats scope .
        stats realm Haproxy\ Statistics
        stats uri /haproxy-stats?stats


cache objects
        total-max-size 1024
        max-object-size 2560000
        max-age 86400

frontend cp.xxx.com
     bind *:80  tcp-ut 300s
     bind :::80  tcp-ut 300s


     bind *:443 ssl crt /etc/apache2/sites-available/xxx.com/ssl/le/ssl-certs.pem ssl-min-ver TLSv1.2  alpn http/1.1  tcp-ut 300s
     bind :::443 ssl crt /etc/apache2/sites-available/xxx.com/ssl/le/ssl-certs.pem ssl-min-ver TLSv1.2  alpn http/1.1  tcp-ut 300s

     maxconn 50000
     compression algo gzip
     compression type text/html text/plain text/javascript application/javascript application/xml text/css
     option forwardfor
     option http-keep-alive


     timeout client     8h
     timeout http-keep-alive 60s
     timeout http-request 60s
     timeout client-fin 60s
     http-request cache-use objects
     http-response cache-store objects
     http-request set-header X-Forwarded-Port %[dst_port]
     http-request add-header X-Forwarded-Proto https if { ssl_fc }
     capture request header Referrer len 64
     capture request header Content-Length len 10
     capture request header User-Agent len 64
     http-request add-header  Strict-Transport-Security  max-age=15768000
     http-request redirect scheme https unless { ssl_fc }
     default_backend nodes


backend nodes
    mode http
    hash-type consistent
    option redispatch
    fullconn 40000




    timeout check 3000
    option httpchk GET /index.php
    http-check expect status 200




    retry-on empty-response conn-failure

    option log-health-checks

    balance leastconn

    cookie WSRVID insert indirect nocache maxidle 30m maxlife 24h


    server www3.xxx.com 127.0.0.5:443 check check-ssl ssl force-tlsv13 verify none sni ssl_fc_sni allow-0rtt alpn http/1.1 cookie s3 maxconn 10000 check-alpn http/1.1 tcp-ut 300s
    server www4.xxx.com 127.0.0.6:443 check check-ssl ssl force-tlsv13 verify none sni ssl_fc_sni allow-0rtt alpn http/1.1 cookie s4 maxconn 10000 check-alpn http/1.1 tcp-ut 300s
    server www5.xxx.com 127.0.0.7:443 check check-ssl ssl force-tlsv13 verify none sni ssl_fc_sni allow-0rtt alpn http/1.1 cookie s5 maxconn 10000 check-alpn http/1.1 tcp-ut 300s
    server www6.xxx.com 127.0.0.8:443 check check-ssl ssl force-tlsv13 verify none sni ssl_fc_sni allow-0rtt alpn http/1.1 cookie s6 maxconn 10000 check-alpn http/1.1 tcp-ut 300s

Thanks in advance!