Haproxy And Percona Cluster Transaction Commit Failing

Hi,

I have an issue with my loadbalanced percona setup. The issue only rears its head when there is a large amount of data being inserted into the database during a transaction, it fails at the commit stage, it seems as if the connection times out but it happens no matter what I raise the timeouts to. There is no issue when I hit one of the percona nodes directly, and I have set the haproxy config to write to 1 node only. As it works when I hit the percona node directly it seems the issue is with my haproxy configuration. Here it is, any help greatly appreciated:

global
        log /dev/log    local0
        log /dev/log    local1 notice
        maxconn 4096
        chroot /var/lib/haproxy
        stats socket /run/haproxy/admin.sock mode 660 level admin expose-fd listeners
        stats timeout 30s
        user haproxy
        group haproxy
        daemon

        # Default SSL material locations
        ca-base /etc/ssl/certs
        crt-base /etc/ssl/private

        # Default ciphers to use on SSL-enabled listening sockets.
        # For more information, see ciphers(1SSL). This list is from:
        #  https://hynek.me/articles/hardening-your-web-servers-ssl-ciphers/
        # An alternative list with additional directives can be obtained from
        #  https://mozilla.github.io/server-side-tls/ssl-config-generator/?server=haproxy
        ssl-default-bind-ciphers ECDH+AESGCM:DH+AESGCM:ECDH+AES256:DH+AES256:ECDH+AES128:DH+AES:RSA+AESGCM:RSA+AES:!aNULL:!MD5:!DSS
        ssl-default-bind-options no-sslv3

defaults
        log     global
        mode    tcp
        option  tcplog
        option  dontlognull
        option  tcp-smart-accept
        option  tcp-smart-connect
        retries 3
        option  redispatch
        maxconn 40000
        timeout check   3500ms
        timeout queue   3500ms
        timeout connect 3500ms
        timeout client  10000ms
        timeout server  10000ms

listen stats
        bind 10.100.2.11:9000
        mode http
        stats enable
        stats hide-version
        stats realm HAproxy-Statistics
        stats uri /haproxy_stats
        stats auth 

listen percona_cluster_write
        bind 10.100.2.11:3306
        mode tcp
        timeout client 1m
        timeout server 1m
        balance leastconn
        option mysql-check
        server DB01-QA 10.100.3.11:3306 send-proxy check inter 12000 rise 3 fall 3
        server DB01-QA 10.100.3.12:3306 send-proxy check inter 12000 rise 3 fall 3 backup
        server DB01-QA 10.100.3.13:3306 send-proxy check inter 12000 rise 3 fall 3 backup

listen percona_cluster_read
        bind 10.100.2.11:3307
        mode tcp
        balance leastconn
        option mysql-check
        server DB01-QA 10.100.3.11:3306 send-proxy check inter 12000 rise 3 fall 3
        server DB02-QA 10.100.3.12:3306 send-proxy check inter 12000 rise 3 fall 3
        server DB03-QA 10.100.3.13:3306 send-proxy check inter 12000 rise 3 fall 3

Here is error I get from application:

Fatal error encountered during command execution.
MySql.Data.MySqlClient.MySqlException (0x80004005): Fatal error encountered during command execution. —> MySql.Data.MySqlClient.MySqlException (0x80004005): Fatal error encountered attempting to read the resultset. —> MySql.Data.MySqlClient.MySqlException (0x80004005): Reading from the stream has failed. —> System.IO.EndOfStreamException: Attempted to read past the end of the stream.
at MySql.Data.MySqlClient.MySqlStream.ReadFully(Stream stream, Byte[] buffer, Int32 offset, Int32 count)
at MySql.Data.MySqlClient.MySqlStream.LoadPacket()
at MySql.Data.MySqlClient.MySqlStream.LoadPacket()
at MySql.Data.MySqlClient.MySqlStream.ReadPacket()
at MySql.Data.MySqlClient.NativeDriver.GetResult(Int32& affectedRow, Int64& insertedId)
at MySql.Data.MySqlClient.Driver.NextResult(Int32 statementId, Boolean force)
at MySql.Data.MySqlClient.MySqlDataReader.NextResult()
at MySql.Data.MySqlClient.MySqlDataReader.NextResult()
at MySql.Data.MySqlClient.MySqlCommand.ExecuteReader(CommandBehavior behavior)
at MySql.Data.MySqlClient.MySqlCommand.ExecuteReader(CommandBehavior behavior)
at MySql.Data.MySqlClient.MySqlCommand.ExecuteNonQuery()
at MySql.Data.MySqlClient.MySqlTransaction.Commit()

Provide the haproxy log, it will tell what happened to that connection.

Sep 4 09:54:08 HAP01-QA haproxy[23936]: 10.100.6.11:65309 [04/Sep/2018:09:43:56.160] percona_cluster_write percona_cluster_write/DB01-QA 1/0/612114 752958695 cD 3/3/2/2/0 0/0
Sep 4 09:56:09 HAP01-QA haproxy[23936]: 10.100.6.11:65308 [04/Sep/2018:09:43:56.052] percona_cluster_write percona_cluster_write/DB01-QA 1/0/733626 22806937 cD 3/3/2/2/0 0/0
Sep 4 09:56:09 HAP01-QA haproxy[23936]: 10.100.6.11:65395 [04/Sep/2018:09:55:09.683] percona_cluster_write percona_cluster_write/DB01-QA 1/0/60125 33913 cD 2/2/1/1/0 0/0

Termination flag is cD:

cD   The client did not send nor acknowledge any data for as long as the
     "timeout client" delay. This is often caused by network failures on
     the client side, or the client simply leaving the net uncleanly.

The sessions lasted 612, 733 and 60 seconds respectively.

How much time does the insert + commit need exactly? It certainly must be less than the timeouts.

I’m not entirely sure of how long it takes, it varies depending on size of data, smaller data sets import fine. I am running PMM to monitor the cluster, this is graph showing what was going on during the transaction, it bombs out at the right hand side of the graph.

As I said, this works fine bypassing haproxy, so not sure why the client seemingky times out when using haproxy.

Well, the behavior of haproxy is straigthforward: if the client doens’t talks for timeout client time, then haproxy will close the session (same with the server, based on timeout server)

The better you understand your applications behavior, the easier it will be for your to choose the correct timeout.