Understanding maxconn, maxconnrate, and delays

Hello, I am troubleshooting an issue that is most likely related to a high connection rate per second.

While investigating I looked at a network capture and I see a delay of ~50 seconds between the client's SYN arriving at the VM and HAProxy connecting to the server: HAProxy sends the ACK back to the client right away, but the connection from HAProxy to the server only starts ~50 seconds later. To me it looks like the haproxy process picks the request up ~50 seconds after it arrives at the VM.

CPU is at 100% and swap is being used, which I believe is not good. There are about 100K connections in CLOSE_WAIT, and also a large number in FIN_WAIT2.
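
For reference, counts like these can be gathered with ss state filters:

ss -tan state close-wait | wc -l
ss -tan state fin-wait-2 | wc -l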

My theory is that once many clients hit HAProxy at the same time, it starts queuing connections in the listen backlog. While those are being processed, new connections arrive; the ones already being processed are so delayed that the responses no longer make sense to the client, so the client retries with a new connection, resulting in even more connections in the backlog…

Is there any way to protect HAProxy or Linux from being overloaded by dropping connections, and is that good practice?

haproxy.cfg

global
log /dev/log local0
log /dev/log local1 debug
maxconnrate 280
maxsessrate 280
maxconn 100000
daemon
user haproxy
group haproxy
stats socket /var/run/haproxy.sock level admin
defaults
mode tcp
log global
option tcplog
option dontlognull
timeout connect 5s
timeout client 24h
timeout server 60m
maxconn 100000

frontend service_name
bind 50.1.1.3:1234
acl p1234 dst_port 1234
use_backend service_name_1234 if p1234

backend service_name_1234
balance leastconn
option independant-streams
server server_vir1 x1:1234 on-marked-down shutdown-sessions check fall 3 rise 2 inter 10s slowstart 200s source 172.1.2.3
server server_vir2 x2:1234 on-marked-down shutdown-sessions check fall 3 rise 2 inter 10s slowstart 200s source 172.1.2.4
server server_vir3 x3:1234 on-marked-down shutdown-sessions check fall 3 rise 2 inter 10s slowstart 200s source 172.1.2.5

sysctl config

abi.vsyscall32 = 1
crypto.fips_enabled = 1
debug.exception-trace = 1
debug.kprobes-optimization = 1
debug.panic_on_rcu_stall = 0
dev.hpet.max-user-freq = 64
fs.aio-max-nr = 65536
fs.aio-nr = 0
fs.nr_open = 1048576
fs.overflowgid = 65534
fs.overflowuid = 65534
fs.pipe-max-size = 1048576
fs.pipe-user-pages-hard = 0
fs.pipe-user-pages-soft = 16384
fs.protected_hardlinks = 1
fs.protected_symlinks = 1
fs.quota.allocated_dquots = 0
fs.quota.cache_hits = 0
fs.quota.drops = 0
fs.quota.free_dquots = 0
fs.quota.lookups = 0
fs.quota.reads = 0
fs.quota.syncs = 4
fs.quota.warnings = 1
fs.quota.writes = 0
fs.suid_dumpable = 2
kernel.random.entropy_avail = 3472
kernel.random.poolsize = 4096
kernel.random.read_wakeup_threshold = 64
kernel.random.urandom_min_reseed_secs = 60
kernel.random.write_wakeup_threshold = 896
kernel.randomize_va_space = 2
kernel.real-root-dev = 0
kernel.sched_autogroup_enabled = 0
kernel.sched_cfs_bandwidth_slice_us = 5000
kernel.sched_child_runs_first = 0
kernel.sched_domain.cpu0.domain0.busy_factor = 32
kernel.sched_domain.cpu0.domain0.busy_idx = 2
kernel.sched_domain.cpu0.domain0.cache_nice_tries = 1
kernel.sched_domain.cpu0.domain0.flags = 4143
kernel.sched_domain.cpu0.domain0.forkexec_idx = 0
kernel.sched_domain.cpu0.domain0.idle_idx = 1
kernel.sched_domain.cpu0.domain0.imbalance_pct = 125
kernel.sched_domain.cpu0.domain0.max_interval = 4
kernel.sched_domain.cpu0.domain0.max_newidle_lb_cost = 17558
kernel.sched_domain.cpu0.domain0.min_interval = 2
kernel.sched_domain.cpu0.domain0.name = DIE
kernel.sched_domain.cpu0.domain0.newidle_idx = 0
kernel.sched_domain.cpu0.domain0.wake_idx = 0
kernel.sched_domain.cpu1.domain0.busy_factor = 32
kernel.sched_domain.cpu1.domain0.busy_idx = 2
kernel.sched_domain.cpu1.domain0.cache_nice_tries = 1
kernel.sched_domain.cpu1.domain0.flags = 4143
kernel.sched_domain.cpu1.domain0.forkexec_idx = 0
kernel.sched_domain.cpu1.domain0.idle_idx = 1
kernel.sched_domain.cpu1.domain0.imbalance_pct = 125
kernel.sched_domain.cpu1.domain0.max_interval = 4
kernel.sched_domain.cpu1.domain0.max_newidle_lb_cost = 9445
kernel.sched_domain.cpu1.domain0.min_interval = 2
kernel.sched_domain.cpu1.domain0.name = DIE
kernel.sched_domain.cpu1.domain0.newidle_idx = 0
kernel.sched_domain.cpu1.domain0.wake_idx = 0
kernel.sched_latency_ns = 12000000
kernel.sched_migration_cost_ns = 500000
kernel.sched_min_granularity_ns = 1500000
kernel.sched_nr_migrate = 32
kernel.sched_rr_timeslice_ms = 100
kernel.sched_rt_period_us = 1000000
kernel.sched_rt_runtime_us = 950000
kernel.sched_schedstats = 0
kernel.sched_shares_window_ns = 10000000
kernel.sched_time_avg_ms = 1000
kernel.sched_tunable_scaling = 1
kernel.sched_wakeup_granularity_ns = 2000000
kernel.sem = 250 32000 32 128
kernel.sem_next_id = -1
kernel.shm_next_id = -1
kernel.shm_rmid_forced = 0
kernel.shmall = 18446744073692774399
kernel.shmmax = 18446744073692774399
kernel.shmmni = 4096
kernel.softlockup_all_cpu_backtrace = 0
kernel.softlockup_panic = 0
kernel.stack_tracer_enabled = 0
kernel.sysrq = 16
kernel.tainted = 0
kernel.threads-max = 62405
kernel.timer_migration = 1
kernel.traceoff_on_warning = 0
kernel.unknown_nmi_panic = 1
kernel.usermodehelper.bset = 4294967295 31
kernel.usermodehelper.inheritable = 4294967295 31
kernel.version = #1 SMP Fri Oct 13 10:46:25 EDT 2017
kernel.watchdog = 1
kernel.watchdog_cpumask = 0-1
kernel.watchdog_thresh = 10
kernel.yama.ptrace_scope = 0
net.core.bpf_jit_enable = 0
net.core.busy_poll = 0
net.core.busy_read = 0
net.core.default_qdisc = pfifo_fast
net.core.dev_weight = 64
net.core.message_burst = 10
net.core.message_cost = 5
net.core.netdev_budget = 300
net.core.netdev_max_backlog = 1000
net.core.netdev_rss_key = 00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00
net.core.netdev_tstamp_prequeue = 1
net.core.optmem_max = 20480
net.core.rmem_default = 212992
net.core.rmem_max = 212992
net.core.rps_sock_flow_entries = 0
net.core.somaxconn = 1024
net.core.warnings = 1
net.core.wmem_default = 212992
net.core.wmem_max = 212992
net.core.xfrm_acq_expires = 30
net.core.xfrm_aevent_etime = 10
net.core.xfrm_aevent_rseqth = 2
net.core.xfrm_larval_drop = 1
net.ipv4.cipso_cache_bucket_size = 10
net.ipv4.cipso_cache_enable = 1
net.ipv4.cipso_rbm_optfmt = 0
net.ipv4.cipso_rbm_strictvalid = 1
net.ipv4.conf.all.accept_local = 0
net.ipv4.conf.all.accept_redirects = 0
net.ipv4.conf.all.accept_source_route = 0
net.ipv4.conf.all.arp_accept = 0
net.ipv4.conf.all.arp_announce = 0
net.ipv4.conf.all.arp_filter = 0
net.ipv4.conf.all.arp_ignore = 0
net.ipv4.conf.all.arp_notify = 0
net.ipv4.conf.all.bootp_relay = 0
net.ipv4.conf.all.disable_policy = 0
net.ipv4.conf.all.disable_xfrm = 0
net.ipv4.conf.all.force_igmp_version = 2
net.ipv4.conf.all.forwarding = 1
net.ipv4.conf.all.log_martians = 1
net.ipv4.conf.all.mc_forwarding = 0
net.ipv4.conf.all.medium_id = 0
net.ipv4.conf.all.promote_secondaries = 1
net.ipv4.conf.all.proxy_arp = 0
net.ipv4.conf.all.proxy_arp_pvlan = 0
net.ipv4.conf.all.route_localnet = 0
net.ipv4.conf.all.rp_filter = 1
net.ipv4.conf.all.secure_redirects = 0
net.ipv4.conf.all.send_redirects = 0
net.ipv4.conf.all.shared_media = 1
net.ipv4.conf.all.src_valid_mark = 0
net.ipv4.conf.all.tag = 0
net.ipv4.conf.default.accept_local = 0
net.ipv4.conf.default.accept_redirects = 0
net.ipv4.conf.default.accept_source_route = 0
net.ipv4.conf.default.arp_accept = 0
net.ipv4.conf.default.arp_announce = 0
net.ipv4.conf.default.arp_filter = 0
net.ipv4.conf.default.arp_ignore = 0
net.ipv4.conf.default.arp_notify = 0
net.ipv4.conf.default.bootp_relay = 0
net.ipv4.conf.default.disable_policy = 0
net.ipv4.conf.default.disable_xfrm = 0
net.ipv4.conf.default.force_igmp_version = 2
net.ipv4.conf.default.forwarding = 1
net.ipv4.conf.default.log_martians = 0
net.ipv4.conf.default.mc_forwarding = 0
net.ipv4.conf.default.medium_id = 0
net.ipv4.conf.default.promote_secondaries = 1
net.ipv4.conf.default.proxy_arp = 0
net.ipv4.conf.default.proxy_arp_pvlan = 0
net.ipv4.conf.default.route_localnet = 0
net.ipv4.conf.default.rp_filter = 1
net.ipv4.conf.default.secure_redirects = 0
net.ipv4.conf.default.send_redirects = 0
net.ipv4.conf.default.shared_media = 1
net.ipv4.conf.default.src_valid_mark = 0
net.ipv4.conf.default.tag = 0
net.ipv4.conf.eth0.accept_local = 0
net.ipv4.conf.eth0.accept_redirects = 0
net.ipv4.conf.eth0.accept_source_route = 0
net.ipv4.conf.eth0.arp_accept = 0
net.ipv4.conf.eth0.arp_announce = 0
net.ipv4.conf.eth0.arp_filter = 0
net.ipv4.conf.eth0.arp_ignore = 0
net.ipv4.conf.eth0.arp_notify = 0
net.ipv4.conf.eth0.bootp_relay = 0
net.ipv4.conf.eth0.disable_policy = 0
net.ipv4.conf.eth0.disable_xfrm = 0
net.ipv4.conf.eth0.force_igmp_version = 0
net.ipv4.conf.eth0.forwarding = 1
net.ipv4.conf.eth0.log_martians = 0
net.ipv4.conf.eth0.mc_forwarding = 0
net.ipv4.conf.eth0.medium_id = 0
net.ipv4.conf.eth0.promote_secondaries = 1
net.ipv4.conf.eth0.proxy_arp = 0
net.ipv4.conf.eth0.proxy_arp_pvlan = 0
net.ipv4.conf.eth0.route_localnet = 0
net.ipv4.conf.eth0.rp_filter = 1
net.ipv4.conf.eth0.secure_redirects = 0
net.ipv4.conf.eth0.send_redirects = 0
net.ipv4.conf.eth0.shared_media = 1
net.ipv4.conf.eth0.src_valid_mark = 0
net.ipv4.conf.eth0.tag = 0
net.ipv4.conf.eth2.accept_local = 0
net.ipv4.conf.eth2.accept_redirects = 0
net.ipv4.conf.eth2.accept_source_route = 0
net.ipv4.conf.eth2.arp_accept = 0
net.ipv4.conf.eth2.arp_announce = 0
net.ipv4.conf.eth2.arp_filter = 0
net.ipv4.conf.eth2.arp_ignore = 0
net.ipv4.conf.eth2.arp_notify = 0
net.ipv4.conf.eth2.bootp_relay = 0
net.ipv4.conf.eth2.disable_policy = 0
net.ipv4.conf.eth2.disable_xfrm = 0
net.ipv4.conf.eth2.force_igmp_version = 0
net.ipv4.conf.eth2.forwarding = 1
net.ipv4.conf.eth2.log_martians = 0
net.ipv4.conf.eth2.mc_forwarding = 0
net.ipv4.conf.eth2.medium_id = 0
net.ipv4.conf.eth2.promote_secondaries = 1
net.ipv4.conf.eth2.proxy_arp = 0
net.ipv4.conf.eth2.proxy_arp_pvlan = 0
net.ipv4.conf.eth2.route_localnet = 0
net.ipv4.conf.eth2.rp_filter = 1
net.ipv4.conf.eth2.secure_redirects = 0
net.ipv4.conf.eth2.send_redirects = 0
net.ipv4.conf.eth2.shared_media = 1
net.ipv4.conf.eth2.src_valid_mark = 0
net.ipv4.conf.eth2.tag = 0
net.ipv4.conf.lo.accept_local = 0
net.ipv4.conf.lo.accept_redirects = 1
net.ipv4.conf.lo.accept_source_route = 1
net.ipv4.conf.lo.arp_accept = 0
net.ipv4.conf.lo.arp_announce = 0
net.ipv4.conf.lo.arp_filter = 0
net.ipv4.conf.lo.arp_ignore = 0
net.ipv4.conf.lo.arp_notify = 0
net.ipv4.conf.lo.bootp_relay = 0
net.ipv4.conf.lo.disable_policy = 1
net.ipv4.conf.lo.disable_xfrm = 1
net.ipv4.conf.lo.force_igmp_version = 0
net.ipv4.conf.lo.forwarding = 1
net.ipv4.conf.lo.log_martians = 0
net.ipv4.conf.lo.mc_forwarding = 0
net.ipv4.conf.lo.medium_id = 0
net.ipv4.conf.lo.promote_secondaries = 0
net.ipv4.conf.lo.proxy_arp = 0
net.ipv4.conf.lo.proxy_arp_pvlan = 0
net.ipv4.conf.lo.route_localnet = 0
net.ipv4.conf.lo.rp_filter = 0
net.ipv4.conf.lo.secure_redirects = 1
net.ipv4.conf.lo.send_redirects = 1
net.ipv4.conf.lo.shared_media = 1
net.ipv4.conf.lo.src_valid_mark = 0
net.ipv4.conf.lo.tag = 0
net.ipv4.icmp_echo_ignore_all = 0
net.ipv4.icmp_echo_ignore_broadcasts = 1
net.ipv4.icmp_errors_use_inbound_ifaddr = 0
net.ipv4.icmp_ignore_bogus_error_responses = 1
net.ipv4.icmp_msgs_burst = 50
net.ipv4.icmp_msgs_per_sec = 1000
net.ipv4.icmp_ratelimit = 1000
net.ipv4.icmp_ratemask = 6168
net.ipv4.igmp_max_memberships = 20
net.ipv4.igmp_max_msf = 10
net.ipv4.igmp_qrv = 2
net.ipv4.inet_peer_maxttl = 600
net.ipv4.inet_peer_minttl = 120
net.ipv4.inet_peer_threshold = 65664
net.ipv4.ip_default_ttl = 64
net.ipv4.ip_dynaddr = 0
net.ipv4.ip_early_demux = 1
net.ipv4.ip_forward = 1
net.ipv4.ip_forward_use_pmtu = 0
net.ipv4.ip_local_port_range = 1024 65535
net.ipv4.ip_local_reserved_ports =
net.ipv4.ip_no_pmtu_disc = 0
net.ipv4.ip_nonlocal_bind = 1
net.ipv4.ipfrag_high_thresh = 4194304
net.ipv4.ipfrag_low_thresh = 3145728
net.ipv4.ipfrag_max_dist = 64
net.ipv4.ipfrag_secret_interval = 600
net.ipv4.ipfrag_time = 30
net.ipv4.neigh.default.anycast_delay = 100
net.ipv4.neigh.default.app_solicit = 0
net.ipv4.neigh.default.base_reachable_time_ms = 30000
net.ipv4.neigh.default.delay_first_probe_time = 5
net.ipv4.neigh.default.gc_interval = 30
net.ipv4.neigh.default.gc_stale_time = 60
net.ipv4.neigh.default.gc_thresh1 = 128
net.ipv4.neigh.default.gc_thresh2 = 512
net.ipv4.neigh.default.gc_thresh3 = 1024
net.ipv4.neigh.default.locktime = 100
net.ipv4.neigh.default.mcast_solicit = 3
net.ipv4.neigh.default.proxy_delay = 80
net.ipv4.neigh.default.proxy_qlen = 64
net.ipv4.neigh.default.retrans_time_ms = 1000
net.ipv4.neigh.default.ucast_solicit = 3
net.ipv4.neigh.default.unres_qlen = 31
net.ipv4.neigh.default.unres_qlen_bytes = 65536
net.ipv4.neigh.eth0.anycast_delay = 100
net.ipv4.neigh.eth0.app_solicit = 0
net.ipv4.neigh.eth0.base_reachable_time_ms = 30000
net.ipv4.neigh.eth0.delay_first_probe_time = 5
net.ipv4.neigh.eth0.gc_stale_time = 60
net.ipv4.neigh.eth0.locktime = 100
net.ipv4.neigh.eth0.mcast_solicit = 3
net.ipv4.neigh.eth0.proxy_delay = 80
net.ipv4.neigh.eth0.proxy_qlen = 64
net.ipv4.neigh.eth0.retrans_time_ms = 1000
net.ipv4.neigh.eth0.ucast_solicit = 3
net.ipv4.neigh.eth0.unres_qlen = 31
net.ipv4.neigh.eth0.unres_qlen_bytes = 65536
net.ipv4.neigh.eth1.anycast_delay = 100
net.ipv4.neigh.eth1.app_solicit = 0
net.ipv4.neigh.eth1.base_reachable_time_ms = 30000
net.ipv4.neigh.eth1.delay_first_probe_time = 5
net.ipv4.neigh.eth1.gc_stale_time = 60
net.ipv4.neigh.eth1.locktime = 100
net.ipv4.neigh.eth1.mcast_solicit = 3
net.ipv4.neigh.eth1.proxy_delay = 80
net.ipv4.neigh.eth1.proxy_qlen = 64
net.ipv4.neigh.eth1.retrans_time_ms = 1000
net.ipv4.neigh.eth1.ucast_solicit = 3
net.ipv4.neigh.eth1.unres_qlen = 31
net.ipv4.neigh.eth1.unres_qlen_bytes = 65536
net.ipv4.neigh.eth2.anycast_delay = 100
net.ipv4.neigh.eth2.app_solicit = 0
net.ipv4.neigh.eth2.base_reachable_time_ms = 30000
net.ipv4.neigh.eth2.delay_first_probe_time = 5
net.ipv4.neigh.eth2.gc_stale_time = 60
net.ipv4.neigh.eth2.locktime = 100
net.ipv4.neigh.eth2.mcast_solicit = 3
net.ipv4.neigh.eth2.proxy_delay = 80
net.ipv4.neigh.eth2.proxy_qlen = 64
net.ipv4.neigh.eth2.retrans_time_ms = 1000
net.ipv4.neigh.eth2.ucast_solicit = 3
net.ipv4.neigh.eth2.unres_qlen = 31
net.ipv4.neigh.eth2.unres_qlen_bytes = 65536
net.ipv4.neigh.lo.anycast_delay = 100
net.ipv4.neigh.lo.app_solicit = 0
net.ipv4.neigh.lo.base_reachable_time_ms = 30000
net.ipv4.neigh.lo.delay_first_probe_time = 5
net.ipv4.neigh.lo.gc_stale_time = 60
net.ipv4.neigh.lo.locktime = 100
net.ipv4.neigh.lo.mcast_solicit = 3
net.ipv4.neigh.lo.proxy_delay = 80
net.ipv4.neigh.lo.proxy_qlen = 64
net.ipv4.neigh.lo.retrans_time_ms = 1000
net.ipv4.neigh.lo.ucast_solicit = 3
net.ipv4.neigh.lo.unres_qlen = 31
net.ipv4.neigh.lo.unres_qlen_bytes = 65536
net.ipv4.ping_group_range = 1 0
net.ipv4.route.error_burst = 5000
net.ipv4.route.error_cost = 1000
net.ipv4.route.gc_elasticity = 8
net.ipv4.route.gc_interval = 60
net.ipv4.route.gc_min_interval = 0
net.ipv4.route.gc_min_interval_ms = 500
net.ipv4.route.gc_thresh = -1
net.ipv4.route.gc_timeout = 300
net.ipv4.route.max_size = 2147483647
net.ipv4.route.min_adv_mss = 256
net.ipv4.route.min_pmtu = 552
net.ipv4.route.mtu_expires = 600
net.ipv4.route.redirect_load = 20
net.ipv4.route.redirect_number = 9
net.ipv4.route.redirect_silence = 20480
net.ipv4.tcp_abort_on_overflow = 0
net.ipv4.tcp_adv_win_scale = 1
net.ipv4.tcp_allowed_congestion_control = cubic reno
net.ipv4.tcp_app_win = 31
net.ipv4.tcp_autocorking = 1
net.ipv4.tcp_available_congestion_control = cubic reno
net.ipv4.tcp_base_mss = 1024
net.ipv4.tcp_challenge_ack_limit = 2147483647
net.ipv4.tcp_congestion_control = cubic
net.ipv4.tcp_dsack = 1
net.ipv4.tcp_early_retrans = 3
net.ipv4.tcp_ecn = 2
net.ipv4.tcp_fack = 1
net.ipv4.tcp_fastopen = 0
net.ipv4.tcp_fastopen_key = 00000000-00000000-00000000-00000000
net.ipv4.tcp_fin_timeout = 60
net.ipv4.tcp_frto = 2
net.ipv4.tcp_invalid_ratelimit = 500
net.ipv4.tcp_keepalive_intvl = 75
net.ipv4.tcp_keepalive_probes = 9
net.ipv4.tcp_keepalive_time = 7200
net.ipv4.tcp_limit_output_bytes = 262144
net.ipv4.tcp_low_latency = 0
net.ipv4.tcp_max_orphans = 32768
net.ipv4.tcp_max_ssthresh = 0
net.ipv4.tcp_max_syn_backlog = 1280
net.ipv4.tcp_max_tw_buckets = 32768
net.ipv4.tcp_mem = 185361 247148 370722
net.ipv4.tcp_min_tso_segs = 2
net.ipv4.tcp_moderate_rcvbuf = 1
net.ipv4.tcp_mtu_probing = 2
net.ipv4.tcp_no_metrics_save = 0
net.ipv4.tcp_notsent_lowat = -1
net.ipv4.tcp_orphan_retries = 0
net.ipv4.tcp_reordering = 3
net.ipv4.tcp_retrans_collapse = 1
net.ipv4.tcp_retries1 = 3
net.ipv4.tcp_retries2 = 15
net.ipv4.tcp_rfc1337 = 0
net.ipv4.tcp_rmem = 4096 87380 6291456
net.ipv4.tcp_sack = 1
net.ipv4.tcp_slow_start_after_idle = 0
net.ipv4.tcp_stdurg = 0
net.ipv4.tcp_syn_retries = 6
net.ipv4.tcp_synack_retries = 5
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_thin_dupack = 0
net.ipv4.tcp_thin_linear_timeouts = 0
net.ipv4.tcp_timestamps = 1
net.ipv4.tcp_tso_win_divisor = 3
net.ipv4.tcp_tw_recycle = 0
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_window_scaling = 1
net.ipv4.tcp_wmem = 4096 16384 4194304
net.ipv4.tcp_workaround_signed_windows = 0
net.ipv4.udp_mem = 187218 249624 374436
net.ipv4.udp_rmem_min = 4096
net.ipv4.udp_wmem_min = 4096
net.ipv4.vs.am_droprate = 10
net.ipv4.vs.amemthresh = 1024
net.ipv4.vs.backup_only = 0
net.ipv4.vs.cache_bypass = 0
net.ipv4.vs.conn_reuse_mode = 1
net.ipv4.vs.conntrack = 0
net.ipv4.vs.drop_entry = 0
net.ipv4.vs.drop_packet = 0
net.ipv4.vs.expire_nodest_conn = 0
net.ipv4.vs.expire_quiescent_template = 0
net.ipv4.vs.nat_icmp_send = 0
net.ipv4.vs.pmtu_disc = 1
net.ipv4.vs.secure_tcp = 0
net.ipv4.vs.snat_reroute = 1
net.ipv4.vs.sync_ports = 1
net.ipv4.vs.sync_qlen_max = 61800
net.ipv4.vs.sync_refresh_period = 0
net.ipv4.vs.sync_retries = 0
net.ipv4.vs.sync_sock_size = 0
net.ipv4.vs.sync_threshold = 3 50
net.ipv4.vs.sync_version = 1
net.ipv4.xfrm4_gc_thresh = 32768
net.netfilter.nf_conntrack_acct = 0
net.netfilter.nf_conntrack_buckets = 65536
net.netfilter.nf_conntrack_checksum = 1
net.netfilter.nf_conntrack_count = 7575
net.netfilter.nf_conntrack_dccp_loose = 1
net.netfilter.nf_conntrack_dccp_timeout_closereq = 64
net.netfilter.nf_conntrack_dccp_timeout_closing = 64
net.netfilter.nf_conntrack_dccp_timeout_open = 43200
net.netfilter.nf_conntrack_dccp_timeout_partopen = 480
net.netfilter.nf_conntrack_dccp_timeout_request = 240
net.netfilter.nf_conntrack_dccp_timeout_respond = 480
net.netfilter.nf_conntrack_dccp_timeout_timewait = 240
net.netfilter.nf_conntrack_events = 1
net.netfilter.nf_conntrack_events_retry_timeout = 15
net.netfilter.nf_conntrack_expect_max = 1024
net.netfilter.nf_conntrack_generic_timeout = 600
net.netfilter.nf_conntrack_helper = 1
net.netfilter.nf_conntrack_icmp_timeout = 30
net.netfilter.nf_conntrack_log_invalid = 0
net.netfilter.nf_conntrack_max = 1048576
net.netfilter.nf_conntrack_sctp_timeout_closed = 10
net.netfilter.nf_conntrack_sctp_timeout_cookie_echoed = 3
net.netfilter.nf_conntrack_sctp_timeout_cookie_wait = 3
net.netfilter.nf_conntrack_sctp_timeout_established = 432000
net.netfilter.nf_conntrack_sctp_timeout_heartbeat_acked = 210
net.netfilter.nf_conntrack_sctp_timeout_heartbeat_sent = 30
net.netfilter.nf_conntrack_sctp_timeout_shutdown_ack_sent = 3
net.netfilter.nf_conntrack_sctp_timeout_shutdown_recd = 0
net.netfilter.nf_conntrack_sctp_timeout_shutdown_sent = 0
net.netfilter.nf_conntrack_tcp_be_liberal = 0
net.netfilter.nf_conntrack_tcp_loose = 0
net.netfilter.nf_conntrack_tcp_max_retrans = 3
net.netfilter.nf_conntrack_tcp_timeout_close = 10
net.netfilter.nf_conntrack_tcp_timeout_close_wait = 60
net.netfilter.nf_conntrack_tcp_timeout_established = 3600
net.netfilter.nf_conntrack_tcp_timeout_fin_wait = 120
net.netfilter.nf_conntrack_tcp_timeout_last_ack = 30
net.netfilter.nf_conntrack_tcp_timeout_max_retrans = 300
net.netfilter.nf_conntrack_tcp_timeout_syn_recv = 60
net.netfilter.nf_conntrack_tcp_timeout_syn_sent = 120
net.netfilter.nf_conntrack_tcp_timeout_time_wait = 120
net.netfilter.nf_conntrack_tcp_timeout_unacknowledged = 300
net.netfilter.nf_conntrack_timestamp = 0
net.netfilter.nf_conntrack_udp_timeout = 30
net.netfilter.nf_conntrack_udp_timeout_stream = 180
net.netfilter.nf_log.0 = NONE
net.netfilter.nf_log.1 = NONE
net.netfilter.nf_log.2 = nfnetlink_log
net.netfilter.nf_log.3 = NONE
net.netfilter.nf_log.4 = NONE
net.netfilter.nf_log.5 = NONE
net.netfilter.nf_log.6 = NONE
net.netfilter.nf_log.7 = NONE
net.netfilter.nf_log.8 = NONE
net.netfilter.nf_log.9 = NONE
net.nf_conntrack_max = 1048576
net.unix.max_dgram_qlen = 512
sunrpc.max_resvport = 1023
sunrpc.min_resvport = 665
sunrpc.nfs_debug = 0x0000
sunrpc.nfsd_debug = 0x0000
sunrpc.nlm_debug = 0x0000
sunrpc.rpc_debug = 0x0000
sunrpc.tcp_fin_timeout = 15
sunrpc.tcp_max_slot_table_entries = 65536
sunrpc.tcp_slot_table_entries = 2
sunrpc.transports = tcp 1048576
sunrpc.transports = udp 32768
sunrpc.transports = tcp-bc 1048576
sunrpc.udp_slot_table_entries = 16
user.max_ipc_namespaces = 31202
user.max_mnt_namespaces = 31202
user.max_net_namespaces = 31202
user.max_pid_namespaces = 31202
user.max_user_namespaces = 0
user.max_uts_namespaces = 31202
vm.admin_reserve_kbytes = 8192
vm.block_dump = 0
vm.dirty_background_bytes = 0
vm.dirty_background_ratio = 10
vm.dirty_bytes = 0
vm.dirty_expire_centisecs = 3000
vm.dirty_ratio = 20
vm.dirty_writeback_centisecs = 500
vm.drop_caches = 0
vm.extfrag_threshold = 500
vm.hugepages_treat_as_movable = 0
vm.hugetlb_shm_group = 0
vm.laptop_mode = 0
vm.legacy_va_layout = 0
vm.lowmem_reserve_ratio = 256 256 32
vm.max_map_count = 65530
vm.memory_failure_early_kill = 0
vm.memory_failure_recovery = 1
vm.min_free_kbytes = 67584
vm.min_slab_ratio = 5
vm.min_unmapped_ratio = 1
vm.mmap_min_addr = 4096
vm.mmap_rnd_bits = 28
vm.mmap_rnd_compat_bits = 8
vm.nr_hugepages = 0
vm.nr_hugepages_mempolicy = 0
vm.nr_overcommit_hugepages = 0
vm.nr_pdflush_threads = 0
vm.numa_zonelist_order = default
vm.oom_dump_tasks = 1
vm.oom_kill_allocating_task = 0
vm.overcommit_kbytes = 0
vm.overcommit_memory = 0
vm.overcommit_ratio = 50
vm.page-cluster = 3
vm.panic_on_oom = 0
vm.percpu_pagelist_fraction = 0
vm.stat_interval = 1
vm.swappiness = 60
vm.user_reserve_kbytes = 131072
vm.vfs_cache_pressure = 100
vm.zone_reclaim_mode = 0

hostnamectl

Icon name: computer-vm
Chassis: vm
Virtualization: vmware
Operating System: Red Hat Enterprise Linux Server 7.5 (Maipo)
CPE OS Name: cpe:/o:redhat:enterprise_linux:7.5:GA:server
Kernel: Linux 3.10.0-862.11.6.el7.x86_64
Architecture: x86-64

Is the CPU time spent in haproxy, another application, or the kernel? Why is it swapping? How much RAM do you have, and how much of it is used by haproxy?

Please provide the output of haproxy -vv.

I’m not sure why you’d configure maxconnrate and maxsessrate at all. They limit the rate at which new connections can be established, and that appears to be exactly one of the problems you are facing.

Also, you have huge timeouts (24h). That is a very bad idea, because idle connections will take that long to time out.

The VM has 8 GB of RAM in total.

I am thinking of the following (a rough sketch of the corresponding commands is below the list):

  1. disallow the use of swap.
  2. reduce the 24h client timeout.
  3. reduce net.ipv4.tcp_keepalive_time.
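
A minimal sketch, assuming the aim is just to stop swapping and shorten keepalives (the 600 s value is an illustration; the client timeout itself is a haproxy.cfg change, not a sysctl):

swapoff -a                                  # disable swap immediately
sysctl -w vm.swappiness=0                   # discourage swapping if swap is re-enabled
sysctl -w net.ipv4.tcp_keepalive_time=600   # first keepalive probe after 10 min instead of 2 h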

What I have found is that FIN_WAIT2 connections dominate, and the newest file in /proc/<pid>/fd is roughly 2h old.

maxconnrate and maxsessrate were configured intentionally, to protect the servers. It seems like new connections are being added to the backlog… The rate limit actually never gets hit.
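
For reference, this can be verified on the admin socket; the current rates and their limits show up in show info:

echo "show info" | socat unix-connect:/var/run/haproxy.sock stdio | grep Rate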


haproxy -vv

HA-Proxy version 1.5.18 2016/05/10
Copyright 2000-2016 Willy Tarreau willy@haproxy.org

Build options :
TARGET = linux2628
CPU = generic
CC = gcc
CFLAGS = -O2 -g -fno-strict-aliasing -DTCP_USER_TIMEOUT=18
OPTIONS = USE_LINUX_TPROXY=1 USE_ZLIB=1 USE_REGPARM=1 USE_OPENSSL=1 USE_PCRE=1

Default settings :
maxconn = 2000, bufsize = 16384, maxrewrite = 8192, maxpollevents = 200

Encrypted password support via crypt(3): yes
Built with zlib version : 1.2.7
Compression algorithms supported : identity, deflate, gzip
Built with OpenSSL version : OpenSSL 1.0.2k-fips 26 Jan 2017
Running on OpenSSL version : OpenSSL 1.0.2k-fips 26 Jan 2017
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports prefer-server-ciphers : yes
Built with PCRE version : 8.32 2012-11-30
PCRE library supports JIT : no (USE_PCRE_JIT not set)
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND

Available polling systems :
epoll : pref=300, test result OK
poll : pref=200, test result OK
select : pref=150, test result OK
Total: 3 (3 usable), will use epoll.

I suggest lowering both timeout client and timeout server. What protocol are you forwarding? I assume it’s not HTTP?

From the admin socket, can you provide the output of:

  • show info
  • show pools
  • show stat

There is a mix of protocols; everything runs in TCP mode, and some of them are WebSockets.

show info

echo "show info" | socat unix-connect:/var/run/haproxy.sock stdio

Name: HAProxy
Version: 1.5.18
Release_date: 2016/05/10
Nbproc: 1
Process_num: 1
Pid: 26690
Uptime: 2d 0h29m13s
Uptime_sec: 174553
Memmax_MB: 0
Ulimit-n: 520810
Maxsock: 520810
Maxconn: 260000
Hard_maxconn: 260000
CurrConns: 936
CumConns: 415383
CumReq: 355230
MaxSslConns: 0
CurrSslConns: 0
CumSslConns: 139553
Maxpipes: 0
PipesUsed: 0
PipesFree: 0
ConnRate: 0
ConnRateLimit: 280
MaxConnRate: 280
SessRate: 0
SessRateLimit: 280
MaxSessRate: 280
SslRate: 0
SslRateLimit: 0
MaxSslRate: 8
SslFrontendKeyRate: 0
SslFrontendMaxKeyRate: 4
SslFrontendSessionReuse_pct: 0
SslBackendKeyRate: 0
SslBackendMaxKeyRate: 0
SslCacheLookups: 139553
SslCacheMisses: 9
CompressBpsIn: 0
CompressBpsOut: 0
CompressBpsRateLim: 0
ZlibMemUsage: 0
MaxZlibMemUsage: 0
Tasks: 2462
Run_queue: 1
Idle_pct: 100
node: lb1
description:

show pools

echo "show pools" | socat unix-connect:/var/run/haproxy.sock stdio

Dumping pools usage. Use SIGQUIT to flush them.

  • Pool pipe (32 bytes) : 8 allocated (256 bytes), 8 used, 3 users [SHARED]
  • Pool capture (64 bytes) : 0 allocated (0 bytes), 0 used, 1 users [SHARED]
  • Pool channel (80 bytes) : 37302 allocated (2984160 bytes), 1878 used, 1 users [SHARED]
  • Pool task (112 bytes) : 20174 allocated (2259488 bytes), 2462 used, 1 users [SHARED]
  • Pool uniqueid (128 bytes) : 0 allocated (0 bytes), 0 used, 1 users [SHARED]
  • Pool sticktables (144 bytes) : 2 allocated (288 bytes), 0 used, 3 users [SHARED]
  • Pool connection (320 bytes) : 37302 allocated (11936640 bytes), 1878 used, 1 users [SHARED]
  • Pool hdr_idx (416 bytes) : 2 allocated (832 bytes), 0 used, 1 users [SHARED]
  • Pool session (864 bytes) : 18651 allocated (16114464 bytes), 939 used, 1 users [SHARED]
  • Pool requri (1024 bytes) : 0 allocated (0 bytes), 0 used, 1 users [SHARED]
  • Pool buffer (16416 bytes) : 37302 allocated (612349632 bytes), 1878 used, 1 users [SHARED]
    Total: 11 pools, 645645760 bytes allocated, 32667744 used.
show stat

The configuration you posted does not match what you are currently running. Please provide the actual running configuration.

First of all, you only have 8 GB of RAM, so you must not raise your global maxconn much above 150000, otherwise you will run out of RAM, which seems to be happening in your setup (given that you really use maxconn 260000, as opposed to the 100000 that your configuration mentions).

I suggest:

  • provide the actual configuration you run with
  • either limit global maxconn to 150000 or increase your RAM - as a rule of thumb, it's about 20000 connections per GB of RAM
  • remove maxconnrate and maxsessrate, and let haproxy and the kernel handle this via the maxconn configuration
  • use timeouts that actually reflect your needs, and also specify much smaller timeouts for timeout client-fin and timeout server-fin (a sketch combining these suggestions follows the list)
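
A minimal sketch combining these suggestions; the timeout values here are assumptions and need to be adapted to your protocols:

global
maxconn 150000
# maxconnrate / maxsessrate removed on purpose

defaults
mode tcp
timeout connect 5s
timeout client 5m
timeout server 5m
timeout client-fin 30s
timeout server-fin 30s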

Thanks, I am going to reduce timeout client (24h) and timeout server (60m) to 5m each (given that long-lived connections exchange ping-pong messages, reducing them to 5m should not be a problem).
If I don’t specify timeout client-fin and timeout server-fin, will the kernel setting (nf_conntrack_tcp_timeout_fin_wait) be used instead? Do you recommend specifying the client-fin timeout explicitly, to say 30s?
The configuration is the actual one; the only difference is that maxconn is 260000 instead of 100000.
Why are you suggesting removing maxconnrate? It was introduced to protect the services.

No, that’s not about FIN_WAIT. It’s the timeout for half-closed connections. client-fin will default to timeout client, and server-fin will default to timeout tunnel, if not configured.

Yes, that’s what I’m recommending.

First of all, you have to make sure your maxconn configuration matches the actually available RAM; this will avoid swapping and probably reduce CPU usage. Second, fixing the timeouts will make sure you don’t have tens of thousands of idle connections lying around, occupying lots of RAM.

maxconnrate and maxsessrate only slow down the acceptance of new connections, which is the wrong thing to do, because a) you want to fix your 50-second delay problem, right? and b) when you don’t accept connections, they just pile up in the queue and either get dropped or delayed.

Also, I still don’t know how maxconn is actually configured across the frontends and backends. Please provide the actual configuration, as I cannot keep guessing at what you configured. For example, please share the actual configuration of your busiest frontend and backend.

Adding haproxy.cfg. I selected only one frontend and one backend; the rest are identical, and almost all of them map 1:1 - one frontend to one backend.

haproxy.cfg

global
log /dev/log local0
log /dev/log local1 notice

daemon
user haproxy
group haproxy
maxconnrate 280
maxsessrate 280
maxconn 260000

stats socket /var/run/haproxy.sock level admin

defaults
mode tcp
log global
option dontlognull
timeout connect 5s
timeout client 24h
timeout server 60m
maxconn 260000

peers lbs
peer lb1a 1.9.100.1:5555
peer lb2a 1.9.100.2:5555
peer lb3a 1.9.100.3:5555
peer lb1b 1.8.100.1:5555
peer lb2b 1.8.100.2:5555
peer lb3b 1.8.100.3:5555

frontend servers_1234
bind 100.100.100.1:1234
acl p1234 dst_port 1234
use_backend servers_1234_1234 if p1234

backend servers_1234_1234
balance leastconn
option independant-streams
server serverB_vir1 server-b:1234 maxconn 100000 on-marked-down shutdown-sessions check fall 3 rise 2 inter 10s slowstart 200s source 1.1.1.67
server serverB_vir2 server-b:1234 maxconn 100000 on-marked-down shutdown-sessions check fall 3 rise 2 inter 10s slowstart 200s source 1.1.1.90
server serverB_vir3 server-b:1234 maxconn 100000 on-marked-down shutdown-sessions check fall 3 rise 2 inter 10s slowstart 200s source 1.1.1.99
server serverB_vir4 server-b:1234 maxconn 100000 on-marked-down shutdown-sessions check fall 3 rise 2 inter 10s slowstart 200s source 1.1.1.100
server serverB_vir5 server-b:1234 maxconn 100000 on-marked-down shutdown-sessions check fall 3 rise 2 inter 10s slowstart 200s source 1.1.1.101
server serverB_vir6 server-b:1234 maxconn 100000 on-marked-down shutdown-sessions check fall 3 rise 2 inter 10s slowstart 200s source 1.1.1.102
server serverB_vir7 server-b:1234 maxconn 100000 on-marked-down shutdown-sessions check fall 3 rise 2 inter 10s slowstart 200s source 1.1.1.103
server serverB_vir8 server-b:1234 maxconn 100000 on-marked-down shutdown-sessions check fall 3 rise 2 inter 10s slowstart 200s source 1.1.1.104
server serverB_vir9 server-b:1234 maxconn 100000 on-marked-down shutdown-sessions check fall 3 rise 2 inter 10s slowstart 200s source 1.1.1.91
server serverB_vir10 server-b:1234 maxconn 100000 on-marked-down shutdown-sessions check fall 3 rise 2 inter 10s slowstart 200s source 1.1.1.92
server serverB_vir11 server-b:1234 maxconn 100000 on-marked-down shutdown-sessions check fall 3 rise 2 inter 10s slowstart 200s source 1.1.1.93
server serverB_vir12 server-b:1234 maxconn 100000 on-marked-down shutdown-sessions check fall 3 rise 2 inter 10s slowstart 200s source 1.1.1.94
server serverB_vir13 server-b:1234 maxconn 100000 on-marked-down shutdown-sessions check fall 3 rise 2 inter 10s slowstart 200s source 1.1.1.95
server serverB_vir14 server-b:1234 maxconn 100000 on-marked-down shutdown-sessions check fall 3 rise 2 inter 10s slowstart 200s source 1.1.1.96
server serverB_vir15 server-b:1234 maxconn 100000 on-marked-down shutdown-sessions check fall 3 rise 2 inter 10s slowstart 200s source 1.1.1.97
server serverB_vir16 server-b:1234 maxconn 100000 on-marked-down shutdown-sessions check fall 3 rise 2 inter 10s slowstart 200s source 1.1.1.98
server serverC_vir1 server-c:1234 maxconn 100000 on-marked-down shutdown-sessions check fall 3 rise 2 inter 10s slowstart 200s source 1.1.1.67
server serverC_vir2 server-c:1234 maxconn 100000 on-marked-down shutdown-sessions check fall 3 rise 2 inter 10s slowstart 200s source 1.1.1.90
server serverC_vir3 server-c:1234 maxconn 100000 on-marked-down shutdown-sessions check fall 3 rise 2 inter 10s slowstart 200s source 1.1.1.99
server serverC_vir4 server-c:1234 maxconn 100000 on-marked-down shutdown-sessions check fall 3 rise 2 inter 10s slowstart 200s source 1.1.1.100
server serverC_vir5 server-c:1234 maxconn 100000 on-marked-down shutdown-sessions check fall 3 rise 2 inter 10s slowstart 200s source 1.1.1.101
server serverC_vir6 server-c:1234 maxconn 100000 on-marked-down shutdown-sessions check fall 3 rise 2 inter 10s slowstart 200s source 1.1.1.102
server serverC_vir7 server-c:1234 maxconn 100000 on-marked-down shutdown-sessions check fall 3 rise 2 inter 10s slowstart 200s source 1.1.1.103
server serverC_vir8 server-c:1234 maxconn 100000 on-marked-down shutdown-sessions check fall 3 rise 2 inter 10s slowstart 200s source 1.1.1.104
server serverC_vir9 server-c:1234 maxconn 100000 on-marked-down shutdown-sessions check fall 3 rise 2 inter 10s slowstart 200s source 1.1.1.91
server serverC_vir10 server-c:1234 maxconn 100000 on-marked-down shutdown-sessions check fall 3 rise 2 inter 10s slowstart 200s source 1.1.1.92
server serverC_vir11 server-c:1234 maxconn 100000 on-marked-down shutdown-sessions check fall 3 rise 2 inter 10s slowstart 200s source 1.1.1.93
server serverC_vir12 server-c:1234 maxconn 100000 on-marked-down shutdown-sessions check fall 3 rise 2 inter 10s slowstart 200s source 1.1.1.94
server serverC_vir13 server-c:1234 maxconn 100000 on-marked-down shutdown-sessions check fall 3 rise 2 inter 10s slowstart 200s source 1.1.1.95
server serverC_vir14 server-c:1234 maxconn 100000 on-marked-down shutdown-sessions check fall 3 rise 2 inter 10s slowstart 200s source 1.1.1.96
server serverC_vir15 server-c:1234 maxconn 100000 on-marked-down shutdown-sessions check fall 3 rise 2 inter 10s slowstart 200s source 1.1.1.97
server serverC_vir16 server-c:1234 maxconn 100000 on-marked-down shutdown-sessions check fall 3 rise 2 inter 10s slowstart 200s source 1.1.1.98

I am looking for a way to protect my proxy from being overloaded by configuring Linux to drop connections once a limit is reached, so the client fails fast and retries on another load balancer (we have 3 in a region). For this I am thinking of reducing the backlog.
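
The knobs I am looking at for this are the per-frontend backlog in haproxy plus tcp_abort_on_overflow in the kernel; the values below are only illustrations. With a small accept queue and tcp_abort_on_overflow=1 the kernel resets excess connections instead of silently dropping the SYN, so the client fails fast and can retry elsewhere:

frontend servers_1234
backlog 1024
bind 100.100.100.1:1234

sysctl -w net.ipv4.tcp_abort_on_overflow=1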

Just configure maxconn correctly as per your RAM availability and then configure timeout queue to the value that you want.
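
For example, something along these lines; the 5 s queue timeout is only an illustration:

global
maxconn 150000

defaults
timeout queue 5s

With timeout queue, a connection that waits too long for a server slot is dropped quickly instead of hanging for tens of seconds.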

I ended up with the following settings, just to better understand the TCP stack:
timeout client 24h
timeout server 60m
timeout client-fin 2m
timeout server-fin 2m

The problem is this: after the server closes the connection, the client responds only after about 100 seconds with its FIN and RST. That part was solved by increasing the netfilter close_wait timer to 120 sec so Linux doesn’t filter the connection out. But I still have a problem with this connection: it stays in memory even after the server has closed it and it has timed out on the server’s side (so the server no longer has a FIN_WAIT2 entry), and even though the client sent its RST after 100 seconds, the connection towards the client is still in FIN_WAIT2. So I end up with 2 connections: one in FIN_WAIT2 towards the client and one in CLOSE_WAIT towards the server (even though the server no longer has this connection). These two connections will be cleaned up after 24h, as set by timeout client, but I don’t understand why it works like that: why is the connection not closed after the client sends RST (even if it comes 100 sec after the LB’s FIN)? Can the RST be filtered out by the kernel? And why didn’t client-fin kick in?

The simple fix for me would be to just set timeout client to 5m. But I would like to understand why a connection that appears to be half-closed towards the client is only removed when timeout client expires.
So: why is the session not closed after the RST sent by the client, and why is it not closed when client-fin (2m) expires?

LB = load balancer
1.
Client (ESTABLISHED) - LB (ESTABLISHED)
LB (ESTABLISHED) - server (ESTABLISHED)
2. Exchanging data.
3. 200 sec idle (client doesn't respond).
4. Server sends FIN:
server (FIN_WAIT2) - LB (CLOSE_WAIT)
LB sends FIN to the client:
LB (FIN_WAIT2) - client (unknown)
5. 100 sec of waiting for the client's response to the FIN, then the client sends FIN followed by RST:
client (unknown) - LB (FIN_WAIT2)
LB sends FIN to the server:
server (closed) - LB (CLOSE_WAIT)
6.
From that moment on I see a session in "show sess" that will expire in 24h,
and in netstat two connections:

  1. LB - client: FIN_WAIT2
  2. LB - server: CLOSE_WAIT
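
For reference, the session list in point 6 comes from the admin socket, using the same pattern as the earlier commands:

echo "show sess" | socat unix-connect:/var/run/haproxy.sock stdio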

I think conntrack is causing this mess, and I can’t help you troubleshoot conntrack timers. Try to confirm that this is related to conntrack by disabling it (or bypassing it for certain IPs), and if it is confirmed that conntrack is the cause, get appropriate help.

I also doubt that conntrack is tuned well enough to handle 260000 connections, so you probably want to take a look at the scaling issues there as well.
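
To test the conntrack theory, one possible sketch, assuming plain iptables and that 1234 is the only service port (adapt this to your actual ruleset), is to bypass conntrack for that traffic in the raw table:

# client -> LB and LB -> backend connections
iptables -t raw -A PREROUTING -p tcp --dport 1234 -j NOTRACK
iptables -t raw -A OUTPUT -p tcp --dport 1234 -j NOTRACK
# return traffic in both directions
iptables -t raw -A PREROUTING -p tcp --sport 1234 -j NOTRACK
iptables -t raw -A OUTPUT -p tcp --sport 1234 -j NOTRACK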

Check your multi-process configuration. For example:
global
nbproc 8
cpu-map 1 0
cpu-map 2 1
cpu-map 3 2
cpu-map 4 3
cpu-map 5 4
cpu-map 6 5
cpu-map 7 6
cpu-map 8 7
stats bind-process 8