Can HAProxy affect the internal latency of a service server?

I set up an HAProxy server in front of 10 web servers and tested its performance, because my web service is very latency-sensitive. I measured many performance-related metrics.
I compared two cases:
[1] the request is sent to the web server directly.
[2] the request is sent to the web server through the HAProxy server.

The first thing I noticed is the internal latency, i.e. the elapsed time from receiving a request to sending the response.
Compared to [1], the internal latency is smaller in [2].
This looks really strange, because the web server logic is identical in both cases.
I have been trying to debug this since last month but couldn’t find the reason.
I traced where the internal latency difference comes from, and it is the code that creates goroutines.
That code spawns 4~5 goroutines, and it executes faster in [2] than in [1].

But I really don’t know why this happens. It shouldn’t be network-related, because the measurement is internal.
My guess is that HAProxy saves resources by reusing sockets or connections, which makes the system more efficient and the internal latency smaller.
So I measured the internal latency with `no option http-server-close`, `no option http-keep-alive`, and similar settings.
But the result was the same as before: [2]'s internal latency is always smaller than [1]'s.

Could you give me some advice on why this happens?
My HAProxy version is HA-Proxy version 1.5.18 2016/05/10,
and the configuration is below.

global
    log         127.0.0.1 local2
 
    chroot      /var/lib/haproxy
    pidfile     /var/run/haproxy.pid
    maxconn     40000
    user        haproxy
    group       haproxy
    daemon
 
    # turn on stats unix socket
    stats socket /var/lib/haproxy/stats level admin
 
    nbproc 4
    cpu-map 1 0
    cpu-map 2 1
    cpu-map 3 2
    cpu-map 4 3
    stats bind-process 4


defaults
    mode                    http
    log                     global
    option                  httplog
    option                  dontlognull
    option http-server-close
#    option http-keep-alive
#    no option http-keep-alive
#    option http-no-delay
    option forwardfor       except 127.0.0.0/8
    option                  redispatch
    retries                 3
    timeout http-request    100s
    timeout queue           10m
    timeout connect         100s
    timeout client          100m
    timeout server          100m
    timeout http-keep-alive 100s
    timeout check           100s
    maxconn                 30000
 
frontend  main *:8080
    mode http
 
    bind-process 1 2 3
    default_backend             app
 
frontend stats *:9000
    stats enable
    stats uri /haproxy_stats

backend app
    mode http
    balance     roundrobin
    option httpchk GET /nginx_status
    option httplog
    server  app1 10.131.50.203:8080 check
    server  app2 10.131.50.10:8080 check
    server  app3 10.131.51.12:8080 check
    server  app4 10.131.50.14:8080 check
    server  app5 10.131.50.204:8080 check
    server  app6 10.131.50.78:8080 check
    server  app7 10.131.50.11:8080 check
    server  app8 10.131.49.78:8080 check
    server  app9 10.131.48.12:8080 check
    server  app10 10.131.48.75:8080 check

You’d have to explain at the very least:

  • what benchmark tool you are using
  • the *exact and complete* benchmarking configuration
  • the exact and complete benchmarking results

I appreciate your help! 👍 🙇

  • The benchmark tool I used is nGrinder

    • It runs stress tests from a written script.
    • I made a script that sends HTTP requests to each server at a configured TPS.
    • It reports the test results, including network latency, total errors, and so on.
  • The benchmark configuration is below

    • Experiment topology
      • 1st topology: [HWLB (L3DSR mode)] - [10 servers]
      • 2nd topology: [HWLB (L3DSR mode)] - [HAProxy server] - [10 servers]
    • In both topologies, nGrinder sends HTTP requests to the HWLB at about 1000 TPS, so each server receives about 100 TPS from the HWLB or HAProxy.
    • Each server runs nginx and our web server, which is written in Go.
    • The HAProxy server and each of the 10 servers have Xeon Silver 4210 (2.2GHz/10core)*2, 64GB RAM, and a 25G NIC.
    • Measured metrics
      • CPU and memory usage: via Linux tools such as htop
      • Internal latency: via the time package in Go, collected by Prometheus
        • start stamp: right after receiving the request, when our service logic starts
        • end stamp: right before sending the response, when our service logic ends
  • The benchmark results are below. I get these results repeatedly.

    • The request-rate graph shows each server receives 100 TPS on average.
    • The p99 latency graph at those request rates shows the HWLB-only case has slightly higher latency. Some spikes come from API delays where our service server accesses external servers.
    • The p99 latency graph below is measured at the code creating 3~4 goroutines. The HWLB-only case shows a slightly higher elapsed time in this code.
    • CPU, memory, and NIC usage are not the bottleneck; they all stay below 20%.
    • Because the internal latency difference comes from the goroutine-creating code, I checked the Go GC duration counts, GC duration, and number of goroutines in each test case. They were the same.
    • But some Go-related metrics were different, such as heap/stack memory usage, the number of file descriptors, and so on.

I find it really strange because the internal logic in the service server never changed; the only change is putting HAProxy in front. I don’t understand my test results, but I keep trying to find the answer.