The 'grace' is deprecated and will be removed in a future version

Hi,

I’ve recently upgraded to a newer version of Haproxy and was hit by this warning:

The ‘grace’ is deprecated and will be removed in a future version

While researching I stumbled upon this in the release notes:

  • The grace directive has been marked as deprecated and is scheduled tentatively for removal in 2.4 with a hard deadline of 2.5. It was meant to postpone stopping of a process during a soft-stop, but is incompatible with soft reloading. We suspect that it is not widely used, but the warning will help us to know if some specific uses remain.

I guess we’re one of those users of this specific function. We’re using Haproxy in a dockerized environment, so we do not reload Haproxy; we replace it. Because it takes some time to bring a node down for maintenance, we use grace. This is our use-case:

listen fe_ingress_stats
  bind *:8887
  mode http
  stats enable
  stats show-legends
  stats show-node
  stats uri /

  acl going_down stopping
  grace 10s
  monitor-uri /healthy
  monitor fail if going_down

It is my understanding that grace will keep the listener alive for 10 seconds after SIGUSR1 has been issued, while the monitor URI returns a failure as soon as the process is stopping.

We then have our loadbalancer listening on /healthy like this:

  # Haproxy dashboard answers on /healthy on port 8887
  health_check {
    healthy_threshold   = 2
    unhealthy_threshold = 2
    timeout             = 3
    target              = "HTTP:8887/healthy"
    interval            = 5
  }
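Given those settings, the detection window can be worked out with a bit of arithmetic. This is only a sketch; the interval and threshold values are taken from the health_check block above:

```python
interval = 5             # ELB health-check interval (seconds)
unhealthy_threshold = 2  # failed checks needed to mark the node down

# Worst case: the stop lands just after a successful check, so the first
# failing check only happens a full interval later.
worst_case = interval * unhealthy_threshold        # 10 seconds
# Best case: the stop lands just before a scheduled check.
best_case = interval * (unhealthy_threshold - 1)   # 5 seconds

print(best_case, worst_case)  # 5 10
```

So the ELB needs between 5 and 10 seconds to take the node out of rotation after it starts failing checks, which is what the 10-second grace period is sized for.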

Is there any way I can achieve the same without grace?

(Original thread: Using `stopping` variable from internal state together with `monitor fail`)

What actual problem does this configuration solve? I fail to understand it.


Sorry for not explaining properly.

Our current configuration looks like this: AWS ELB -> Haproxy -> Application servers.

The ELBs are configured to look for /healthy as a healthcheck.

On deployment of a new Haproxy loadbalancer version we first send SIGUSR1 to the container, then we wait 12 seconds, and then the container is removed.

This gives the ELBs time to redirect traffic elsewhere while current requests are allowed to finish.

It is my understanding that without the grace period configured the Haproxy process would just disappear; the grace period allows the loadbalancers to notice that the backend is going down.

The reason for this dance is that AWS ELBs use dumb, simple healthchecks: they only support a healthcheck every 5 seconds and need 2 unhealthy checks to mark a node as down. grace lets us give AWS ELB an early warning to take the node out of rotation, so there are no pending connections and no new traffic sent its way.

That’s not true, soft-stop (SIGUSR1) is what instructs haproxy to not kill existing TCP connections immediately.

This should be unnecessary. Soft-stop without grace will close the LISTENING socket immediately, so the load-balancer will understand that haproxy is going down. You don’t need a listening socket for that, and you don’t need a negative health check response (monitor fail if going_down) when a closed socket is even more telling for the LB health check.
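For example, once a listening socket is closed, a connect attempt fails immediately instead of timing out. A minimal self-contained Python sketch of that behavior (no haproxy involved; the port is chosen by the OS):

```python
import socket

# Bind a listener, then close it, simulating haproxy dropping its
# listening socket on soft-stop.
srv = socket.socket()
srv.bind(("127.0.0.1", 0))
srv.listen()
port = srv.getsockname()[1]
srv.close()

# A health-check connection now fails instantly with "connection refused"
# rather than waiting for a timeout or receiving a 5xx response.
try:
    socket.create_connection(("127.0.0.1", port), timeout=1)
    print("connected")
except ConnectionRefusedError:
    print("connection refused")
```

That immediate refusal is exactly what a TCP or HTTP health check interprets as a failed node.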

Please just remove all the monitor-* and grace configurations from your configuration and actually test it. You will see that the behavior is the same.

Hi Lukas,

Thanks for your response and time,

I will have to test this out, so bear with me. The grace option was adopted when I saw connectivity issues when only using SIGUSR1, these are the comments from the linked thread:

I’m having intermittent connection issues because of a loadbalancer that doesn’t update its healthchecks frequently enough; I’ve been using the Haproxy stats page for my healthchecks.

I thought I would be able to send SIGUSR1 to Haproxy to shut down gracefully, but it seems like it doesn’t get picked up quickly enough.

Assumptions:

  • AWS ELB sends a healthcheck every 5 seconds.
  • AWS ELB needs 2 failed healthchecks to mark a node as unhealthy.
  • AWS ELB will not retry or redispatch traffic if a connection is failing.

A timeline without the grace 10s option:

00:00: Haproxy receives SIGUSR1 and closes LISTENING socket.
00:05: AWS ELB tests the /healthy endpoint, notices it is down.
00:10: AWS ELB tests the /healthy endpoint, notices it is down.
00:10: Haproxy has failed the healthcheck twice, so the node is removed from rotation.

In this example AWS ELB will have kept sending new traffic to the Haproxy server, which means that some of the traffic will have been lost for up to 10 seconds.

This is my assumption with the grace 10s option:

00:00: Haproxy receives SIGUSR1 and keeps LISTENING socket open for 10s.
00:05: AWS ELB tests the /healthy endpoint, notices it is down.
00:10: AWS ELB tests the /healthy endpoint, notices it is down.
00:10: Haproxy has failed the healthcheck twice, so the node is removed from rotation.
00:12: Our scheduler kills the Haproxy container.

Because of grace 10s, all the traffic sent towards the node during those 10 seconds would still have been handled, right up until the node was removed from rotation.
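The two timelines can be condensed into a toy model. This is only a sketch of my assumptions: checks every 5 seconds, the node is removed from rotation at t = 10 after the second failed check, and the listener stays open for `grace` seconds after SIGUSR1 at t = 0:

```python
def traffic_outcome(t, grace):
    """What happens to a new connection arriving t seconds after SIGUSR1?

    Assumes the LB stops sending traffic once both healthchecks have
    failed (at t = 10), and the listener stays open for `grace` seconds.
    """
    lb_removed_at = 10  # second failed check at 5s intervals
    if t >= lb_removed_at:
        return "no traffic sent"
    return "accepted" if t < grace else "refused"

for t in (1, 6, 9):
    print(t, "without grace:", traffic_outcome(t, grace=0),
          "| with grace 10s:", traffic_outcome(t, grace=10))
```

Without grace every connection in that window is refused; with grace 10s they are all still served.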

What am I missing?

Ah ok, that’s what I was missing.

Well yes indeed grace makes sense here. I can think of workarounds, but that’s not the point, the point is to understand use-cases.

Could you file a feature request on github (although strictly speaking it’s not a feature request, but it’s not a bug either)? Explain your use-case like in this thread and emphasize that the upstream Amazon load-balancer does not redispatch failed connections, which is why the sockets need to remain in the listening state until the Amazon load-balancer detects the failure and stops sending traffic there.