Debug TLS session resumption & high CPU usage


Our HAProxy instance was under heavy load (32 threads, with CPU usage at 3000%+ most of the time), and we suspected that this was due to our clients not using TLS session resumption. After fixing the client side, setting the TLS session lifetime (tune.ssl.lifetime) to 1 day, and increasing the session cache size to 240 MB (20K clients * 200 bytes per entry = 4 MB << 240 MB), the CPU load dropped dramatically. However, we still see periodic CPU spikes (multiple times a day) and high “%Tq” in the logs for some of the requests. We suspect that TLS session resumption isn’t happening for those requests, which would explain both the CPU spikes and the high “%Tq”.
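To make the cache sizing estimate above easy to re-check with other client counts, here is a minimal Python sketch of the same arithmetic. The 200 bytes per entry and the 20K clients are the figures from this post; the function name is just for illustration, and "MB" is taken as decimal megabytes.

```python
# Rough session cache sizing check, using the figures quoted in the post.
CLIENTS = 20_000            # number of distinct clients (from the post)
BYTES_PER_ENTRY = 200       # approximate size of one cached session (from the post)
CACHE_BYTES = 240 * 10**6   # configured cache size, 240 MB (decimal)

def cache_need_bytes(clients, bytes_per_entry=BYTES_PER_ENTRY):
    """Memory needed to keep one cached session per client."""
    return clients * bytes_per_entry

need = cache_need_bytes(CLIENTS)
headroom = CACHE_BYTES / need

print(f"needed: {need / 10**6:.0f} MB, headroom factor: {headroom:.0f}x")
```

If the headroom factor is this large, cache eviction is unlikely to be the reason sessions fail to resume, which points back at client behaviour or session expiry.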

However, we can’t say for certain that a TLS renegotiation or a new full handshake happened for those requests, since we don’t see a config option to log that. We also don’t know why a renegotiation or new handshake would have happened, if it did, before the session lifetime (1 day) expired.

What would be the best way to debug this scenario? Any advice or help is highly appreciated.

HAProxy version: 1.8.20
OpenSSL version: 1.0.2k


For future readers, the following link was helpful to us.

We enhanced the HAProxy log format to include this information and narrowed the problem down to the clients that didn’t do TLS session resumption.
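For anyone who wants to do the same kind of narrowing down, here is a rough Python sketch that counts non-resumed handshakes per client from log lines. It assumes the log format was extended with something like HAProxy's `ssl_fc_is_resumed` sample fetch, and that each line contains `client=<ip> resumed=<0|1>`. That field layout and the sample lines are made up for illustration and are not our actual log format, so adjust the regex to whatever your log-format emits.

```python
import re
from collections import Counter

# Hypothetical log fragment: "... client=10.0.0.5 resumed=0 ..."
# Adapt this pattern to the fields your own log-format produces.
LINE_RE = re.compile(r"client=(?P<ip>\S+)\s+resumed=(?P<resumed>[01])")

def full_handshakes_by_client(lines):
    """Count log lines per client IP where the TLS session was not resumed."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.search(line)
        if m and m.group("resumed") == "0":
            counts[m.group("ip")] += 1
    return counts

# Made-up sample lines, only to show the expected input shape.
sample = [
    "client=10.0.0.5 resumed=0 Tq=812",
    "client=10.0.0.5 resumed=1 Tq=3",
    "client=10.0.0.9 resumed=0 Tq=790",
    "client=10.0.0.5 resumed=0 Tq=805",
]

result = full_handshakes_by_client(sample)
for ip, n in result.most_common():
    print(ip, n)
```

Clients at the top of this list are the ones doing the most full handshakes and are the first candidates to check for missing session resumption.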