Sticky sessions for tomcat jvmRoute?

Folks,

we just introduced haproxy to replace apache2 as the reverse proxy / load balancer in front of a couple of tomcat servers hosting the same application, and we need persistent sessions for users. So far, even though everything else works way faster in regular operation, we now see more users complaining about “lost sessions” in the application, which is caused by requests being routed to the “wrong” backend. Usually this happens as soon as we bring down and restart one of the instances. I am not really sure what is causing this trouble yet, but I noticed one weirdness in the cookies.

Currently our backend looks similar to this:

backend app
   balance leastconn
   mode http
   cookie JSESSIONID prefix indirect nocache
   server srv16060 localhost:16060 cookie srv16060 check maxconn 1000
   server srv17070 localhost:17070 cookie srv17070 check maxconn 1000

I omitted a bunch of server lines as they all basically look the same. By now I see cookies like this added to the request:

"srv16060~srv160601j3jy5h4ptzerl68anyg1rn8z.srv16060"

The session cookie already provided by the backend system itself, thanks to the tomcat / jvmRoute configuration, however, looks like this:

"srv160601j3jy5h4ptzerl68anyg1rn8z.srv16060"

The name of the instance (srv16060 in this case) is always suffixed to the cookie. I am unsure whether this is actually a problem, but: isn’t there a way to set up haproxy to evaluate these backend cookies and figure out which server a request belongs to from the information in the cookie itself, i.e. make sure all cookies with a “.srv16060” suffix go to the srv16060 server? Pardon me if this is a trivial question - looking at the “prefix” option of the cookie directive I was hoping something like this was already available out of the box, but so far I failed to set it up…
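Just to illustrate what I mean, something along these lines is roughly what I was hoping for - completely untested on my side, and the acl / use-server lines are only my own guess at how it might look, reusing the server names from above:

backend app
   balance leastconn
   mode http
   # match the jvmRoute suffix tomcat appends to JSESSIONID and pin the
   # request to the corresponding server, without rewriting the cookie
   acl is_srv16060 req.cook(JSESSIONID) -m end .srv16060
   acl is_srv17070 req.cook(JSESSIONID) -m end .srv17070
   use-server srv16060 if is_srv16060
   use-server srv17070 if is_srv17070
   server srv16060 localhost:16060 check maxconn 1000
   server srv17070 localhost:17070 check maxconn 1000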

TIA and all the best,
Kristian

I’m not sure what’s wrong.

If you are saying this only happens when you restart an instance (I assume you mean a backend server by that), that indicates that stickiness actually works fine. And when you do restart an instance, the question is whether only sessions from that particular instance are affected, and what behavior you expect from haproxy in a backend-down situation (fail hard, or redistribute to other backends, which of course means losing the session)?

How did Apache handle the sessions when a backend server went down?

To get a better picture, we would need your global and default settings, especially keep-alive modes, etc. The exact haproxy release or better yet the output of haproxy -vv is also required.

Hi there, and first off, thanks a bunch for your comments. I dug a bit deeper into what happens, and I am still not completely sure why it happens. The “problem” generally occurs whenever a backend goes down, comes back up, and then gets requests with an old session cookie routed to it. The app generally has two cookies, a global SSO cookie and a session cookie tied to the current backend instance.

Trying to reproduce this, I logged into the application (one instance), restarted the application instance and tried to reload the page.

  • Running the application standalone or with apache2 reverse proxy: Reloading will cause the application to create a new session and a new session cookie for that user (using the SSO information, so assuming the user has been authenticated), and I can mostly carry on working fine.

  • Running the application with haproxy 1.4.x in front of it: … will show exactly the same behaviour as apache2 or standalone - doing a “reload” after restarting the instance will cause the application to create a new session, a new session cookie and all is fine.

  • Running the application behind haproxy 1.6 or 1.7: … will cause the application to not create a new session cookie but instead display a “Session Expired” message which, and this is the worst part, I cannot really get rid of, not even by logging back in to the application; the only way to resolve it is to manually remove the JSESSIONID cookie.

I am not sure this makes sense, but whatever “the issue” is here, it is something that wasn’t present in haproxy 1.4 but shows up in haproxy 1.6+.

haproxy -vv:

HA-Proxy version 1.7.1-1ppa1~trusty 2016/12/15
Copyright 2000-2016 Willy Tarreau willy@haproxy.org

Build options :
TARGET = linux2628
CPU = generic
CC = gcc
CFLAGS = -g -O2 -fPIE -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2
OPTIONS = USE_ZLIB=1 USE_REGPARM=1 USE_OPENSSL=1 USE_LUA=1 USE_PCRE=1 USE_NS=1

Default settings :
maxconn = 2000, bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Encrypted password support via crypt(3): yes
Built with zlib version : 1.2.8
Running on zlib version : 1.2.8
Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with OpenSSL version : OpenSSL 1.0.1f 6 Jan 2014
Running on OpenSSL version : OpenSSL 1.0.1f 6 Jan 2014
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports prefer-server-ciphers : yes
Built with PCRE version : 8.31 2012-07-06
Running on PCRE version : 8.31 2012-07-06
PCRE library supports JIT : no (USE_PCRE_JIT not set)
Built with Lua version : Lua 5.3.1
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND
Built with network namespace support

Available polling systems :
epoll : pref=300, test result OK
poll : pref=200, test result OK
select : pref=150, test result OK
Total: 3 (3 usable), will use epoll.

Available filters :
[COMP] compression
[TRACE] trace
[SPOE] spoe

haproxy.cfg:

global
log /dev/log local0
log /dev/log local1 notice
chroot /var/lib/haproxy
user haproxy
group haproxy
daemon

defaults
log global
mode http
option httplog
option dontlognull
contimeout 5000
clitimeout 50000
srvtimeout 50000
errorfile 400 /etc/haproxy/errors/400.http
errorfile 403 /etc/haproxy/errors/403.http
errorfile 408 /etc/haproxy/errors/408.http
errorfile 500 /etc/haproxy/errors/500.http
errorfile 502 /etc/haproxy/errors/502.http
errorfile 503 /etc/haproxy/errors/503.http
errorfile 504 /etc/haproxy/errors/504.http

backend srv20
balance leastconn
mode http
cookie JSESSIONID prefix indirect nocache
server srv10080 localhost:10080 cookie srv10080 check backup
server srv10090 localhost:10090 cookie srv10090 check backup

frontend http
bind *:90
mode http

default_backend srv20

Haproxy 1.4 in this configuration defaults to tunnel mode. The equivalent configuration in 1.6/1.7 would be “option http-tunnel” [1] in the defaults section.

However, don’t do this - it would lead to more brokenness.
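For reference only, this is roughly what that would look like in your defaults section - shown just so we are talking about the same thing, not as a recommendation:

defaults
   mode http
   # emulate the haproxy 1.4 tunnel behaviour - not recommended
   option http-tunnel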

I don’t understand this. If a backend server is down, how can it receive a request with a cookie? And what do you mean by old? I don’t know anything about tomcat instances, so maybe you can elaborate on what exactly happens on your backend?

Does the backend instance “respawn” without the session information from the previous instance, but with the same name?

[1] HAProxy version 1.6.16 - Configuration Manual

Hi there, and thanks a bunch for your continuing support.

Ok, let’s see:

  • There is an SSO service the user initially logs in to, and there is an SSO cookie in the request (if the user managed to log in) for each application to work with.
  • The backend application has a session id / JSESSIONID cookie to identify user sessions. These are instance specific, there is no shared session store (not needed so far).
  • If an incoming request has an SSO cookie and a JSESSIONID known to that particular instance, it’s obviously a valid active session.
  • If an incoming request has an SSO cookie but no JSESSIONID, the application will consider this a new session and create a new JSESSIONID cookie.
  • If an incoming request has an SSO cookie and a JSESSIONID cookie which is unknown to the backend instance, the instance will consider the user session “new”, create a new session, and (effectively) overwrite the existing JSESSIONID cookie with its own “new” JSESSIONID (this is how failover worked with apache2 - one instance goes down, requests go to a different backend, and users get a new session there, mostly transparently; the application is “stateless enough” to allow for this).

The problem here, with haproxy 1.6+, is that the last of these steps doesn’t work. The SSO cookie is still there. The JSESSIONID cookie somehow remains “persistent”: all requests go to the (restarted) backend instance of the same name, and the backend instance continuously tries to create a new session / JSESSIONID for that user - but apparently, for whatever reason, this doesn’t work. There seems to be some reason why the incoming request for that user (client? …?) doesn’t “see” the “new” JSESSIONID cookie but repeatedly keeps sending the “old” one (the one that was valid on the backend instance before it was restarted).

Yes. There are four backend instances in the production environment, each with its own session store which isn’t persisted and is lost upon restart. So far this hasn’t been much of a problem; the application allows for this kind of workflow at the moment, as there’s little real user state in these sessions.

Thanks for explaining, I have a better picture of the situation now.

One thing I can think of is that your backend may be broken and only reads from the first request in a session.
Try setting “option http-server-close” in the defaults section to see if that changes anything. Remove it afterwards.
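For the test, a minimal sketch of where that would go in your configuration (diagnostic only):

defaults
   mode http
   # diagnostic: close the server-side connection after each request
   option http-server-close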

I also don’t particularly like the rewrite and prefix cookie modes; they mess with the application’s cookies, which I generally prefer to avoid.

Try a dedicated haproxy cookie by replacing the cookie configuration with this:

cookie SRV insert indirect nocache

But please be advised that all clients will lose their sessions once you switch to this mode.
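Applied to the backend from your posted config, that would look roughly like this - the cookie values on the server lines stay exactly as you already have them:

backend srv20
   balance leastconn
   mode http
   # dedicated haproxy cookie; JSESSIONID is left untouched
   cookie SRV insert indirect nocache
   server srv10080 localhost:10080 cookie srv10080 check backup
   server srv10090 localhost:10090 cookie srv10090 check backup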

Thanks a bunch. All three options (http-tunnel, http-server-close, adding a dedicated cookie) indeed seem to resolve the issue, at least in my testbed environment. I have now set up the production system to use the dedicated cookie approach as well; let’s see how this works out. Maybe I’ll report back in case of any issues.

Thanks again for your support, best regards!