Exchange 2013, random disconnects in Outlook 2016 for Mac


#1

Hi community,

I’ve been scratching my head over this problem for far too long now. Everything works fine with HAProxy and Exchange 2013 EXCEPT Outlook (2016) for Mac. In other words, the EWS protocol in Exchange seems to have problems with my config file and I can’t figure out why.

If I start Outlook for Mac it works just fine for a minute or two, but after that I just get disconnected from the Exchange server. I then get automatically connected again for a while, and the same thing happens over and over again. Very frustrating. (Windows Outlook works just fine, no problems there).

I’ve been looking at haproxy.log, but at the time of the disconnect there’s just no (new) information to be found there. In other words, everything looks normal (I just get disconnected for some reason) :frowning:

I’ve been playing with the timeout client and timeout server options to no avail. Could someone please take a look at the following configuration file and give me some advice? Thanks!

global  
 log         127.0.0.1 local2 info
 chroot      /var/lib/haproxy
 pidfile     /var/run/haproxy.pid
 maxconn     100000
 user        haproxy
 group       haproxy
 daemon

 ssl-default-bind-options no-sslv3
 ssl-default-bind-ciphers  ECDH+AESGCM:DH+AESGCM:ECDH+AES256:DH+AES256:ECDH+AES128:DH+AES:RSA+AESGCM:RSA+AES:!aNULL:!MD5:!DSS
 ssl-default-server-options no-sslv3
 ssl-default-server-ciphers ECDH+AESGCM:DH+AESGCM:ECDH+AES256:DH+AES256:ECDH+AES128:DH+AES:RSA+AESGCM:RSA+AES:!aNULL:!MD5:!DSS
 tune.ssl.default-dh-param 2048

 # turn on stats unix socket
 stats socket /var/run/haproxy.stat

defaults
 mode                    http
 log                     global
 option                  httplog
 option                  dontlognull
 #option                 http-server-close
 option                  forwardfor except 127.0.0.0/8
 option                  redispatch
 #option                 contstats
 retries                 3
 timeout http-request    10s
 timeout queue           1m
 timeout connect         4s
 #timeout client         2m
 timeout client          1000s
 #timeout server         1m
 timeout server          1000s
 timeout http-keep-alive 10s
 timeout check           10s

listen stats x.x.x.x:444  # VIP-IP
    stats enable
    stats refresh 300s
    stats show-node
    stats auth xxxx:xxxx
    stats hide-version
    stats uri  /stats

frontend fe_ex2013
 # http-response set-header Strict-Transport-Security max-age=31536000;\ includeSubdomains;\ preload
 http-response set-header X-Frame-Options SAMEORIGIN
 http-response set-header X-Content-Type-Options nosniff
 mode http
 bind *:80
 bind *:443 ssl crt /etc/ssl/certs/exchange_certificate_and_key_nopassword.pem
 redirect scheme https code 301 if !{ ssl_fc }   ## redirect 80 -> 443 (for owa)
 acl autodiscover url_beg /Autodiscover
 acl autodiscover url_beg /autodiscover
 acl mapi url_beg /mapi
 acl rpc url_beg /rpc
 acl owa url_beg /owa
 acl eas url_beg /Microsoft-Server-ActiveSync
 acl ecp url_beg /ecp
 acl ews url_beg /EWS
 acl oab url_beg /OAB
 use_backend be_ex2013_autodiscover if autodiscover
 use_backend be_ex2013_mapi if mapi
 use_backend be_ex2013_rpc if rpc
 use_backend be_ex2013_owa if owa
 use_backend be_ex2013_eas if eas
 use_backend be_ex2013_ecp if ecp
 use_backend be_ex2013_ews if ews
 use_backend be_ex2013_oab if oab
 default_backend be_ex2013


backend be_ex2013_autodiscover
 mode http
 balance roundrobin
 option httpchk GET /autodiscover/healthcheck.htm
 option log-health-checks
 http-check expect status 200
 server ex1 1.1.1.1:443 check ssl inter 15s verify required ca-file /etc/ssl/certs/ca-bundle.crt
 server ex2 2.2.2.2:443 check ssl inter 15s verify required ca-file /etc/ssl/certs/ca-bundle.crt

backend be_ex2013_mapi
 mode http
 balance roundrobin
 option httpchk GET /mapi/healthcheck.htm
 option log-health-checks
 http-check expect status 200
 server ex1 1.1.1.1:443 check ssl inter 15s verify required ca-file /etc/ssl/certs/ca-bundle.crt
 server ex2 2.2.2.2:443 check ssl inter 15s verify required ca-file /etc/ssl/certs/ca-bundle.crt


backend be_ex2013_rpc
 mode http
 balance roundrobin
 option httpchk GET /rpc/healthcheck.htm
 option log-health-checks
 http-check expect status 200
 server ex1 1.1.1.1:443 check ssl inter 15s verify required ca-file /etc/ssl/certs/ca-bundle.crt
 server ex2 2.2.2.2:443 check ssl inter 15s verify required ca-file /etc/ssl/certs/ca-bundle.crt


backend be_ex2013_owa
 mode http
 balance roundrobin
 option httpchk GET /owa/healthcheck.htm
 option log-health-checks
 http-check expect status 200
 server ex1 1.1.1.1:443 check ssl inter 15s verify required ca-file /etc/ssl/certs/ca-bundle.crt
 server ex2 2.2.2.2:443 check ssl inter 15s verify required ca-file /etc/ssl/certs/ca-bundle.crt


backend be_ex2013_eas
 mode http
 balance roundrobin
 option httpchk GET /microsoft-server-activesync/healthcheck.htm
 option log-health-checks
 http-check expect status 200
 server ex1 1.1.1.1:443 check ssl inter 15s verify required ca-file /etc/ssl/certs/ca-bundle.crt
 server ex2 2.2.2.2:443 check ssl inter 15s verify required ca-file /etc/ssl/certs/ca-bundle.crt


backend be_ex2013_ecp
 mode http
 balance roundrobin
 option httpchk GET /ecp/healthcheck.htm
 option log-health-checks
 http-check expect status 200
 server ex1 1.1.1.1:443 check ssl inter 15s verify required ca-file /etc/ssl/certs/ca-bundle.crt
 server ex2 2.2.2.2:443 check ssl inter 15s verify required ca-file /etc/ssl/certs/ca-bundle.crt


backend be_ex2013_ews
 mode http
 balance roundrobin
 option httpchk GET /ews/healthcheck.htm
 option log-health-checks
 http-check expect status 200
 server ex1 1.1.1.1:443 check ssl inter 15s verify required ca-file /etc/ssl/certs/ca-bundle.crt
 server ex2 2.2.2.2:443 check ssl inter 15s verify required ca-file /etc/ssl/certs/ca-bundle.crt


backend be_ex2013_oab
 mode http
 balance roundrobin
 option httpchk GET /oab/healthcheck.htm
 option log-health-checks
 http-check expect status 200
 server ex1 1.1.1.1:443 check ssl inter 15s verify required ca-file /etc/ssl/certs/ca-bundle.crt
 server ex2 2.2.2.2:443 check ssl inter 15s verify required ca-file /etc/ssl/certs/ca-bundle.crt


backend be_ex2013
 mode http
 balance roundrobin
 server ex1 1.1.1.1:443 check ssl inter 15s verify required ca-file /etc/ssl/certs/ca-bundle.crt
 server ex2 2.2.2.2:443 check ssl inter 15s verify required ca-file /etc/ssl/certs/ca-bundle.crt



#################
# SMTP and IMAP #
#################

frontend fe_exchange_smtp
 mode tcp
 option tcplog
 bind x.x.x.x:25 name smtp  # VIP-IP, port not open to the public internet, only against Postfix
 default_backend be_exchange_smtp

backend be_exchange_smtp
 mode tcp
 option tcplog
 balance roundrobin
 server ex1 1.1.1.1:25 weight 10 check
 server ex2 2.2.2.2:25 weight 20 check

### No need to Load Balance port 587 and 465. Postfix handles these.


frontend fe_exchange_imaps
 mode tcp
 option tcplog
 #   bind x.x.x.x:143 name imap  ### Not allowing unencrypted imap.
 bind x.x.x.x:993 name imaps ### VIP-IP
 default_backend be_exchange_imaps

backend be_exchange_imaps
 mode tcp
 option tcplog
 #balance roundrobin
 balance leastconn
 #stick store-request src
 #stick-table type ip size 200k expire 30m
 server ex1 1.1.1.1:993 weight 10 check
 server ex2 2.2.2.2:993 weight 20 check

#2

Two things jump out at me. I usually find that I need “option accept-invalid-http-request” when load balancing Exchange with HAProxy. You can see here that I suggested it for Exchange 2010, with success:


Obviously Microsoft have been getting better… However, I’d still double check that you are not seeing “req” errors on your state page and that:
echo "show errors" | socat unix-connect:/var/run/haproxy.stat stdio
shows no errors either.

I also often end up raising the client/server timeouts, which in the past (again, mostly with Exchange 2010) I’ve needed to raise to 45 minutes or more. It may be worth a try as a test, if nothing else…
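
For reference, a minimal sketch of how these two suggestions might look in the config posted above. Only the relevant lines are shown, the section names are taken from that config, and the 45m values are just the test figure mentioned here, not a tuned recommendation. As far as I know, option accept-invalid-http-request belongs on the client side (defaults, frontend or listen), and the order of option lines within a section should not matter.

```
frontend fe_ex2013
 mode http
 bind *:443 ssl crt /etc/ssl/certs/exchange_certificate_and_key_nopassword.pem
 option accept-invalid-http-request   # relaxes request parsing; meant as a test
 timeout client 45m                   # assumed test value, overrides defaults

backend be_ex2013_ews
 mode http
 timeout server 45m                   # assumed test value, overrides defaults
```

If the disconnects stop with these in place, the values can be lowered step by step to find what the clients actually need.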


#3

Thanks for your fast answer. (You actually answered when I was in the process of reformatting my ugly post :grinning:). Should at least be a more human readable config file now…

To begin with, I’ll have a look at the echo "show errors" | socat unix-connect:/var/run/haproxy.stat stdio command and also try with a higher timeout.


#4

Alright, I did some tests.

Adding option accept-invalid-http-request didn’t help; I’m still getting disconnected. (I’m assuming this option should go in the frontend fe_ex2013 section. Should it? And does the order matter?)

echo "show errors" | socat unix-connect:/var/run/haproxy.stat stdio shows:
Total events captured on [04/May/2018:08:25:45.536] : 0 :frowning:

Playing with client / server timeouts had no effect either. Sigh.

Btw, what do you mean by “However, I’d still double check you are not seeing “req” errors on your state page”? State page? I’m assuming you’re not talking about the stats page, as I’ve not seen any error messages on that page…

(My version is: HA-Proxy version 1.5.18 2016/05/10)


#5

You’re sure the problem is with HAProxy and not a firewall or anti-virus in between your systems?


#6

We’re currently using Zen Load Balancer without problems, so yes, I’m quite sure. (Also, Windows Outlook is working just fine).


#7

Hello,

First of all, remove option dontlognull, then check the haproxy log output again and share it (the relevant macOS logs).
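
If you would rather leave the defaults section untouched, the option can also be switched off for a single frontend. A small sketch, assuming the frontend name from the config above:

```
frontend fe_ex2013
 no option dontlognull   # also log connections on which no data was exchanged
```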

You didn’t configure a maxconn value for the frontend, so it defaults to 2000. There is no real backend maxconn (relevant to this configuration) and you also did not specify any maxconn values for the servers.

maxconn in the global section sets the per-process maxconn value. Frontend and server maxconn are distinct values with their own defaults.
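
A rough sketch of where each of these limits lives, using the names from the config above. The numbers in the defaults section and on the server line are made-up placeholders, not recommendations.

```
global
 maxconn 100000      # per-process limit

defaults
 maxconn 20000       # hypothetical value; inherited by every frontend (2000 if unset)

backend be_ex2013_ews
 # hypothetical per-server limit; without maxconn a server is not limited
 server ex1 1.1.1.1:443 maxconn 5000 check ssl inter 15s verify required ca-file /etc/ssl/certs/ca-bundle.crt
```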


#8

I removed option dontlognull, and there’s absolutely no difference in the logs after this change.

(I actually deleted the note about maxconn, as I got it working already. The maxconn should be in the defaults section, not global…)

Here’s the log attached:

Somewhere in there (the middle?) I got disconnected, and I then reconnected by pressing “send/receive” for all folders. I’ve noticed that this works every time for “manually” reconnecting after a disconnect. Needless to say, not an ideal method.

Another thing that got me thinking: I’m using the hosts file for “fake DNS” at the moment. Otherwise I can’t test HAProxy in production (Zen is up 'n running in production right now). I’ve just added autodiscover and the Exchange namespace to the hosts file. Is this approach acceptable for testing, or could there be something “leaking” over from the current LB (Zen)? Something cached?

I’ll again point out that Windows is happy testing HAP this way (editing the hosts file).


#9

I’ve read somewhere that emptying the cache on the Mac Outlook could fix some problems. Well, I tried that.

BUT now, a short while after emptying the cache, I’m having even worse problems :frowning: Now I can’t even connect to Exchange anymore. Outlook just asks for credentials at startup (which it didn’t before I emptied the cache). Then, even though I enter the correct credentials, I can’t connect. I just get an error message stating “Mail could not be received at this time”: “The server for account xxxxx returned an error. Logon failure: unknown username or bad password. Your username/password or security settings may be incorrect. Would you like to try re-entering your password?”
Needless to say, re-entering won’t work. (Neither does the trick of emptying the keychain.)

Now I’m even more confused. Sigh.


#10

Hi Jiggz

Just thinking about this… What did you try in terms of client and server timeouts and do you get any errors incrementing on the HTTP stats page?

I’m also at a loss ATM but would love to help you fix it…


#11

Just to add to why I’m fussing about timeouts a little: I’ve worked with a lot of Exchange and HAProxy deployments, and I’ve always found I need longer timeouts than those seen in configs online.

After my last post I did a Google search to see if my theory of 45-60m timeouts is justifiable, and this guy certainly thinks so:

He even goes further and tunes Exchange to be more aggressive with keep-alive, which is something I have yet to need to try. However, he uses lower timeouts than me, so I’m guessing that is how he manages it.

I didn’t want to jump in straight away with such a high suggestion (because I’m probably wrong!), but as no progress has been made and I can find a link to back me up a bit, I thought I’d highlight it.


#12

I tried extending both server and client timeout values to 30min. It had no effect though…
And no, no errors on the stats page. In fact there have never been any errors visible there :frowning:

(Yep, unfortunately I had already looked at the article you provided).

The current situation is a little bit different now. As stated in my previous reply, I now get asked for credentials. This started to happen after I emptied the cache. Don’t ask me why…

Anyhow, it SEEMS that Exchange/Outlook itself is able to authenticate, as I get connected when entering my login credentials at first startup of Outlook. HOWEVER, just one second (approx.) after I’ve entered my credentials, I get presented with yet another popup dialog with the text from my reply above, “Mail could not be received at this time”: “The server for account xxxxx returned an error. Logon failure: unknown username or bad password. Your username/password or security settings may be incorrect. Would you like to try re-entering your password?”

Re-entering the password won’t work. My credentials simply aren’t accepted, for whatever reason. While this popup dialog is “floating” on the screen, I see that I’m connected. But as soon as I enter my credentials I get disconnected because they aren’t accepted, and the same popup comes up over and over again…