A lot of people were cheering in the interactive session on Exchange 2010 SP1 High Availability with Scott Schnoll and Ross Smith of the Exchange Team. They announced (between goofing around) that the alternate server functionality for public folders, which gives clients failover by letting them select another public folder database to connect to and which is sadly missing from Exchange 2010, would return with Exchange 2010 SP1 Roll Up 2. Outlook needs this feature to connect automatically to an alternate public folder database, and its return means that high availability will finally be achievable for public folders in Exchange 2010 SP1. That’s great news, and frankly this is an “oversight” that shouldn’t have happened even in Exchange 2010 RTM. The issue is described in the knowledge base article “You cannot open a public folder item when the default public folder database for the mailbox database is unavailable in an Exchange Server 2010 environment”, which you can find here: http://support.microsoft.com/kb/2409597.
In previous versions of Exchange you made public folders highly available to Outlook clients by having replicas. The Outlook clients could access a replica on another server if the default public folder database, as defined in the client settings of the mailbox database, was not available. Clustering in Exchange 2010 does nothing for public folders. In Exchange 2010 the Outlook clients connect directly to the mailbox server to get to a public folder, so they do not leverage the CAS or CAS array. The DAG does not support public folders either, and because clustering happens at the database level on DAG members and no longer at the server level, clustering in Exchange 2010 no longer gives the clients any high availability for public folders. Sure, if you have multiple replicas the data is highly available, but Outlook doesn’t automatically switch to another replica/database/server for public folders when you’re running Exchange 2010. To make that happen, an alternate server needs to be offered to the client for selection. As this feature is missing in Exchange 2010 up to and including SP1 Roll Up 1, in reality you have needed to keep using Exchange 2003/2007 until now to have public folder high availability. Exchange 2010 SP1 Roll Up 2 will change that. I call that good news.
Really awesome rollup for public folders in HA… Any release date for this?
After the installation, will any other update be needed besides this HA fix?
Thx for your valuable news…
I’m guessing it will be before the end of 2010. But that’s my guess, nothing more.
Hi,
First of all, thank you for your post.
Did you confirm that UR2 for SP1 fixes this behavior?
In other words, does Outlook connect to another PF replica when its mailbox database’s default public folder database is inaccessible?
Thank You.
Hi, it works, but you need to keep in mind that it is not a 100% failproof system with no interruptions guaranteed; it never has been. The OpenFlags.AlternateServer functionality is used to send clients to a replica when a “MAPI_E_NETWORK_ERROR” is received. People who try to test the failover by dismounting the default public folder database or stopping the services get a logon failure error, so no failover occurs: the client is only sent to another server on the network error (server is gone) or when the RPC Client Access service is down. That’s because these are the situations in which they are sure that routing you to an alternative copy of the public folder will work. With a logon failure that is not the case.
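To make the rule above concrete, here is a minimal Python sketch of the decision as described in this comment. It is only a toy model for reasoning about test results, not Outlook's or Exchange's actual code: the names Failure, client_is_referred and pick_database are hypothetical, and picking the first replica in the list is an assumption made purely for illustration.

```python
from enum import Enum, auto


class Failure(Enum):
    """Ways the default public folder database can become unreachable (illustrative only)."""
    NETWORK_ERROR = auto()           # server is gone: the client sees MAPI_E_NETWORK_ERROR
    RPC_CLIENT_ACCESS_DOWN = auto()  # the RPC Client Access service is down
    LOGON_FAILURE = auto()           # database dismounted or services stopped, per the comment


# Per the description above, only these conditions send the client to a replica.
REFERRAL_TRIGGERS = {Failure.NETWORK_ERROR, Failure.RPC_CLIENT_ACCESS_DOWN}


def client_is_referred(failure, replicas):
    """Return True if the client would be routed to an alternate replica."""
    return failure in REFERRAL_TRIGGERS and len(replicas) > 0


def pick_database(default_db, failure, replicas):
    """Return the database the client ends up using, or None if access fails.

    Assumption: the first replica in the list is chosen; the real selection
    order is not described in this thread.
    """
    if failure is None:
        return default_db
    if client_is_referred(failure, replicas):
        return replicas[0]
    return None
```

Read this way, a test that only dismounts the database corresponds to pick_database("PFDB1", Failure.LOGON_FAILURE, ["PFDB2"]), which returns None, while shutting the server down corresponds to Failure.NETWORK_ERROR and returns "PFDB2".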
Thank you for your time and explanation.
This is indeed the case: the server must be inoperable (a dismount is not enough, as you explained).
PF reconnect happens in ~30 seconds with MAPI and 3+ minutes with Outlook Anywhere (if ever, I had to restart Outlook 🙁 ).
One additional detail: you can control which servers the client may be referred to from EMS with Set-PublicFolderDatabase -UseCustomReferralServerList $true. See its help for details.
Thank you once again.
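As a rough illustration of what a custom referral list expresses, the Python sketch below orders candidate servers by cost. It is a toy model, not the Exchange implementation: the "server:cost" entry format reflects my understanding of the cmdlet's CustomReferralServerList parameter and should be checked against the cmdlet help, the preference for lower cost is an assumption, and the server names and both function names are hypothetical.

```python
def parse_referral_list(entries):
    """Parse 'server:cost' strings into (server, cost) tuples."""
    parsed = []
    for entry in entries:
        server, _, cost = entry.rpartition(":")
        if not server or not cost.isdigit():
            raise ValueError(f"expected 'server:cost', got {entry!r}")
        parsed.append((server.strip(), int(cost)))
    return parsed


def referral_order(entries, unavailable=()):
    """Return candidate servers, cheapest first, skipping unavailable ones.

    Assumption: a lower cost means a more preferred referral target.
    """
    down = {name.lower() for name in unavailable}
    candidates = [
        (server, cost)
        for server, cost in parse_referral_list(entries)
        if server.lower() not in down
    ]
    # sorted() is stable, so servers with equal cost keep their listed order.
    return [server for server, _ in sorted(candidates, key=lambda item: item[1])]
```

For example, referral_order(["MBX-TX:10", "MBX-MD:1", "MBX-UK:50"], unavailable=["MBX-MD"]) leaves MBX-TX as the first candidate, followed by MBX-UK.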
You’re most welcome and thank you for sharing your findings. As far as Outlook Anywhere is concerned: do you have a load-balanced CAS array? It might be worth looking at the settings and perhaps tweaking them.
To my knowledge the ability to fail over to an alternate Public Folder database has never existed, unless such functionality was in Exchange 2007, which I have not used. It is most assuredly not available in Exchange 2003. Replicas only work at the folder level, and a client will only be referred to another Public Folder database if that folder does not exist in the Public Folder database to which it initially connected. This means that if you have the same folder replica on two servers, one with content and another without, clients that attempt to access content from the folder replica without it will not be referred to the folder replica with it. That being said, it was possible to create an entire replica set of your primary Public Folder database and then, in case of a failure affecting it, manually update the mailbox stores to point to the other Public Folder database that held the entire replica set.
It most certainly exists but was “lost” with E2K10 RTM until SP1 RU2. But, as stated in the comments above, you need to keep in mind that it is not a 100% failproof system with no interruptions guaranteed; it never has been. The OpenFlags.AlternateServer functionality is used to send clients to a replica when a “MAPI_E_NETWORK_ERROR” is received. People who try to test the failover by dismounting the default public folder database or stopping the services get a logon failure error, so no failover occurs: the client is only sent to another server on the network error (server is gone) or when the RPC Client Access service is down. That’s because these are the situations in which they are sure that routing you to an alternative copy of the public folder will work. With a logon failure that is not the case. It’s not instantaneous and not completely transparent for the users, but it does exist and work. Also see http://social.technet.microsoft.com/Forums/en-US/exchange2010/thread/c9e20972-a6da-4146-9237-09bed5719238 and this: The last of those updates (2409597) is a very interesting one. So far with Exchange Server 2010 SP1 there was no high availability capability for Public Folder databases. You could place your Public Folder databases on Mailbox servers that are members of a Database Availability Group, however the DAG does not support replication and failover of the Public Folder databases, only for Mailbox databases.
This meant that even though you could configure Public Folder replication to replicate your data between multiple Mailbox servers, if one server or Public Folder database went down the clients that are connecting to that Public Folder database would lose connectivity.
Update 2409597 resolves that issue by re-adding support for the MAPI calls that Outlook clients make for referral to another Public Folder if the database they are trying to connect to is offline. In short this means that true high availability for Public Folders on Exchange Server 2010 SP1 is now possible once Update Rollup 2 is applied.
Google tells more.
Do you mean to say it exists in Exchange 2003 and Exchange 2007? Or do you mean to say that it existed in Exchange 2010 prior to RTM – like in a Beta or RC and then ended up not making it into Exchange 2010 RTM?
If it exists in Exchange 2003, by all means, I’m all ears on where and how to use it. And if you are correct, then I’m going to have to get back the PSS hours that I spent with Microsoft determining that there wasn’t a feature like this in Exchange 2003.
Yup… just went back and reviewed the case notes on this (for Exchange 2003).
Case 109060158229396. Very clearly from Microsoft:
With regards to Public Folder availability in the case of a Public Folder Store or Public Folder Server becoming unavailable, what might be the options?
Change the PF database to a working one is the only option.
The only easy thing that comes to mind currently would be to point a Mailbox Store to an available Public Folder Server/Store.
This is your only option.
That could very well be correct. Depending on the number of PF databases and the Exchange version, we used several approaches (PF replication, clustering, CCR, …) with different behaviour and results in terms of end user experience and the actions required from the admin. In previous versions of Exchange, meaning 2003/2007, the public folders could be highly available because clustering happened at the server level. So the mechanism is different, but you did have failover depending on the scenario. I should be more precise and clear: high availability and failover exist in those versions, but the mechanism is different. But read the following (http://support.microsoft.com/?kbid=2409597): “This issue occurs because Exchange Server 2010 does not support that the MAPI client application calls that looks for an alternative public folder when the default public folder database is unavailable. Therefore, the MAPI client application keeps trying to connect to the default public folder database, instead of trying to connect to other public folder databases in the referral list.” Well, if the functionality is available in the MAPI client (= Outlook) it should also be in the Exchange servers … and then we need to know which version was the first to support alternate servers. I think that is E2K3 … but that was for an alternate server that accepts mail for the domain. So when was it first used for PF failover in a MAPI client? MS doesn’t really specify; they only talk about Outlook in http://support.microsoft.com/?kbid=2409597. I pinged a connection for more info.
Here is how our scenario played out.
We had 3 Public Folder Servers in our Exchange Organization on Exchange 2003. Each server contained a complete replica of all 30,000 Public Folders we had. We had one Public Folder Server in each of our England, Maryland, and Texas sites.
We were unable to create a clustered Public Folder database because that requires that the clustered Public Folder database is the *only* one in the entire Exchange Organization (per Microsoft). We couldn’t do this because we needed the distributed coverage.
Because of the single points of failure this created, we spent some time trying to figure out how to make happen what you describe. We were unable to do so and ended up opening the case I referenced earlier with Microsoft to figure out how to make it work. In our testing, this is what we did.
Our users’ mailboxes were on mailbox servers that were separate from our Public Folder server in our Maryland site. The mailbox stores on these servers were configured to use the Public Folder database on the Public Folder Server in our Maryland site as their default Public Folder Store. We simulated failure by completely shutting down the Public Folder Server in our Maryland site, which means our Outlook 2007 clients would have received the MAPI_E_NETWORK_ERROR error described. These clients *never* attempted to locate another Public Folder database on another server, regardless of what we did to them, and that included waiting for 30 minutes, restarting Outlook, and even restarting the computer on which Outlook was running.
Subsequently, we opened that case I mentioned and were finally told that what is described here (Outlook searching for and using an alternate Public Folder database automatically when the server hosting the primary Public Folder database is offline) was simply not something that happens, nor was it possible; our only solution was manual reconfiguration of the mailbox stores to point to another available Public Folder database on another Public Folder Server. Furthermore, we were told that if the primary Public Folder database was offline for whatever reason, not even referrals would work and clients would simply not be able to open Public Folders, period.
The Knowledge Base article referenced here (http://support.microsoft.com/?kbid=2409597) does seem to imply there is different logic at work, which is why I would have expected people cheering at the interactive session with Scott and Ross.
In any case, I am reaching out to Microsoft through our channels here, too, and will most definitely post whatever findings we uncover.
I spoke with a Senior Support Escalation Engineer at Microsoft on the Exchange Admin Team and this is what we found.
This functionality appears to have first been introduced in Exchange 2007. The engineer confirmed that his experience with this on Exchange 2003 is the same as mine: doesn’t work.
It was filed as a bug for Exchange 2010 RTM since it was missing and it was implemented via Exchange 2010 SP1 RU2.
That being said, I’m going to do some testing and will report on results of that testing. That may take some time, though. I’ll also spend some time looking at what it takes for Outlook Anywhere clients to reconnect; 3+ minutes kinda defeats the purpose of high availability and while I hate resorting to restarting clients, that’s certainly better than not having the ability to automatically flip to an alternate copy at all.
Here’s the feedback I got (MS Exchange peops & Exchange MVP), which confirms this. It was introduced in 2007 and “lost” in 2010 RTM. E2K7 was confirmed last night in testing. So that behaviour will not work in E2K3. The funny thing is that everyone seems pretty sure that it works in 2007 but starts having doubts when asked about 2003. Our collective memory is starting to suffer. Time to emphasize in the blog that while HA does exist in E2K3, it must be done via clustering, and that otherwise it is a manual process. Thx for the extensive feedback and observations!
PS: For referrals to work you do indeed need the default PF database; otherwise it doesn’t know whether it holds a replica or not and can’t refer. Chicken & egg story.
Thanks for all your postings on this issue. I am currently at RU4v2 with 2 servers in DAG doing PF replicas. Can you elaborate on your PS statement? I cannot get MAPI clients to connect to the replica when I fail the primary PF holder – should I be able to?
The PS refers to the referral process: see http://technet.microsoft.com/en-us/library/bb691235.aspx & http://thoughtsofanidlemind.wordpress.com/2011/02/12/exchange-2010-public-folders-part-2/
Failover does work if everything is set up to enable it, but it is not “fluid”: it takes some time (sometimes annoyingly long). To quote an earlier comment of mine on this post: “you need to keep in mind that it is not a 100% failproof system with no interruptions guaranteed; it never has been. The OpenFlags.AlternateServer functionality is used to send clients to a replica when a ‘MAPI_E_NETWORK_ERROR’ is received. People who try to test the failover by dismounting the default public folder database or stopping the services get a logon failure error, so no failover occurs: the client is only sent to another server on the network error (server is gone) or when the RPC Client Access service is down. That’s because these are the situations in which they are sure that routing you to an alternative copy of the public folder will work. With a logon failure that is not the case. It’s not instantaneous and not completely transparent for the users.” Do you have a lab where you can play a bit to verify configurations?
Yes, this is all in a test lab. The weird thing is that OWA does get redirected to the PF replica; only MAPI (Outlook) seems to fail. I have tried waiting (1 hour) and even restarting Outlook, but I still get prompted to log in to the original PF holder when I try to access a PF. If I manually change the mailbox database to use the replica, that works, so between that and OWA working, I am pretty sure the replicas are healthy.
One other thing, I am disconnecting the network connections (LAN and DAG) to simulate the primary server failure. I did read somewhere that it needed to see the primary as “completely down” to fail over – not just dismounting primary PF store.
Thanks much.
Indeed, I always start with shutting down the server to make sure it works. Also see my previous reply to your comment.
So if I only have a default PF and one replica – if the default goes down there is no mechanism to fail over to the replica?
Should work.
Hi @all,
thanks for this great post! There is one thing I don’t understand. There will be an automatic failover as described. We tested this in a test environment and got a result I cannot explain:
we have a CAS array and a DAG with two nodes (node one has the active PF DB, node two the replica). The replication interval between the PF DBs is set to automatic. The setting for which PF DB a user should use is stored in the mailbox database. In our test environment:
DB1 on node “one” (DAG1) has PFDB1 as its PF DB. All users who have their mailbox on DB1 are connected to PFDB1. The PF DB replica is stored on the second node “two” (the passive node), as is the DB copy of DB1. Some users in our test environment reported a delay of 15 minutes for incoming mail arriving in the PF.
My understanding is that a failover only occurs if the active node is completely down.
Is there also automatic load balancing between the PF DBs when I am using a CAS array? I cannot explain the time delay between the PF DBs. It has existed since we installed the newest update and set up a CAS array.
Many thanks for any feedback.
Oh boy 🙂 Mail delivery to Public folders is “black art”. This is a good place to start … http://blogs.technet.com/b/exchange/archive/2007/04/16/3401930.aspx
Hi, I am not sure if you understood me correctly :-). I know how public folders work. I wanted to ask if there is a new load balancing feature with the new update (the failover to a second PF DB is new with the latest update). Since the latest update we have had the problem as described. As I understand it, the Client Access array only handles the NLB to the mailboxes (on the two DAG nodes). Which PF DB the user is directed to is something I have to configure in the settings of the mailbox database.
1) A user connects to the CAS using Outlook. The active DB is on node 1 (DAG) and the right PF DB is also on node 1.
2) Could it be that, with the new features (latest update), if a second user connects to the CAS using Outlook, the CAS redirects this user to the second PF DB (the replica on node 2)? I would say no, because the configuration on the active mailbox DB says: use the PF DB on node 1. (That was my question :-)) Sorry for my bad English :-)
Hi,
Sorry for the confusion.
You might want to take a look at the referral rules over here http://technet.microsoft.com/en-us/library/bb691235.aspx and especially http://technet.microsoft.com/en-us/library/bb629643.aspx about E2K10SP2
The latter one may be of real interest to you.
Take care!