Exchange 2010 SP1 DAG & Unified Messaging Now Supports Host Based High Availability & Live Migration!

Well due to rather nice virtualization support for Lync and the fact that Denali (SQL Server vNext) does support DAG like functionality with Live Migration and host based clustering, it was about time for Exchange 2010 to catch up. And when we read the white paper  Best Practices for Virtualizing Exchange Server 2010 with Windows Server® 2008 R2 Hyper V™ that moment has finally arrived. I have to thank Michel de Rooij at  for bringing this to our attention http://eightwone.com/2011/05/14/exchange-2010-sp1-live-migration-supported/. So now we have the best features in virtualization at our disposal and that simply rocks. We read:

“Exchange server virtual machines, including Exchange Mailbox virtual machines that are part of a Database Availability Group (DAG), can be combined with host-based failover clustering and migration technology as long as the virtual machines are configured such that they will not save and restore state on disk when moved or taken offline. All failover activity must result in a cold start when the virtual machine is activated on the target node. All planned migration must either result in shut down and a cold start or an online migration that utilizes a technology such as Hyper-V live migration.”

“Microsoft Exchange Server 2010 SP1 supports virtualization of the Unified Messaging role when it is installed on the 64-bit edition of Windows Server 2008 R2. Unified Messaging must be the only Exchange role in the virtual machine. Other Exchange roles (Client Access, Edge Transport, Hub Transport, Mailbox) are not supported on the same virtual machine as Unified Messaging. The virtualized machine configuration running Unified Messaging must have at least 4 CPU cores, and at least 16 GB of memory.”

And it is NOT ONLY for Hyper-V, look at the Exchange Team blog here “The updated support guidance applies to any hardware virtualization vendor participating in the Windows Server Virtualization Validation Program (SVVP).’” Nice!

Anyone who’s at TechEd USA 2011 in Atlanta should attend EXL306 for more details. Huge requirements yes, but the same goes for physical servers. That’s how they get the performance gains needed, it’s done by lowering IO by using large amounts of RAM.

Think about the above statement, we now have support for host clustering with live migration, possibly together with technology like for example Melio (SanBolic) on the software side or Live Volume (Compellent) on the storage side to protect against SAN Failure (local or remote) and combined with DAG high availability for the databases in Exchange 2010 (which can be multi site) this becomes a very resilient package. So to come back to my other post on a brighter future for public folders, if they can sort out this red headed stepchild of the Exchange portfolio they have covered all their bases and have a great platform with the option of making it better, easier and cheaper to implement, operate & use. No one will argue with that.

I know some people will say all this is overkill, to complex, to much or to expensive. I call it having options. When the S* hits the fan and you’re “in the fight of your life” wading your way through one or multiple IT disasters to keep that mail flow up an running it is good to have multiple options. Options mean you can get the job done using creativity and tools. If you have only one tool and one option Murphy will catch up with you. Actually this is one of my most heard shout outs to the team “give me options” when problems arise. But at what cost do these options come? That is up for the business and you to decide. We’re getting very robust options in Exchange that can be leveraged with other technologies for high availability that have become more and more main stream. This means none of all this needs to be bought and implemented just for Exchange. They are already in place. Unless your IT “strategy” the last 10 years was run Windows 2000 & Exchange 2000 until the servers fall apart and we don’t have any more spares available on e-bay before we consider moving along.

Exchange 2010 Public Folder Worries At Customer: No existing ‘PublicFolderProxyInformation’ matches the following Identity

A customers was recently using the EMC GUI in their Exchange 2010 environment, having a look a the public folder properties when they got this error:

—————————
Microsoft Exchange
—————————
Can’t log on to the Exchange Mailbox server ‘DAGMBX.demolab.com’. No existing ‘PublicFolderProxyInformation’ matches the following Identity: ‘demolabHeadQuartersFincanceDepartmentFiscalUnit’. Make sure that you specified the correct ‘PublicFolderProxyInformation’ Identity and that you have the necessary permissions to view ‘PublicFolderProxyInformation’.. It was running the command ‘Get-MailPublicFolder -Identity ”demolabHeadQuartersFincanceDepartmentFiscalUnit” -Server ‘DAGMBX.demolab.com”.
—————————
OK  
—————————

image

Hey … when did this start?  They never complained about this before, but did they ever use it.This probably was actually the first time they tried to look/edit the public folder permissions after doing the following over the past month and in this particular order:

  1. Moving to Exchange 2010 SP1
  2. Removing the last Exchange 2007 servers from the organization.

Now I know about a bug that exist and that was recently blogged about by Dan Rowley in Exchange 2010 get-mailpublicfolder name returns No existing ‘PublicFolderProxyInformation’. The point is that there should be a mailbox database mounted on the server that has the System Attendant mailbox associated with it.  However, this is not the case here.  The mailbox servers are member of a DAG and all of them host a copy of the PF. The replication runs fine, users can work with them, the remaining Outlook 2003 users report no issues. But there is more in that blog: “Basically the work around is to mount a mailbox store on the server that is generating the error, or if there is a database already mounted – verify the system attendant is properly configured to point to a valid homemdb.” Now that last point is interesting and indeed that was the issue here. On two members of the DAG the homeMDB attribute was not set. Now what could be the root cause of this? I don’t know, certainly not in this case. All things have been done by the book … Ah well, luckily the fix is not very difficult. We need to put a valid entry in the homemdb. In this case we’ll take the value of the DAG member that had it filled in. This seems to be the most recently created database in the DAG. In Exchange 2010 this is done as described below. Note we have a DAG here, so we can work with any database that has a valid copy on the server(s) in question.

How to check the homeMDB attribute value:

  • Start ADSI Edit and navigate to CN=Configuration,DC=,DC=,DC=/Services/Microsoft Exchange//Administrative Groups/Exchange Administrative Group (FYDIBOHF23SPDLT)//Servers/MBXServerWithIssue
  • Right-click Microsoft System Attendant, and then click Properties to display the  Attributes list and find the homeMDB attribute.
  • If the homeMDB attribute has a value make sure  it points to a valid mailbox database. If the value of the homeMDB attribute is empty (not set) or incorrect you need to fix this.

image

How Fix the homeMDB attribute value:

  • In ADSI Edit navigate to Start ADSI Edit and navigate to CN=Configuration,DC=,DC=,DC=/Services/Microsoft Exchange//Administrative Groups/Exchange Administrative Group (FYDIBOHF23SPDLT)/Databases."
  • Right-click a mailbox database that is local (NON DAG) or has a valid copy on the server (DAG) , select Properties and in  the Attributes list, select the distinguishedName, and then click View.
  • Copy the value of the distinguishedName attribute and close the dialogs

image

NOTE in this particular case we can copy the value that was filled in the homeMDB attribute on one of the DAG members. You might not have one set in any.

  • Right-click Microsoft System Attendant, and then click Properties to get to the Attributes list, click homeMDB, and then choose Edit
  • In the Value box, paste the value that you copied form the distinguishedName attribute
  • Close the dialog boxes and exit ADSI Edit

When you’ve don this you’ll find following entry in the application event viewer:

Log Name:      Application

Source:        MSExchangeSA

Date:          11/2/2010 3:25:59 PM

Event ID:      9159

Task Category: General

Level:         Warning

Keywords:      Classic

User:          N/A

Computer:      DAGMBX.demolab.com

Description:

Microsoft Exchange System Attendant has detected that the system attendant object in the DS has been modified. System Attendant needs to restart the Microsoft Exchange Free Busy Publishing Service.

image

After that, I wait 10 minutes to get AD replicated and make sure to close the EMC and start it again and voila, it’s fixed.

Exchange 2010 DAG Issue: Cluster IP address resource ‘Cluster IP Address’ cannot be brought online

Today I was called upon to investigate an issue with an Exchange 2010 Database Availability Group that had serious backup issues with Symantec Backup Exec not working. As it turned out, while the DAG was still providing mail services and clients did not notice anything the underlying Windows Cluster Service had an issue with. The cluster resource could not be brought on line, instead we got an error:

“Cluster IP address resource ‘Cluster IP Address’ cannot be brought online because the cluster network ‘Cluster Network 1’ is not configured to allow client access. Please use the Failover Cluster Manager snap-in to check the configured properties of the cluster network.”

I have been dealing with Windows 2008 (R2) clusters since the beta’s and had seen some causes of this so I started to check the cluster & Exchange DAG configuration. Nothing was wrong, not a single thing. Weird. I had seen such weird behavior once before with a Hyper-V R2 cluster. There I fixed it by disabling and enabling the NIC’s on the nodes that were having the issue, thus resetting the network. I you don’t have DRAC/ILO or KVM over IP access you can temporarily allow client access via another cluster network or you’ll need physical access to the server console.

In the event viewer I found some more errors:

Log Name:      System
Source:        Microsoft-Windows-FailoverClustering
Date:          6/18/2010 2:02:41 PM
Event ID:      1069
Task Category: Resource Control Manager
Level:         Error
Keywords:     
User:          SYSTEM
Computer:      node1.company.com
Description: Cluster resource ‘IPv4 DHCP Address 1 (Cluster Group)’ in clustered service or application ‘Cluster Group’ failed.

Log Name:      System
Source:        Microsoft-Windows-FailoverClustering
Date:          6/18/2010 1:54:47 PM
Event ID:      1223
Task Category: IP Address Resource
Level:         Error
Keywords:     
User:          SYSTEM
Computer:     node1.company.com
Description: Cluster IP address resource ‘Cluster IP Address’ cannot be brought online because the cluster network ‘Cluster Network 1’ is not configured to allow client access. Please use the Failover Cluster Manager snap-in to check the configured properties of the cluster network.

Log Name:      System
Source:        Microsoft-Windows-FailoverClustering
Date:          6/18/2010 1:54:47 PM
Event ID:      1223
Task Caegory: IP Address Resource
Level:         Error
Keywords:     
User:          SYSTEM
Counter:      node1.company.com
Description: Cluster IP address resource ‘IPv4 DHCP Address 1 (Cluster Group)’ cannot be brought online because the cluster network ‘Cluster Network 3’ is not configured to allow client access. Please use the Failover Cluster Manager snap-in to check the configured properties of the cluster network.

So these cluster networks (it’s a geographically dispersed cluster with routed subnets) are indicating they do not have “Allow clients to connect through this network” set.  Well, I checked and they did! Both “Allow cluster network communications on this network” and “allow clients to connect through this network” are enabled. 

Weird, OK but as mentioned I’ve encountered something similar before. In this case I did not want to do just disable/enable those NICs. The DAG was functioning fine and providing services tot clients, so I did not want to cause any interruption or failover now the cluster was having an issue.

So before going any further I did a search and almost within a minute I found following TechNet blog post: Cluster Core Resources fail to come online on some Exchange 2010 Database Availability Group (DAG) nodes (http://blogs.technet.com/b/timmcmic/archive/2010/05/12/cluster-core-resources-fail-to-come-online-on-some-exchange-2010-database-availability-group-dag-nodes.aspx)

Well, well, the issue is known to Microsoft and they offer three fixes. Which is actually only one, but can be done using  the Failover Cluster Manager GUI, cluster.exe or PowerShell. The fix is to simply disable and enable  “Allow clients to connect through this network” on the affected cluster network. The “long term fix” will be included in Exchange 2010 SP1. The work around does work immediately and their Backup Exec started functioning again. They’ll just have to keep an eye on this issue until the permanent fix arrives with SP1.