Today I was called upon to investigate an issue with an Exchange 2010 Database Availability Group that had serious backup issues with Symantec Backup Exec not working. As it turned out, while the DAG was still providing mail services and clients did not notice anything the underlying Windows Cluster Service had an issue with. The cluster resource could not be brought on line, instead we got an error:
“Cluster IP address resource ‘Cluster IP Address’ cannot be brought online because the cluster network ‘Cluster Network 1’ is not configured to allow client access. Please use the Failover Cluster Manager snap-in to check the configured properties of the cluster network.”
I have been dealing with Windows 2008 (R2) clusters since the beta’s and had seen some causes of this so I started to check the cluster & Exchange DAG configuration. Nothing was wrong, not a single thing. Weird. I had seen such weird behavior once before with a Hyper-V R2 cluster. There I fixed it by disabling and enabling the NIC’s on the nodes that were having the issue, thus resetting the network. I you don’t have DRAC/ILO or KVM over IP access you can temporarily allow client access via another cluster network or you’ll need physical access to the server console.
In the event viewer I found some more errors:
Log Name: System
Source: Microsoft-Windows-FailoverClustering
Date: 6/18/2010 2:02:41 PM
Event ID: 1069
Task Category: Resource Control Manager
Level: Error
Keywords:
User: SYSTEM
Computer: node1.company.com
Description: Cluster resource ‘IPv4 DHCP Address 1 (Cluster Group)’ in clustered service or application ‘Cluster Group’ failed.
Log Name: System
Source: Microsoft-Windows-FailoverClustering
Date: 6/18/2010 1:54:47 PM
Event ID: 1223
Task Category: IP Address Resource
Level: Error
Keywords:
User: SYSTEM
Computer: node1.company.com
Description: Cluster IP address resource ‘Cluster IP Address’ cannot be brought online because the cluster network ‘Cluster Network 1’ is not configured to allow client access. Please use the Failover Cluster Manager snap-in to check the configured properties of the cluster network.
Log Name: System
Source: Microsoft-Windows-FailoverClustering
Date: 6/18/2010 1:54:47 PM
Event ID: 1223
Task Caegory: IP Address Resource
Level: Error
Keywords:
User: SYSTEM
Counter: node1.company.com
Description: Cluster IP address resource ‘IPv4 DHCP Address 1 (Cluster Group)’ cannot be brought online because the cluster network ‘Cluster Network 3’ is not configured to allow client access. Please use the Failover Cluster Manager snap-in to check the configured properties of the cluster network.
So these cluster networks (it’s a geographically dispersed cluster with routed subnets) are indicating they do not have “Allow clients to connect through this network” set. Well, I checked and they did! Both “Allow cluster network communications on this network” and “allow clients to connect through this network” are enabled.
Weird, OK but as mentioned I’ve encountered something similar before. In this case I did not want to do just disable/enable those NICs. The DAG was functioning fine and providing services tot clients, so I did not want to cause any interruption or failover now the cluster was having an issue.
So before going any further I did a search and almost within a minute I found following TechNet blog post: Cluster Core Resources fail to come online on some Exchange 2010 Database Availability Group (DAG) nodes (http://blogs.technet.com/b/timmcmic/archive/2010/05/12/cluster-core-resources-fail-to-come-online-on-some-exchange-2010-database-availability-group-dag-nodes.aspx)
Well, well, the issue is known to Microsoft and they offer three fixes. Which is actually only one, but can be done using the Failover Cluster Manager GUI, cluster.exe or PowerShell. The fix is to simply disable and enable “Allow clients to connect through this network” on the affected cluster network. The “long term fix” will be included in Exchange 2010 SP1. The work around does work immediately and their Backup Exec started functioning again. They’ll just have to keep an eye on this issue until the permanent fix arrives with SP1.