Today I was called upon to investigate an issue with an Exchange 2010 Database Availability Group that had serious backup problems: Symantec Backup Exec was not working. As it turned out, while the DAG was still providing mail services and clients did not notice anything, the underlying Windows Cluster Service had an issue. The cluster resource could not be brought online; instead we got an error:
“Cluster IP address resource ‘Cluster IP Address’ cannot be brought online because the cluster network ‘Cluster Network 1’ is not configured to allow client access. Please use the Failover Cluster Manager snap-in to check the configured properties of the cluster network.”
I have been dealing with Windows 2008 (R2) clusters since the betas and had seen some causes of this, so I started to check the cluster and Exchange DAG configuration. Nothing was wrong, not a single thing. Weird. I had seen such weird behavior once before with a Hyper-V R2 cluster. There I fixed it by disabling and enabling the NICs on the nodes that were having the issue, thus resetting the network. If you don’t have DRAC/ILO or KVM over IP access, you can temporarily allow client access via another cluster network, or you’ll need physical access to the server console.
In the event viewer I found some more errors:
Log Name: System
Source: Microsoft-Windows-FailoverClustering
Date: 6/18/2010 2:02:41 PM
Event ID: 1069
Task Category: Resource Control Manager
Level: Error
Keywords:
User: SYSTEM
Computer: node1.company.com
Description: Cluster resource ‘IPv4 DHCP Address 1 (Cluster Group)’ in clustered service or application ‘Cluster Group’ failed.
Log Name: System
Source: Microsoft-Windows-FailoverClustering
Date: 6/18/2010 1:54:47 PM
Event ID: 1223
Task Category: IP Address Resource
Level: Error
Keywords:
User: SYSTEM
Computer: node1.company.com
Description: Cluster IP address resource ‘Cluster IP Address’ cannot be brought online because the cluster network ‘Cluster Network 1’ is not configured to allow client access. Please use the Failover Cluster Manager snap-in to check the configured properties of the cluster network.
Log Name: System
Source: Microsoft-Windows-FailoverClustering
Date: 6/18/2010 1:54:47 PM
Event ID: 1223
Task Category: IP Address Resource
Level: Error
Keywords:
User: SYSTEM
Computer: node1.company.com
Description: Cluster IP address resource ‘IPv4 DHCP Address 1 (Cluster Group)’ cannot be brought online because the cluster network ‘Cluster Network 3’ is not configured to allow client access. Please use the Failover Cluster Manager snap-in to check the configured properties of the cluster network.
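If you have a pile of these events to go through, a few lines of script can list which cluster networks are being complained about. This is only a minimal sketch, assuming you have copied the event descriptions into a text string; the function name `affected_networks` and the sample text are mine, not something from a Microsoft tool.

```python
import re

# Matches the description text of FailoverClustering event 1223 as quoted above.
# Accepts straight or curly single quotes around the resource and network names.
PATTERN = re.compile(
    r"resource [‘'](?P<resource>[^’']+)[’'] cannot be brought online "
    r"because the cluster network [‘'](?P<network>[^’']+)[’'] "
    r"is not configured to allow client access"
)


def affected_networks(log_text):
    """Return the sorted, unique cluster network names found in the text."""
    return sorted({m.group("network") for m in PATTERN.finditer(log_text)})


# Sample text copied from the two event 1223 descriptions above.
sample = (
    "Cluster IP address resource ‘Cluster IP Address’ cannot be brought online "
    "because the cluster network ‘Cluster Network 1’ is not configured to allow "
    "client access. "
    "Cluster IP address resource ‘IPv4 DHCP Address 1 (Cluster Group)’ cannot be "
    "brought online because the cluster network ‘Cluster Network 3’ is not "
    "configured to allow client access."
)

print(affected_networks(sample))
```

For the events above this lists Cluster Network 1 and Cluster Network 3, which are the two networks to check in Failover Cluster Manager.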
So these cluster networks (it’s a geographically dispersed cluster with routed subnets) are indicating they do not have “Allow clients to connect through this network” set. Well, I checked and they did! Both “Allow cluster network communications on this network” and “Allow clients to connect through this network” are enabled.
Weird, OK, but as mentioned I’ve encountered something similar before. In this case I did not want to just disable and enable those NICs. The DAG was functioning fine and providing services to clients, so I did not want to cause any interruption or failover while the cluster was having an issue.
So before going any further I did a search and almost within a minute I found the following TechNet blog post: Cluster Core Resources fail to come online on some Exchange 2010 Database Availability Group (DAG) nodes (http://blogs.technet.com/b/timmcmic/archive/2010/05/12/cluster-core-resources-fail-to-come-online-on-some-exchange-2010-database-availability-group-dag-nodes.aspx)
Well, well, the issue is known to Microsoft and they offer three fixes. It is actually only one fix, but it can be done using the Failover Cluster Manager GUI, cluster.exe or PowerShell. The fix is to simply disable and enable “Allow clients to connect through this network” on the affected cluster network. The “long term fix” will be included in Exchange 2010 SP1. The workaround works immediately and their Backup Exec started functioning again. They’ll just have to keep an eye on this issue until the permanent fix arrives with SP1.
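For the PowerShell route, the toggle comes down to two lines per affected network. Below is a small sketch that only prints those lines so you can review them before pasting them into a PowerShell session on a cluster node. It assumes the FailoverClusters module's `Get-ClusterNetwork` cmdlet and its `Role` property, with 1 meaning cluster communication only and 3 meaning cluster plus client access. Please verify those values against the linked TechNet post for your environment; the helper name `toggle_commands` is made up for this example.

```python
# Assumed Role values for a cluster network (check them against the TechNet post):
# 1 = cluster communication only, 3 = cluster communication and client access.
CLUSTER_ONLY = 1
CLUSTER_AND_CLIENT = 3


def toggle_commands(network_name):
    """Build the two PowerShell lines that turn client access off, then on again."""
    target = '(Get-ClusterNetwork "{}").Role'.format(network_name)
    return [
        "{} = {}".format(target, CLUSTER_ONLY),
        "{} = {}".format(target, CLUSTER_AND_CLIENT),
    ]


# Print the commands for one affected network; nothing is executed here.
for line in toggle_commands("Cluster Network 1"):
    print(line)
```

Nothing is changed on the cluster by this script. It just saves retyping the network names when more than one network is affected.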
Good Solution……..
Good tip! I’d like to add that if you installed 2010, set up a DAG, and then upgraded to 2010 SP1, you will still have to go through the steps outlined in the MS article and disable and enable “Allow clients to connect through this network” in the cluster manager for each network.
Thx. Luckily it’s a rare occurrence.
We have just upgraded to 2010 SP1 from RTM and find that these issues still abound. We had them before upgrading to SP1 and we still have them after SP1. To fix the issue, I ultimately have to restart my active DAG member (which makes me cringe every time I do it). Only then will the DAG continue working as intended.
I think I’ll try disabling the NICs instead. Do you disable the NIC for the LAN or for the DAG?
You’ll need to do it for the NIC that’s affected, but the trick of simply disabling and enabling “Allow clients to connect through this network” on the affected cluster network normally works. An alternative is disabling the NIC of the affected network, but be careful not to lock yourself out if you’re using RDP 🙂 It’s handy to have KVM over IP or ILO/DRAC. Rebooting is the last thing to try. It shouldn’t be needed.
The above steps worked for me.
Thanks a lot for sharing the resolution.
You’re most welcome.
Still working, man, thanks for the tip. We use HP DP and we need to create another IP segment just for the app. Sometimes the “backupNet” just goes down and loses the segment, crazy nuts everywhere.
Still a thing in 2012 R2