Upgrading to DELLEMC Unisphere Central for SC Series
To prepare for rolling out SCOS 7.3 (see my blog post SC Series SCOS 7.3 for more information on this version) we upgraded our Dell Storage Manager Data Collector and Dell Storage Manager Client to 18.1.10.171. I am happy to report that it went flawlessly. This means we are ready to work with CoPilot and will upgrade our SANs over the next week. That is always a phased rollout, to minimize risk.
Upgrading to 18.1.10.171 actually means we are upgrading to DELLEMC Unisphere Central for SC Series, which was announced as part of the SCOS 7.3 upgrade benefits.
The upgrade process itself is straightforward and isn’t different from what we are used to. First you upgrade the Storage Manager Data Collector and then the Storage Manager Client. If you have a remote Storage Manager Data Collector you must then upgrade that one as well.
Make sure you have a successful backup and create a checkpoint before you start the upgrade. That way you also have an easy exit plan when things go south.
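If your Data Collector runs in a Hyper-V virtual machine, a checkpoint is quickly created from PowerShell. A minimal sketch, assuming a hypothetical VM name of DSM-DC01:

# Assumption: the Data Collector runs in a Hyper-V VM named DSM-DC01 (hypothetical name).
Checkpoint-VM -Name "DSM-DC01" -SnapshotName "Before-DSM-18.1.10.171-upgrade"
# Verify the checkpoint is there before launching the installer.
Get-VMSnapshot -VMName "DSM-DC01" | Select-Object Name, CreationTime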
Upgrading the Storage Manager Data Collector & Client
This needs to be done first. It can take a while, so be patient. Run the Storage Manager Data Collector 18.1.10.171.exe with elevated permissions. It unpacks and asks you to select a language.
Click OK to continue and just follow the wizard.
It will ask you to confirm you want to upgrade.
Click yes and follow the wizard.
Click “Next” to kick off the upgrade and relax.
The wizard will provide you with plenty of feedback about what it is doing along the way.
The final step after the upgrade is to start the Data Collector service.
Starting the Data Collector service can take quite a while. Be patient. When it’s done the wizard will inform you of this.
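If you want to double check outside of the wizard, you can query the service state with PowerShell. A small sketch; the exact service display name may differ per version, so I match on a wildcard:

# The display name of the Data Collector service may vary, hence the wildcard (assumption).
Get-Service -DisplayName "*Data Collector*" | Select-Object Status, StartType, DisplayName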
Click “Finish” to close the installer.
On your desktop you’ll notice that you now have an icon called DELL EMC Unisphere Central. This indicates that storage management for the DELL EMC offerings is converging.
Do note that if you have a remote Storage Manager Data Collector you must now upgrade that one as well. Do NOT forget to keep both deployments at the same software level.
You are now ready to upgrade the Storage Manager Client. Run the installer with elevated permissions.
Just follow the wizard; normally this goes really fast and that’s it. You can log into the new client and see that the GUI is very familiar to anyone already using it.
What is new is the look and feel of the DELLEMC Unisphere Central for SC Series. It’s not your father’s data collector anymore.
We’ll talk about DELLEMC Unisphere Central for SC Series later, when we have had a chance to work with it some more in real life.
Nowadays everyone seems to be heading to the hills to try and cash in on the new gold rush: data! You have all heard that data is the new gold. Some call it the new oil, if that is their favorite fantasy; that’s all good. But there are drawbacks to this.
The cost of data and data waste
Like any resource you mine, it does come with associated costs. Costs that have to be covered by the value you derive from it. That value has to exceed the money spent to gather, process and consume it. That can be an expensive business.
On top of that you have to deal with “the waste” it creates as a byproduct. Waste can be toxic. Mining data tends to produce nuclear data waste, the bad kind where “safe levels” are hard to determine. In the rush to grab the gold many forget a couple of important lessons from history. We should know by now that we need to act proactively to avoid waste. That is the cheapest option in the long run and mitigates many of the risks. We should also know that not all data is gold; some of it just glitters, but it isn’t valuable. Fool’s data, like fool’s gold, is essentially worthless no matter how much money you have spent. Even worse, it still produces nuclear data waste that can get you into (legal) trouble.
Data storage and backups
How much data do you need to get to the gold and at what cost? Storage capabilities as well as storage capacity grow fast and cost seems under control for now. But will this last forever? And even if so, what’s the ratio of data gold versus raw data stored? Can we improve that ratio? Because even when things are cheap, why do it at all if it is not needed?
Protecting the data and the waste
And then there is the cost of protecting that data as well as the governance around it. The sad reality with data is that once you have it, the probability that it will get you into trouble is real. Data, sooner or later, will get lost, misplaced, sold, hacked, leaked, … it’s almost guaranteed. Ask any real InfoSec professional (not the standard issue, policy-quoting security officers, those are just window dressing) and they will open your eyes to the reality of the risks. It’s very sobering.
The gold rush
As with any hype or gold rush we can avoid costly mistakes by looking at history. Think about who benefited and who lost out. Think about why this happened and how. Can you see any parallels?
Many people are drawn to the data gold fields. Very few strike a gold vein.
There is a lot of money to be made selling the tools, supplies and gear to mine the data, process it, store and protect it.
Gathering raw data and processing it can be highly toxic.
Storing and protecting the gold is expensive and hard.
Let’s dive a bit deeper into these issues, what they mean and how they materialize. They all have one thing in common for sure: the fear of missing out is one of the driving factors.
Gold diggers
The reality is that many people who now become “data scientists” are not all highly skilled mathematicians and experts at statistical analysis. It’s a new hype, just like OLAP tools and data mining were before. We now have BI, big data and data science. That’s where the gold can be found, so that’s where gold diggers flock to. Some have the skills, abilities and luck to derive wealth from that. Most will just have a job digging.
There is a lot more data than there is science in the hype created around data scientists. Data scientists should be great at math and statistics. Those are fields of human endeavor that do not scale well. They are not even popular. Attaching “scientist” to something doesn’t make it a science. Be sure of the quality of your gold and make sure it is not fool’s gold. But the gold rush is on. There’s money to be made. In an era where science is viewed by many as “an opinion”, the urge to derive some credibility from adding “science” to any endeavor is a paradox. It is on the rise. Clearly, this shows the value of real science, even when some only seem to like it when it suits their agenda.
But the field is exploding as companies want people working on all the raw data they collect. As one statistician stated: “I used to be a boring, underpaid geek with glasses, now that I’m a data scientist I’m cool, in demand and paid very well even if the work is less scientific.” That’s the nature of the beast. Her employer got a real statistician, but as the mines require a lot more bodies, many will make do with less.
Merchants
The “sure” money is in the supply chain. Storage, networking, compute … no matter where it is (cloud, fog or on-premises computing). There is money to be made with tools to process the data, protect it (backups) and secure it against unwanted prying eyes and theft. If you’re selling any of those, business is booming.
Everyone seems obsessed with collecting data. Luckily storage costs per GB are down and we can store ever more. We also need to protect more. But who ever deletes data? Who dares push the button? A lot of data is collected “just in case”. We might find gold in there later, and if we don’t keep it, we can’t. The fear of missing out in action. That is great if you’re selling stuff. Data lakes, data ponds, storage blobs or tables, MongoDB or SQL PaaS, storage arrays, data processing technology and data protection. These can be products or services, it doesn’t matter, there is money to be made. And while you’re selling, you’re not asking the buyers if they really need it. You don’t question them, you praise their insights and help them protect their investment. Everyone is doing it, so must you. The copy/paste strategy in action.
Nuclear data waste
While the vast growth in data is spectacular, a lot of it is crap. But there is very little effort put into being selective. It’s too cheap right now to collect and store it. No one wants to say “we don’t need it” and be the one to blame if you don’t have it later.
But in the age of data leaks, hackers, privacy concerns and ever more legislation around data protection, it’s worth making sure you don’t store data just because you can. Storing data holds inherent risks: the risk of losing it, corrupting it, deriving faulty information from it, leaking it, having it stolen or abused.
In the age of GDPR and many other rightful privacy and data protection concerns, collecting data should be treated like nuclear power. The value it brings is undeniable. But you don’t need vast amounts of nuclear fuel to deliver that value. You do need very good processes, fail-safes, regulation, capable people and technology.
We should start looking at data as nuclear fuel and as such, after use and processing, part of it is left as toxic nuclear data waste. It’s a long-term toxic by-product of the process of deriving information from data. Minimize the collection and storage of data to achieve your goals at minimum cost and risk. Luckily, we have a very good solution for toxic data waste: you can delete it and wipe it securely.
We have to stop thinking that more is better when much of it is junk. The overhead of caring for that junk is ridiculous. We might have to do so for nuclear waste out of need. But for data there are alternatives. Destroy it if you don’t need it. It’s the safest way to handle the legal and reputation risks related to it. That will take a conscious effort.
Efforts and costs
Critical thinking about collecting data is lacking. That is understandable. A lot of the money to be made in data mining is in providing the tools to collect, process, store and protect the data. Even with many of the people who warn us about the security issues and legal responsibilities around it, it is often about selling services and products. For many, all this might turn out to be a lot like the other gold rushes: there were a lot more suppliers of tools who got rich than actual finders of profitable gold mines. This means there is also a lot of pressure and incentive to feed the “data is the new gold” beast.
Where a SQL database or a data warehouse at least meant you had to put effort into collecting the data, the rise of unstructured data technologies means that way too often we don’t care and we’ll figure it out later. Imagine doing that with nuclear fuel! For now, the technical advances in storage and data technologies have allowed us to act without too much deliberation on the sanity of our choices. That might change. It might be wise to avoid the cold shower when it does and benefit from minimizing toxic data risks today.
Conclusion
Now, true data gold is very valuable, but make sure you can recognize it. Just going through the motions, buying the tools and copying “in the know” statements from the internet isn’t going to cut it. That is called pretending. Sure, it’s fun. It is also a very dangerous and costly mistake when things get real. At best you look like an idiot with money. Many salespeople will separate you from your money very efficiently.
The smarter organizations already have a data strategy that includes waste avoidance, reduction and management. Many, unfortunately, don’t. For those, collecting data is the main goal, driven by the tyranny of action over strategy. You have to be seen acting and being in charge. The buzzwords have to be present and you have to come across as a “can do sir, yes sir” person. Well, that is what will kill you. The late Norman Schwarzkopf knew this all too well.
Take care of your weaknesses; figure them out before they hurt you and before they destroy your ability to exploit your strengths. That, people, is a strategy exercise. I can do that for you and it will cost you a lot of money. But remember, strategies are not products you can buy, they are not commodities, and as such buying them is a paradox. A strategy is what will give you the edge over your competitors. If you have others determine your strategy, your competitors will pay them more to find out. So, roll up your sleeves and put in the effort yourself. In the end, it’s all about common sense, and this is true for data mining, AI and BI as well.
Altaro Free Webcast: What’s New in Windows Server 2019
Altaro Software is running a free webinar: What’s New in Windows Server 2019. The timing, October the 3rd, could not be better as Microsoft Ignite lies right behind us and we can all use some help putting that barrage of announcements into context. That help is here and offered by industry experts like Andy Syrewicze, Rob Corradini and Symon Perriman.
So register here and get some insights, views and guidance from industry experts on the value of Windows Server 2019 and what it means to so many of us in the IT field. As our industry is changing and new balances are found, let me say that Windows Server 2019 will play a significant role in the coming years in building the future of IT.
My take
The industries I advise know I advocate serverless, containers and servers as well as PaaS & SaaS. My strength lies in the fact that I know the IT stack: compute, storage, memory and networks. I started my career developing code and as such I know code needs an excellent environment to run in so it can shine. I also know I don’t know a lot. So I’m active in the community, attend conferences, and listen and learn from other people’s and vendors’ points of view and insights. You must avoid tunnel vision and echo chambers. But you also must grasp your own industry and business in order to make decisions and move ahead.
Observe, orient, decide and act in a never-ending cycle. So register for Altaro’s free webinar: What’s New in Windows Server 2019 and get a head start on this process. I have registered and intend to attend unless work priorities prevent me from doing so. I most certainly hope not! See you there. Register here. It’s free; all you have to do is show up and invest some time in your own future.
I was tasked with troubleshooting a cluster where cluster aware updating (CAU) failed because the nodes never succeeded in going into maintenance mode. It seemed that none of the obvious or well-known issues and mistakes that might break live migrations were present. Looking at the cluster and testing live migration, not a single VM on any node would live migrate to any other node. So, I take a peek at the event id and description and it hits me. I have seen this particular event id before.
Log Name: System
Source: Microsoft-Windows-Hyper-V-High-Availability
Date: 9/27/2018 15:36:44
Event ID: 21502
Task Category: None
Level: Error
Keywords:
User: SYSTEM
Computer: NODE-B.datawisetech.corp
Description: Live migration of ‘Virtual Machine ADFS1’ failed. Virtual machine migration operation for ‘ADFS1’ failed at migration source ‘NODE-B’. (Virtual machine ID 4B5F2F6C-AEA3-4C7B-8342-E255D1D112D7) Failed to verify collection registry for virtual machine ‘ADFS1’: The system cannot find the file specified. (0x80070002). (Virtual Machine ID 4B5F2F6C-AEA3-4C7B-8342-E255D1D112D7).

The live migration fails due to a non-existent SharedStoragePath or ConfigStoreRootPath, which is where the collections metadata lives.
More errors are logged
There usually are more related tell-tale events. They are, however, clear in pinpointing the root cause.
On the destination host
On the destination host you’ll find event id 21066:
Log Name: Microsoft-Windows-Hyper-V-VMMS-Admin
Source: Microsoft-Windows-Hyper-V-VMMS
Date: 9/27/2018 15:36:45
Event ID: 21066
Task Category: None
Level: Error
Keywords:
User: SYSTEM
Computer: NODE-A.datawisetech.corp
Description: Failed to verify collection registry for virtual machine ‘ADFS1’: The system cannot find the file specified. (0x80070002). (Virtual Machine ID 4B5F2F6C-AEA3-4C7B-8342-E255D1D112D7).
A bunch of 1106 events, one per failed live migration per VM, like the one below:
Log Name: Microsoft-Windows-Hyper-V-VMMS-Operational
Source: Microsoft-Windows-Hyper-V-VMMS
Date: 9/27/2018 15:36:45
Event ID: 1106
Task Category: None
Level: Error
Keywords:
User: SYSTEM
Computer: NODE-A.datawisetech.corp
Description: vm\service\migration\vmmsvmmigrationdestinationtask.cpp(5617)\vmms.exe!00007FF77D2171A4: (caller: 00007FF77D214A5D) Exception(998) tid(1fa0) 80070002 The system cannot find the file specified.

As well as event id 21111:

Log Name: Microsoft-Windows-Hyper-V-High-Availability-Admin
Source: Microsoft-Windows-Hyper-V-High-Availability
Date: 9/27/2018 15:36:44
Event ID: 21111
Task Category: None
Level: Error
Keywords:
User: SYSTEM
Computer: NODE-B.datawisetech.corp
Description: Live migration of ‘Virtual Machine ADFS1’ failed.

… event id 21066:

Log Name: Microsoft-Windows-Hyper-V-VMMS-Admin
Source: Microsoft-Windows-Hyper-V-VMMS
Date: 9/27/2018 15:36:44
Event ID: 21066
Task Category: None
Level: Error
Keywords:
User: SYSTEM
Computer: NODE-B.datawisetech.corp
Description: Failed to verify collection registry for virtual machine ‘ADFS1’: The system cannot find the file specified. (0x80070002). (Virtual Machine ID 4B5F2F6C-AEA3-4C7B-8342-E255D1D112D7).

… and event id 21024:

Log Name: Microsoft-Windows-Hyper-V-VMMS-Admin
Source: Microsoft-Windows-Hyper-V-VMMS
Date: 9/27/2018 15:36:44
Event ID: 21024
Task Category: None
Level: Error
Keywords:
User: SYSTEM
Computer: NODE-B.datawisetech.corp
Description: Virtual machine migration operation for ‘ADFS1’ failed at migration source ‘NODE-B’. (Virtual machine ID 4B5F2F6C-AEA3-4C7B-8342-E255D1D112D7)
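To find these events quickly on both nodes involved, you can query the relevant logs with Get-WinEvent. A sketch, using the node names from this post; adjust them to your own environment:

# Pull the tell-tale events from both nodes involved in the failed live migration.
$Nodes = "NODE-A", "NODE-B"
$Filters = @(
    @{ LogName = "System"; Id = 21502 },
    @{ LogName = "Microsoft-Windows-Hyper-V-High-Availability-Admin"; Id = 21111 },
    @{ LogName = "Microsoft-Windows-Hyper-V-VMMS-Admin"; Id = 21066, 21024 },
    @{ LogName = "Microsoft-Windows-Hyper-V-VMMS-Operational"; Id = 1106 }
)
foreach ($Node in $Nodes) {
    foreach ($Filter in $Filters) {
        Get-WinEvent -ComputerName $Node -FilterHashtable $Filter -MaxEvents 20 -ErrorAction SilentlyContinue |
            Select-Object TimeCreated, Id, MachineName, Message
    }
}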
Live migration fails due to non-existent SharedStoragePath or ConfigStoreRootPath explained
This is what a Windows Server 2016/2019 cluster that has not been configured with a SharedStoragePath looks like.
Get-VMHostCluster -ClusterName “W2K19-LAB”
Under HKLM\Cluster\Resources\GUIDofWMIResource\Parameters there is a value called ConfigStoreRootPath, which in PowerShell is known as the SharedStoragePath property. You can also query it via PowerShell with Get-VMHostCluster, as shown above.
And this is what it looks like in the registry (0.Cluster and Cluster keys). The resource ID we are looking at is the one of the Virtual Machine Cluster WMI resource.
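You can read that value straight from the registry with PowerShell as well. A quick sketch, run on a cluster node:

# Look up the Virtual Machine Cluster WMI resource ID and read ConfigStoreRootPath from the Cluster key.
$WMIResourceId = (Get-ClusterResource -Name "Virtual Machine Cluster WMI").Id
Get-ItemProperty -Path "HKLM:\Cluster\Resources\$WMIResourceId\Parameters" -Name ConfigStoreRootPath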
If it returns a path, you must verify that it exists. If not, you’re in trouble with live migrations. You will also be in trouble with host level guest cluster backups or Hyper-V replicas of them. Maybe you don’t have guest clusters or use in-guest backups and this is just a remnant of trying them out.
When I run it on the problematic cluster I get a path that points to a folder on a CSV that doesn’t exist.
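A quick way to spot this situation is to check whether the configured path actually resolves on the node. A small sketch, against the lab cluster name used in this post:

# Flag a SharedStoragePath that points to a folder that no longer exists.
$SharedPath = (Get-VMHostCluster -ClusterName "W2K19-LAB").SharedStoragePath
if ($SharedPath -and -not (Test-Path -Path $SharedPath)) {
    Write-Warning "SharedStoragePath '$SharedPath' does not exist. Live migrations will fail."
}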
Did they rename the CSV? Replace the storage array? Well, as it turned out, they reorganized and resized the CSVs. As they can’t shrink SAN LUNs, they created new ones. They then leveraged storage live migration to move the VMs.
The old CSVs were left in place for about 6 weeks before they were cleaned up. As this was the first time they ran Cluster Aware Updating after removing them, this is the first time they hit this problem. Bingo! You probably think you’ll just change it to an existing CSV folder path or delete it. Well, as it turns out, you cannot do that. You can try …
Set-VMHostCluster : The operation on computer ‘W2K19-LAB’ failed: The WS-Management service cannot process the request. The WMI service or the WMI provider returned an unknown error: HRESULT 0x80070032
At line:1 char:1
+ Set-VMHostCluster -ClusterName “W2K19-LAB” -SharedStoragePath “C:\Clu …
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : NotSpecified: (:) [Set-VMHostCluster], VirtualizationException
    + FullyQualifiedErrorId : OperationFailed,Microsoft.HyperV.PowerShell.Commands.SetVMHostCluster
Whatever you try, deleting, overwriting, … no joy. As it turns out you cannot change it and this is by design. A shaky design, I would say. I understand the reasons: if it changes or is deleted and you have guest clusters with collections depending on what’s in there, you have backup and live migration issues with those guest clusters. But if you can’t change it, you also run into issues when storage changes. You’re damned if you do, damned if you don’t.
Workaround 1
What
Create a CSV with the old name and folder(s) to which the current path is pointing. That works. It could even be a very small one. As a test I used one of 1GB. Not sure if that’s enough over time, but if you can easily extend your CSV that should not pose a problem. It might actually be a good idea to have this as a best practice: have a dedicated CSV for the SharedStoragePath. I’ll need to ask Microsoft.
How
You know how to create a CSV and a folder, I guess; that’s about it. I’ll leave it at that.
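For completeness, a rough sketch of what that could look like. The disk resource name and the stale path below are hypothetical, so substitute the path your cluster is actually configured with:

# Hypothetical example: the stale SharedStoragePath is C:\ClusterStorage\OldCSV\Hyper-V\Shared.
# Add the new, small cluster disk as a CSV (hypothetical resource name).
Add-ClusterSharedVolume -Name "Cluster Disk 2"
# Rename its mount point under C:\ClusterStorage to the old CSV name ...
Rename-Item -Path "C:\ClusterStorage\Volume1" -NewName "OldCSV"
# ... and recreate the folder structure the SharedStoragePath expects.
New-Item -ItemType Directory -Path "C:\ClusterStorage\OldCSV\Hyper-V\Shared" -Force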
Workaround 2
What
Set the path to a new one in the registry. This could be a new path on a new CSV (mind you, this won’t fix any problems you might already have with existing guest clusters).
Delete the value for the current path and leave it empty. This is only a good idea if you don’t need VHD Set guest clusters anymore. Basically, this is resetting it to the default value.
How
There are 2 ways to do this. Both cost downtime. You need to bring the cluster service down on all nodes and then you don’t have your CSVs. That means your VMs must be shut down on all nodes of the cluster.
The Microsoft Support way
Well, that’s what they make you do (which doesn’t mean you should just do it without them instructing you to do so).
Export your HKLM\Cluster\Resources\GUIDofWMIResource\Parameters key for safekeeping so you can restore it if needed (see the sketch after this list).
Shut down all VMs in the cluster, and even non-clustered ones residing on a CSV.
Stop the cluster service on all nodes (the cluster is shut down when you do that); leave the one you are working on for last.
From that node, open up the registry editor.
Click on HKEY_LOCAL_MACHINE, then click on File and select Load Hive.
Browse to C:\Windows\Cluster and select CLUSDB.
Click OK, and then name it DB.
Expand DB, then expand Resources.
Select the GUID of the Virtual Machine Cluster WMI resource.
Click on Parameters; in (configStoreRootPath) you will find the value.
Double-click on it, and delete it or set it to a new path on a CSV that you created already.
Start the cluster service.
Then start the cluster service on all the other nodes, node by node.
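A rough sketch of the registry work in that list from the command line, using reg.exe. The GUID placeholder is the same one used throughout this post, so replace it with the real resource ID:

# Before stopping the cluster service: export the Parameters key for safekeeping.
reg export "HKLM\Cluster\Resources\<GUIDofWMIResource>\Parameters" C:\Backup\VMClusterWMI-Parameters.reg
# With the cluster service stopped on the node you work on: load the cluster database hive as DB.
reg load HKLM\DB C:\Windows\Cluster\CLUSDB
# Edit DB\Resources\<GUIDofWMIResource>\Parameters\configStoreRootPath in regedit, then unload the hive.
reg unload HKLM\DB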
My way
Not supported, at your own risk, big boy rules apply. I have tried and tested this a dozen times in the lab on multiple clusters and it also works.
In the Cluster registry key (HKLM\Cluster\Resources\GUIDofWMIResource\Parameters) of every cluster node, delete the content of the REG_SZ value configStoreRootPath so it is empty, or change it to a new path on a CSV that you created already for this purpose.
If you have a cluster with a disk witness, the node that owns the disk witness also has a 0.Cluster key (HKLM\0.Cluster\Resources\GUIDofWMIResource\Parameters). Make sure you also change the value there.
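A minimal sketch of that registry edit for a single node, following the same approach the full script at the bottom of this post automates (including the 0.Cluster key on the witness disk owner):

# Sketch: empty configStoreRootPath (or point it to a new CSV folder) on this node.
$WMIResourceId = (Get-ClusterResource -Name "Virtual Machine Cluster WMI").Id
Set-ItemProperty -Path "HKLM:\Cluster\Resources\$WMIResourceId\Parameters" -Name ConfigStoreRootPath -Value ""
# On the witness disk owner, repeat this against HKLM:\0.Cluster\Resources\$WMIResourceId\Parameters.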
When you have done this, you have to shut down all the virtual machines. You then stop the cluster service on every node. I try to work on the node owning the disk witness and shut down the cluster service on that one as the final step. This is also the one where I start the cluster service again first, so I can easily check that the value remains empty in both the Cluster and the 0.Cluster keys. Do note that with a file share / cloud witness, knowing which node was shut down last can be important. See https://blog.workinghardinit.work/2017/12/11/cluster-shared-volumes-without-active-directory/. That’s why I always remember what node I’m working on and shut down last.
Start up the cluster service on the other nodes one by one.
This avoids having to load the registry hive, but editing the registry on every node in large clusters is tedious. Sure, this can be scripted in combination with shutting down the VMs, stopping the cluster service on all nodes, changing the value and then starting the cluster services again as well as the VMs. You can control the order in which you go through the nodes in a script as well. I actually did script this using my method; you can find it at the bottom of this blog post.
Both methods will work and live migrations will work again. Any existing problematic guest cluster VMs with backup or live migration issues are food for another blog post, perhaps. But those will have things like failed backups driving you crazy.
Some considerations
Workaround 1 is a bit of a “you’ve got to be kidding me” solution, but at least it leaves some freedom to replace, rename or reorganize the other CSVs as you see fit. So perhaps having a dedicated CSV just for this purpose is not that silly. Another benefit is that it does not involve messing around in the cluster database via the registry. That is something we advise against all the time, but it has now become a way to get out of a pickle.
Workaround 2 speaks for itself. There are two ways to achieve it, which I have shown. But a word of warning: the moment the path changes and you have some already existing VHD Set guest clusters that somehow depend on it, you’ll see backups start having issues and possibly live migrations as well. But you’re toast for all your live migrations already anyway, so … well, yeah, what can I do.
So, this is by design. Maybe it is, but it isn’t very realistic that you’re stuck with a path and name that hard, that it causes this much grief or allows people to shoot themselves in the foot. It’s not like all this is documented somewhere.
Conclusion
This needs to be fixed. While I can get you out of this pickle, it is a tedious operation with some risk in a production environment. It also requires downtime, which is bad. On top of that, it will only have a satisfying result if you don’t have any VHD Set guest clusters that rely on the old path. The mechanism behind the SharedStoragePath isn’t as robust and flexible as it should be when it comes to changes and dealing with failed host level guest cluster backups.
I have tested this in the Windows Server 2019 insider preview. The issue is still there. No progress on that front. Maybe in some of the future cumulative updates things will be fixed to make guest clustering with VHD Set a more robust and reliable solution. The fact that Microsoft relies on guest clustering to support some deployment scenarios with S2D makes this even more disappointing. It is also a reason I still run physical shared storage-based file clusters.
The problematic host level backups I can work around by leveraging in-guest backups. But the path issue is unavoidable if changes are needed.
After 2 years of trouble with the framework around guest cluster backups / VHD Set, it’s time this “just works”. No one will use it as long as it remains this troublesome, and it won’t get fixed if no one uses it. The perfect catch-22 situation.
The Script
$ClusterName = "W2K19-LAB"
$OwnerNodeWitnessDisk = $Null
$RemberLastNodeThatWasShutdown = $Null
$LogFileName = "ConfigStoreRootPathChange"
#The registry paths are built further down, once the Virtual Machine Cluster WMI resource ID is known,
#as $WMIClusterResourceID has no value yet at this point in the script.
$REGZValueName = "ConfigStoreRootPath"
$REGZValue = $Null #We need to empty the value
#$REGZValue = "C:\ClusterStorage\ReFS-01\SharedPath" #We need to set a new path.
#Region SupportingFunctionsAndWorkFlows
Workflow ShutDownVMs {
param ($AllVMs)
Foreach -parallel ($VM in $AllVMs) {
InlineScript {
try {
If ($using:VM.State -eq "Running") {
Stop-VM -Name $using:VM.Name -ComputerName $using:VM.ComputerName -force
}
}
catch {
$ErrorMessage = $_.Exception.Message
$ErrorLine = $_.InvocationInfo.Line
$ExceptionInner = $_.Exception.InnerException
Write-2-Log -Message "!Error occurred!:" -Severity Error
Write-2-Log -Message $ErrorMessage -Severity Error
Write-2-Log -Message $ExceptionInner -Severity Error
Write-2-Log -Message $ErrorLine -Severity Error
Write-2-Log -Message "Bailing out - Script execution stopped" -Severity Error
}
}
}
}
#Code to shut down all VMs on all Hyper-V cluster nodes
Workflow StartVMs {
param ($AllVMs)
Foreach -parallel ($VM in $AllVMs) {
InlineScript {
try {
if ($using:VM.State -eq "Off") {
Start-VM -Name $using:VM.Name -ComputerName $using:VM.ComputerName
}
}
catch {
$ErrorMessage = $_.Exception.Message
$ErrorLine = $_.InvocationInfo.Line
$ExceptionInner = $_.Exception.InnerException
Write-2-Log -Message "!Error occurred!:" -Severity Error
Write-2-Log -Message $ErrorMessage -Severity Error
Write-2-Log -Message $ExceptionInner -Severity Error
Write-2-Log -Message $ErrorLine -Severity Error
Write-2-Log -Message "Bailing out - Script execution stopped" -Severity Error
}
}
}
}
function Write-2-Log {
[CmdletBinding()]
param(
[Parameter()]
[ValidateNotNullOrEmpty()]
[string]$Message,
[Parameter()]
[ValidateNotNullOrEmpty()]
[ValidateSet('Information', 'Warning', 'Error')]
[string]$Severity = 'Information'
)
$Date = get-date -format "yyyyMMdd"
[pscustomobject]@{
Time = (Get-Date -f g)
Message = $Message
Severity = $Severity
} | Export-Csv -Path "$PSScriptRoot\$LogFileName$Date.log" -Append -NoTypeInformation
}
#endregion
Try {
Write-2-Log -Message "Connecting to cluster $ClusterName" -Severity Information
$MyCluster = Get-Cluster -Name $ClusterName
$WMIClusterResource = Get-ClusterResource "Virtual Machine Cluster WMI" -Cluster $MyCluster
Write-2-Log -Message "Grabbing Cluster Resource: Virtual Machine Cluster WMI" -Severity Information
$WMIClusterResourceID = $WMIClusterResource.Id
Write-2-Log -Message "The Cluster Resource Virtual Machine Cluster WMI ID is $WMIClusterResourceID " -Severity Information
#Now that we know the resource ID we can build the registry paths used further down.
$RegistryPathCluster = "HKLM:\Cluster\Resources\$WMIClusterResourceID\Parameters"
$RegistryPathClusterDotZero = "HKLM:\0.Cluster\Resources\$WMIClusterResourceID\Parameters"
Write-2-Log -Message "Checking for quorum config (disk, file share / cloud witness) on $ClusterName" -Severity Information
If ((Get-ClusterQuorum -Cluster $MyCluster).QuorumResource -eq "Witness") {
Write-2-Log -Message "Disk witness in use. Looking up the owner node of the witness disk as that holds the 0.Cluster registry key" -Severity Information
#Store the current owner node of the witness disk.
$OwnerNodeWitnessDisk = (Get-ClusterGroup -Name "Cluster Group" -Cluster $MyCluster).OwnerNode
Write-2-Log -Message "Owner node of witness disk is $OwnerNodeWitnessDisk" -Severity Information
}
}
Catch {
$ErrorMessage = $_.Exception.Message
$ErrorLine = $_.InvocationInfo.Line
$ExceptionInner = $_.Exception.InnerException
Write-2-Log -Message "!Error occurred!:" -Severity Error
Write-2-Log -Message $ErrorMessage -Severity Error
Write-2-Log -Message $ExceptionInner -Severity Error
Write-2-Log -Message $ErrorLine -Severity Error
Write-2-Log -Message "Bailing out - Script execution stopped" -Severity Error
Break
}
try {
$ClusterNodes = $MyCluster | Get-ClusterNode
Write-2-Log -Message "We have grabbed the cluster nodes $ClusterNodes from $MyCluster" -Severity Information
Foreach ($ClusterNode in $ClusterNodes) {
#If we have a disk witness we also need to change the value in the 0.Cluster registry key on the current witness disk owner node.
If ($ClusterNode.Name -eq $OwnerNodeWitnessDisk) {
if (Test-Path -Path $RegistryPathClusterDotZero) {
Write-2-Log -Message "Changing $REGZValueName in 0.Cluster key on $OwnerNodeWitnessDisk who owns the witnessdisk to $REGZvalue" -Severity Information
Invoke-command -computername $ClusterNode.Name -ArgumentList $RegistryPathClusterDotZero, $REGZValueName, $REGZValue {
param($RegistryPathClusterDotZero, $REGZValueName, $REGZValue)
Set-ItemProperty -Path $RegistryPathClusterDotZero -Name $REGZValueName -Value $REGZValue -Force | Out-Null}
}
}
if (Test-Path -Path $RegistryPathCluster) {
Write-2-Log -Message "Changing $REGZValueName in Cluster key on $($ClusterNode.Name) to $REGZvalue" -Severity Information
Invoke-command -computername $ClusterNode.Name -ArgumentList $RegistryPathCluster, $REGZValueName, $REGZValue {
param($RegistryPathCluster, $REGZValueName, $REGZValue)
Set-ItemProperty -Path $RegistryPathCluster -Name $REGZValueName -Value $REGZValue -Force | Out-Null}
}
}
Write-2-Log -Message "Grabbing all VMs on all clusternodes to shut down" -Severity Information
$AllVMs = Get-VM -ComputerName ($ClusterNodes)
Write-2-Log -Message "We are shutting down all running VMs" -Severity Information
ShutdownVMs $AllVMs
}
catch {
$ErrorMessage = $_.Exception.Message
$ErrorLine = $_.InvocationInfo.Line
$ExceptionInner = $_.Exception.InnerException
Write-2-Log -Message "!Error occurred!:" -Severity Error
Write-2-Log -Message $ErrorMessage -Severity Error
Write-2-Log -Message $ExceptionInner -Severity Error
Write-2-Log -Message $ErrorLine -Severity Error
Write-2-Log -Message "Bailing out - Script execution stopped" -Severity Error
Break
}
try {
#Code to stop the cluster service on all cluster nodes
#ending with the witness owner if there is one
Write-2-Log -Message "Shutting down cluster service on all nodes in $MyCluster that are not the owner of the witness disk" -Severity Information
Foreach ($ClusterNode in $ClusterNodes) {
#First we shut down all nodes that do NOT own the witness disk
If ($ClusterNode.Name -ne $OwnerNodeWitnessDisk) {
Write-2-Log -Message "Stop cluster service on node $($ClusterNode.Name)" -Severity Information
if ((Get-ClusterNode -Cluster $ClusterName | where-object {$_.State -eq "Up"}).count -ne 1) {
Stop-ClusterNode -Name $ClusterNode.Name -Cluster $MyCluster | Out-Null
}
Else {
Stop-Cluster -Cluster $MyCluster -Force | Out-Null
$RemberLastNodeThatWasShutdown = $ClusterNode.Name
}
}
}
#We then shut down the node that owns the witness disk
#If we have a fileshare etc, this won't do anything.
Foreach ($ClusterNode in $ClusterNodes) {
If ($ClusterNode.Name -eq $OwnerNodeWitnessDisk) {
Write-2-Log -Message "Stopping the cluster and as such the last node $($ClusterNode.Name)" -Severity Information
Stop-Cluster -Cluster $MyCluster -Force | Out-Null
$RemberLastNodeThatWasShutdown = $OwnerNodeWitnessDisk
}
}
#Code to start the cluster service on all cluster nodes,
#starting with the original owner of the witness disk
#or the one that was shut down last
Foreach ($ClusterNode in $ClusterNodes) {
#First we start the node that was shut down last. This is either the one that owned the witness disk
#or just the last node that was shut down in case of a fileshare
If ($ClusterNode.Name -eq $RemberLastNodeThatWasShutdown) {
Write-2-Log -Message "Starting the clusternode $($ClusterNode.Name) that was the last to shut down" -Severity Information
Start-ClusterNode -Name $ClusterNode.Name -Cluster $MyCluster | Out-Null
}
}
Write-2-Log -Message "Starting the all other clusternodes in $MyCluster" -Severity Information
Foreach ($ClusterNode in $ClusterNodes) {
#We then start all the other nodes in the cluster.
If ($ClusterNode.Name -ne $RemberLastNodeThatWasShutdown) {
Write-2-Log -Message "Starting the clusternode $($ClusterNode.Name)" -Severity Information
Start-ClusterNode -Name $ClusterNode.Name -Cluster $MyCluster | Out-Null
}
}
}
catch {
$ErrorMessage = $_.Exception.Message
$ErrorLine = $_.InvocationInfo.Line
$ExceptionInner = $_.Exception.InnerException
Write-2-Log -Message "!Error occurred!:" -Severity Error
Write-2-Log -Message $ErrorMessage -Severity Error
Write-2-Log -Message $ExceptionInner -Severity Error
Write-2-Log -Message $ErrorLine -Severity Error
Write-2-Log -Message "Bailing out - Script execution stopped" -Severity Error
Break
}
Start-sleep -Seconds 15
Write-2-Log -Message "Grabbing all VMs on all clusternodes to start them up" -Severity Information
$AllVMs = Get-VM -ComputerName ($ClusterNodes)
Write-2-Log -Message "We are starting all stopped VMs" -Severity Information
StartVMs $AllVMs
#Hit it again ...
$AllVMs = Get-VM -ComputerName ($ClusterNodes)
StartVMs $AllVMs
That is the script, as promised. If you use it in a production environment without testing and it blows up in your face, you are going to get fired and it is your fault. You can use it both to introduce and to fix the issue. The actions are logged in the directory the script is run from.