Correcting the permissions on the folder with VHDS files & checkpoints for host level Hyper-V guest cluster backups

Introduction

It’s not a secret that while guest clustering with VHDSets works very well. We’ve had some struggles in regards to host level backups however. Right now I leverage Veeam Agent for Windows (VAW) to do in guest backups. The most recent versions of VAW support Windows Failover Clustering. I’d love to leverage host level backups but I was struggling to make this reliable for quite a while. As it turned out recently there are some virtual machine permission issues involved we need to fix. Both Microsoft and Veeam have published guidance on this in a KB article. We automated correcting the permissions on the folder with VHDS files & checkpoints for host level Hyper-V guest cluster backup

The KB articles

Early August Microsoft published KB article with all the tips when thins fail Errors when backing up VMs that belong to a guest cluster in Windows. Veeam also recapitulated on the needed conditions and setting to leverage guest clustering and performing host level backups. The Veeam article is Backing up Hyper-V guest cluster based on VHD set. Read these articles carefully and make sure all you need to do has been done.

For some reason another prerequisite is not mentioned in these articles. It is however discussed in ConfigStoreRootPath cluster parameter is not defined and here https://docs.microsoft.com/en-us/powershell/module/hyper-v/set-vmhostcluster?view=win10-ps You will need to set this to make proper Hyper-V collections needed for recovery checkpoints on VHD Sets. It is a very unknown setting with very little documentation.

But the big news here is fixing a permissions related issue!

The latest addition in the list of attention points is a permission issue. These permissions are not correct by default for the guest cluster VMs shared files. This leads to the hard to pin point error.

Error Event 19100 Hyper-V-VMMS 19100 ‘BackupVM’ background disk merge failed to complete: General access denied error (0x80070005). To fix this issue, the folder that holds the VHDS files and their snapshot files must be modified to give the VMMS process additional permissions. To do this, follow these steps for correcting the permissions on the folder with VHDS files & checkpoints for host level Hyper-V guest cluster backup.

Determine the GUIDS of all VMs that use the folder. To do this, start PowerShell as administrator, and then run the following command:

get-vm | fl name, id
Output example:
Name : BackupVM
Id : d3599536-222a-4d6e-bb10-a6019c3f2b9b

Name : BackupVM2
Id : a0af7903-94b4-4a2c-b3b3-16050d5f80f

For each VM GUID, assign the VMMS process full control by running the following command:
icacls <Folder with VHDS> /grant “NT VIRTUAL MACHINE\<VM GUID>”:(OI)F

Example:
icacls “c:\ClusterStorage\Volume1\SharedClusterDisk” /grant “NT VIRTUAL MACHINE\a0af7903-94b4-4a2c-b3b3-16050d5f80f2”:(OI)F
icacls “c:\ClusterStorage\Volume1\SharedClusterDisk” /grant “NT VIRTUAL MACHINE\d3599536-222a-4d6e-bb10-a6019c3f2b9b”:(OI)F

My little PowerShell script

As the above is tedious manual labor with a lot of copy pasting. This is time consuming and tedious at best. With larger guest clusters the probability of mistakes increases. To fix this we write a PowerShell script to handle this for us.

#Didier Van Hoye
#Twitter: @WorkingHardInIT 
#Blog: https://blog.Workinghardinit.work
#Correct shared VHD Set disk permissions for all nodes in guests cluster

$GuestCluster = "DemoGuestCluster"
$HostCluster = "LAB-CLUSTER"

$PathToGuestClusterSharedDisks = "C:\ClusterStorage\NTFS-03\GuestClustersSharedDisks"


$GuestClusterNodes = Get-ClusterNode -Cluster $GuestCluster

ForEach ($GuestClusterNode in $GuestClusterNodes)
{

#Passing the cluster name to -computername only works in W2K16 and up.
#As this is about VHDS you need to be running 2016, so no worries here.
$GuestClusterNodeGuid = (Get-VM -Name $GuestClusterNode.Name -ComputerName $HostCluster).id

Write-Host $GuestClusterNodeGuid "belongs to" $GuestClusterNode.Name

$IcalsExecute = """$PathToGuestClusterSharedDisks""" + " /grant " + """NT VIRTUAL MACHINE\"+ $GuestClusterNodeGuid + """:(OI)F"
write-Host "Executing " $IcalsExecute
CMD.EXE /C "icacls $IcalsExecute"

} 

Below is an example of the output of this script. It provides some feedback on what is happening.

Correcting the permissions on the folder with VHDS files & checkpoints for host level Hyper-V guest cluster backup

Correcting the permissions on the folder with VHDS files & checkpoints for host level Hyper-V guest cluster backup

PowerShell for the win. This saves you some searching and typing and potentially making some mistakes along the way. Have fun. More testing is underway to make sure things are now predictable and stable. We’ll share our findings with you.

Collect cluster nodes with HBA WWN info

Introduction

Below is a script that I use to collect cluster nodes with HBA WWN info. It grabs the cluster nodes and their HBA (virtual ports) WWN information form an existing cluster. In this example the nodes have Fibre Channel (FC) HBAs. It works equally well for iSCSI HBA or other cards. You can use the collected info in real time. As an example I also demonstrate writing and reading the info to and from a CSV.

This script comes in handy when you are replacing the storage arrays. You’ll need that info to do the FC zoning for example.  And to create the cluster en server object with the correct HBA on the new storage arrays if it allows for automation. As a Hyper-V cluster admin you can grab all that info from your cluster nodes without the need to have access to the SAN or FC fabrics. You can use it yourself and hand it over to those handling them, who can use if to cross check the info they see on the switch or the old storage arrays.

image

Script to collect cluster nodes with HBA WWN info

The script demos a single cluster but you could use it for many. It collects the cluster name, the cluster nodes and their Emulex HBAs. It writes that information to a CSV files you can read easily in an editor or Excel.

image

The scripts demonstrates reading that CSV file and parsing the info. That info can be used in PowerShell to script the creation of the cluster and server objects on your SAN and add the HBAs to the server objects. I recently used it to move a bunch of Hyper-V and File clusters to a new DELLEMC SC Series storage arrays. That has the DELL Storage PowerShell SDK. You might find it useful as an example and to to adapt for your own needs (iSCSI, brand, model of HBA etc.).

#region Supporting Functions
Function Convert-OutputForCSV {
    <#
        .SYNOPSIS
            Provides a way to expand collections in an object property prior
            to being sent to Export-Csv.

        .DESCRIPTION
            Provides a way to expand collections in an object property prior
            to being sent to Export-Csv. This helps to avoid the object type
            from being shown such as system.object[] in a spreadsheet.

        .PARAMETER InputObject
            The object that will be sent to Export-Csv

        .PARAMETER OutPropertyType
            This determines whether the property that has the collection will be
            shown in the CSV as a comma delimmited string or as a stacked string.

            Possible values:
            Stack
            Comma

            Default value is: Stack

        .NOTES
            Name: Convert-OutputForCSV
            Author: Boe Prox
            Created: 24 Jan 2014
            Version History:
                1.1 - 02 Feb 2014
                    -Removed OutputOrder parameter as it is no longer needed; inputobject order is now respected 
                    in the output object
                1.0 - 24 Jan 2014
                    -Initial Creation

        .EXAMPLE
            $Output = 'PSComputername','IPAddress','DNSServerSearchOrder'

            Get-WMIObject -Class Win32_NetworkAdapterConfiguration -Filter "IPEnabled='True'" |
            Select-Object $Output | Convert-OutputForCSV | 
            Export-Csv -NoTypeInformation -Path NIC.csv    
            
            Description
            -----------
            Using a predefined set of properties to display ($Output), data is collected from the 
            Win32_NetworkAdapterConfiguration class and then passed to the Convert-OutputForCSV
            funtion which expands any property with a collection so it can be read properly prior
            to being sent to Export-Csv. Properties that had a collection will be viewed as a stack
            in the spreadsheet.        
            
    #>
    #Requires -Version 3.0
    [cmdletbinding()]
    Param (
        [parameter(ValueFromPipeline)]
        [psobject]$InputObject,
        [parameter()]
        [ValidateSet('Stack', 'Comma')]
        [string]$OutputPropertyType = 'Stack'
    )
    Begin {
        $PSBoundParameters.GetEnumerator() | ForEach {
            Write-Verbose "$($_)"
        }
        $FirstRun = $True
    }
    Process {
        If ($FirstRun) {
            $OutputOrder = $InputObject.psobject.properties.name
            Write-Verbose "Output Order:`n $($OutputOrder -join ', ' )"
            $FirstRun = $False
            #Get properties to process
            $Properties = Get-Member -InputObject $InputObject -MemberType *Property
            #Get properties that hold a collection
            $Properties_Collection = @(($Properties | Where-Object {
                        $_.Definition -match "Collection|\[\]"
                    }).Name)
            #Get properties that do not hold a collection
            $Properties_NoCollection = @(($Properties | Where-Object {
                        $_.Definition -notmatch "Collection|\[\]"
                    }).Name)
            Write-Verbose "Properties Found that have collections:`n $(($Properties_Collection) -join ', ')"
            Write-Verbose "Properties Found that have no collections:`n $(($Properties_NoCollection) -join ', ')"
        }
 
        $InputObject | ForEach {
            $Line = $_
            $stringBuilder = New-Object Text.StringBuilder
            $Null = $stringBuilder.AppendLine("[pscustomobject] @{")

            $OutputOrder | ForEach {
                If ($OutputPropertyType -eq 'Stack') {
                    $Null = $stringBuilder.AppendLine("`"$($_)`" = `"$(($line.$($_) | Out-String).Trim())`"")
                }
                ElseIf ($OutputPropertyType -eq "Comma") {
                    $Null = $stringBuilder.AppendLine("`"$($_)`" = `"$($line.$($_) -join ', ')`"")                   
                }
            }
            $Null = $stringBuilder.AppendLine("}")
 
            Invoke-Expression $stringBuilder.ToString()
        }
    }
    End {}
}
function Get-WinOSHBAInfo {
<#
Basically add 3 nicely formated properties to the HBA info we get via WMI
These are the NodeWWW, the PortWWN and the FabricName. The raw attributes
from WMI are not readily consumable. WWNs are given with a ":" delimiter.
This can easiliy be replaced or removed depending on the need.
#>

param ($ComputerName = "localhost")
 
# Get HBA Information
$Port = Get-WmiObject -ComputerName $ComputerName -Class MSFC_FibrePortHBAAttributes -Namespace "root\WMI"
$HBAs = Get-WmiObject -ComputerName $ComputerName -Class MSFC_FCAdapterHBAAttributes  -Namespace "root\WMI"
 
$HBAProperties = $HBAs | Get-Member -MemberType Property, AliasProperty | Select -ExpandProperty name | ? {$_ -notlike "__*"}
$HBAs = $HBAs | Select-Object $HBAProperties
$HBAs | % { $_.NodeWWN = ((($_.NodeWWN) | % {"{0:x2}" -f $_}) -join ":").ToUpper() }
 
ForEach ($HBA in $HBAs) {
 
    # Get Port WWN
    $PortWWN = (($Port |? { $_.instancename -eq $HBA.instancename }).attributes).PortWWN
    $PortWWN = (($PortWWN | % {"{0:x2}" -f $_}) -join ":").ToUpper()
    Add-Member -MemberType NoteProperty -InputObject $HBA -Name PortWWN -Value $PortWWN
    # Get Fabric WWN
    $FabricWWN = (($Port |? { $_.instancename -eq $HBA.instancename }).attributes).FabricName
    $FabricWWN = (($FabricWWN | % {"{0:x2}" -f $_}) -join ":").ToUpper()
    Add-Member -MemberType NoteProperty -InputObject $HBA -Name FabricWWN -Value $FabricWWN
 
    # Output
    $HBA
}
}
#endregion 

#Grab the cluster nane in a variable. Adapt thiscode to loop through all your clusters.
$ClusterName = "DEMOLABCLUSTER"
#Grab all cluster node 
$ClusterNodes = Get-Cluster -name $ClusterName | Get-ClusterNode
#Create array of custom object to store ClusterName, the cluster nodes and the HBAs
$ServerWWNArray = @()

ForEach ($ClusterNode in $ClusterNodes) {
    #We loop through the cluster nodes the cluster and for each one we grab the HBAs that are relevant.
    #My lab nodes have different types installed up and off, so I specify the manufacturer to get the relevant ones.
    #Adapt to your needs. You ca also use modeldescription to filter out FCoE vers FC HBAs etc.
    $AllHBAPorts = Get-WinOSHBAInfo -ComputerName $ClusterNode.Name | Where-Object {$_.Manufacturer -eq "Emulex Corporation"} 

    #The SC Series SAN PowerShell takes the WWNs without any delimiters, so we dump the ":" for this use case.
    $WWNs = $AllHBAPorts.PortWWN -replace ":", ""
    $NodeName = $ClusterNode.Name

    #Build a nice node object with the info and add it to the $ServerWWNArray 
    $ServerWWNObject = New-Object psobject -Property @{
        WWN         = $WWNs
        ServerName  = $NodeName 
        ClusterName = $ClusterName         
    }
    $ServerWWNArray += $ServerWWNObject
}

#Show our array
$ServerWWNArray

#just a demo to list what's in the array
ForEach ($ServerNode in $ServerWWNArray) {    
    $Servernode.ServerName
    
    ForEach ($WWN in $Servernode.WWN)
    {$WWN}

}

#Show the results
$Export = $ServerWWNArray | Convert-OutputForCSV
#region write to CSV and read from CSV

#You can dump this in a file
$Export | export-csv -Path "c:\SysAdmin\$ClusterName.csv" -Delimiter ";"

#and get it back from a file
Get-Content -Path "c:\SysAdmin\$ClusterName.csv"
$ClusterInfoFile = Import-CSV -Path "c:\SysAdmin\$ClusterName.csv" -Delimiter ";"
$ClusterInfoFile | Format-List

#just a demo to list what's in the array
$MyClusterName = $ClusterInfoFile.clustername | get-unique
$MyClusterName
ForEach ($ClusterNode in $ClusterInfoFile) {  

    $ClusterNode.ServerName
    
    ForEach ($WWN in $ClusterNode.WWN) {
        $WWN
    }

}

Latency kills

Introduction

I was investigating a very problematic Windows Server 2016 Hyper-V cluster. That cluster was just performing horribly. “Everything” was hanging, stalling, crashing and RHS.exe errors where flying around while WER dumps got created by the dozen. Things were extremely slow up to the points functionality was just failing. The “fun” thing was that the cluster validation wizard while slow gave that cluster a big thumb up and a supported status as all was well.

Prying around

Time to pry around a bit and see if we could find something wrong. We save live migrations stall, fail, last forever in pending or get stuck at a certain percentage, sometimes finally succeeding with ridiculous blackout times. We could not open up virtual machine properties or very slowly. The FCI GUI was highly unresponsive but so was the Hyper-V Manager GUI or even PowerShell. Those were hanging at even loading the virtual machines or enumerating them with Get-VM. Everything was slow to the point it timed out or crashed. Restarting the services (Cluster, Hyper-V) didn’t do anything and restarting VMMS was super slow or just got stuck. It was a depressing sight for which people tended to blame Hyper-V / Microsoft.

As the title gives away it was latency. Not just ordinary high latency. Real bad latency. That kind of latency kills. Extreme latency produces symptoms that are similar to bugs or corrupt components of roles and features. We have a tendency to look at those first in the event logs and then we look at the network and its usual suspects (VMQ, SET, DCB). But nothing pointed to an issue that I could find.

So, storage maybe?Well we did find one Hyper-V host in the cluster with one HBA port producing too many error so we disabled that FC port for testing. No joy the Hyper-V cluster after a clean reboot of all nodes remained problematic. So on to the storage array itself.

Well holy smoke! On the two volumes for CSV in those cluster we saw latencies that were so bad I could not even believe a single VM would boot. It actually made my appreciation for Hyper-V and clustering grow as it managed to do at least a couple of things. With such latencies I would expect the services to just crash & call it a day.

clip_image002

The horrific latency on one of the CSV LUNs.

Looking at the logs we saw that the latencies occurred on the FC HBAs of the controllers. Each one above 50ms, peaking to 150-250ms and one huge peak at almost 500ms. We saw this on all four HBA’s.

clip_image004

The latency on one of the 4 FC HBA’s on one of the controllers. Not a good day. All HBAs had high latencies like this.

The issues were not at the host level (host HBA’s) or not even at the IOPS/bandwidth level of the storage itself. The latency for some reason was spiking. Further investigation lead to the conclusion that the issue was related to synchronous replication going totally wrong. Moving the replication mode to asynchronous fixed that. We’re now investigating why this happened and how to prevent this from happening again. But that’s another story.

clip_image006

Latency on one of the 4 FC HBAs on one of the controllers after we fixed the issue.

Do not assume anything

So, there you go. Everything depends on everything in some direct or indirect way. It’s all connected and that my friends, is why I’m a proponent of “service resilience engineering” where the responsible team owns the entire stack. That’s is how you can act fast.

Set the preferred site for a CSV in a site-aware stretched failover cluster

Introduction

I have presented many time over the past tears on the new and enhanced capabilities of Microsoft Failover Clustering in Windows Server 2016 (Experts Live, Cloud & Datacenter Conference Germany, MicroWarehouse’s Windows Server 2016 Launch Event etc.) Feedback has shown me that there is still a lot of need for good failover cluster design and implementation guidance.

One area of enhancement is that you now have site-aware failover clusters in Windows Server 2016. That helps optimize the, availability, behavior and performance of the workload. It leveraged cluster fault domains and in this case those fault domains are the sites where the cluster nodes reside.

clip_image002

Set the preferred site for a CSV in a site-aware stretched failover cluster

You can leverage the site awareness to do all kind of configuration optimizations. You can set a preferred site creating a primary and a DRC site. The cluster behavior will optimize for that scenario. It will also help with situation like quorum split more easily and elegantly. You can create an “Active-Active” site configuration because a cluster groups, such as virtual machines can have their own preferred site.

As you can see in the picture above there is a thing called Storage Affinity. That means that VMs follow storage and are placed in same site where their associated storage resides. As such VMs will begin live migrating to the same site as their associated CSV after 1 minute. The CSV load balancer will distribute within the preferred site. That’s cool. But when setting a preferred site at the cluster group level like for virtual machines, how does one do this for a CSV?

It’s actually quite simple. A CSV is a cluster group, just like a VM is. So, for every CSV you can set that preferred site. You just grab the cluster group a bit differently. Let’s look at an example.

For a VM you’d do this: (Get-ClusterGroup -Name DidierTest01).PreferredSite = ‘Dublin’

Now for a CSV we go about it as follows:

Get-ClusterSharedVolume “Cluster Disk 1” | Get-ClusterGroup | Fl *

clip_image004

The preferred site has not been set yet. To set the preferred site for a CSV you can do the following:

$NTFS01 = Get-ClusterSharedVolume “Cluster Disk 1” | Get-ClusterGroup $NTFS01.PreferredSite = “Dublin”
$NTFS01.PreferredSite

clip_image005

You can remove a preferred site by setting it to $Null:

$NTFS01.PreferredSite = $Null

That was not to hard was it? There is one other thing to keep in mind. Do not forget to set up your site fault domains first and set the site for your cluster nodes before you start configuring preferred sites at the cluster group level or it will throw an error. That’s the minimal setup of a site-aware cluster you need to have in place before you can do more fine-grained configurations.

New-ClusterFaultDomain –Name Dublin –Type Site –Description “Primary” –Location “Dublin DC1”
New-ClusterFaultDomain –Name Cork –Type Site –Description “Secondary” –Location “Cork DC2″
Set-ClusterFaultDomain –Name Node-A –Parent Dublin
Set-ClusterFaultDomain –Name Node-B –Parent Dublin
Set-ClusterFaultDomain –Name Node-C –Parent Cork
Set-ClusterFaultDomain –Name Node-D –Parent Cork

 

If you don’t do this and try to set preferred sites at the cluster group level you’ll get an error like:

Exception setting “PreferredSite”: “Unable to save property changes for ‘e95ad724-97d3-4848-91db-198ab8312737’.
The parameter is incorrect”
At line:1 char:1
+ $NTFS01.PreferredSite = “Dublin”
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : NotSpecified: (:) [], SetValueInvocationException
+ FullyQualifiedErrorId : ExceptionWhenSetting

 

There is a lot more to say about site-aware stretched clusters but how to deal with setting a preferred site for a CSV must be the most common question I get on this subject. Well, now it’s published to help you all out. I hope it helps.