Correcting the permissions on the folder with VHDS files & checkpoints for host level Hyper-V guest cluster backups

Introduction

It’s not a secret that while guest clustering with VHDSets works very well. We’ve had some struggles in regards to host level backups however. Right now I leverage Veeam Agent for Windows (VAW) to do in guest backups. The most recent versions of VAW support Windows Failover Clustering. I’d love to leverage host level backups but I was struggling to make this reliable for quite a while. As it turned out recently there are some virtual machine permission issues involved we need to fix. Both Microsoft and Veeam have published guidance on this in a KB article. We automated correcting the permissions on the folder with VHDS files & checkpoints for host level Hyper-V guest cluster backup

The KB articles

Early August Microsoft published KB article with all the tips when thins fail Errors when backing up VMs that belong to a guest cluster in Windows. Veeam also recapitulated on the needed conditions and setting to leverage guest clustering and performing host level backups. The Veeam article is Backing up Hyper-V guest cluster based on VHD set. Read these articles carefully and make sure all you need to do has been done.

For some reason another prerequisite is not mentioned in these articles. It is however discussed in ConfigStoreRootPath cluster parameter is not defined and here https://docs.microsoft.com/en-us/powershell/module/hyper-v/set-vmhostcluster?view=win10-ps You will need to set this to make proper Hyper-V collections needed for recovery checkpoints on VHD Sets. It is a very unknown setting with very little documentation.

But the big news here is fixing a permissions related issue!

The latest addition in the list of attention points is a permission issue. These permissions are not correct by default for the guest cluster VMs shared files. This leads to the hard to pin point error.

Error Event 19100 Hyper-V-VMMS 19100 ‘BackupVM’ background disk merge failed to complete: General access denied error (0x80070005). To fix this issue, the folder that holds the VHDS files and their snapshot files must be modified to give the VMMS process additional permissions. To do this, follow these steps for correcting the permissions on the folder with VHDS files & checkpoints for host level Hyper-V guest cluster backup.

Determine the GUIDS of all VMs that use the folder. To do this, start PowerShell as administrator, and then run the following command:

get-vm | fl name, id
Output example:
Name : BackupVM
Id : d3599536-222a-4d6e-bb10-a6019c3f2b9b

Name : BackupVM2
Id : a0af7903-94b4-4a2c-b3b3-16050d5f80f

For each VM GUID, assign the VMMS process full control by running the following command:
icacls <Folder with VHDS> /grant “NT VIRTUAL MACHINE\<VM GUID>”:(OI)F

Example:
icacls “c:\ClusterStorage\Volume1\SharedClusterDisk” /grant “NT VIRTUAL MACHINE\a0af7903-94b4-4a2c-b3b3-16050d5f80f2”:(OI)F
icacls “c:\ClusterStorage\Volume1\SharedClusterDisk” /grant “NT VIRTUAL MACHINE\d3599536-222a-4d6e-bb10-a6019c3f2b9b”:(OI)F

My little PowerShell script

As the above is tedious manual labor with a lot of copy pasting. This is time consuming and tedious at best. With larger guest clusters the probability of mistakes increases. To fix this we write a PowerShell script to handle this for us.

Below is an example of the output of this script. It provides some feedback on what is happening.

Correcting the permissions on the folder with VHDS files & checkpoints for host level Hyper-V guest cluster backup

Correcting the permissions on the folder with VHDS files & checkpoints for host level Hyper-V guest cluster backup

PowerShell for the win. This saves you some searching and typing and potentially making some mistakes along the way. Have fun. More testing is underway to make sure things are now predictable and stable. We’ll share our findings with you.

A first look at shared virtual disks in Windows Server 2016

Introduction to shared virtual disks in Windows Server 2016

Time to take a first look at shared virtual disks in Windows Server 2016 and how they are set up. Shared VHDX was first introduced in Windows Server 2012 R2. It provides shared storage for use by virtual machines without having to “break through” the virtualization layer. This way is still available to us in Windows Server 2016. The benefit of this is that you will not be forced to upgrade your Windows Server 2012 R2 guest clusters when you move them to Windows Server 2016 Hyper-V cluster hosts.

The new way is based on a VHD Set. This is a vhds virtual hard disk file of 260 MB and a fixed or dynamically expanding avhdx which contains the actual data. This is the “backing storage file” in Microsoft speak. The vhds file is used to handle the coordination of actions on the shared disk between the guest cluster nodes?

Note that an avhdx is often associated with a differencing disk or checkpoints. But the “a” stands for “automatic”. This means the virtual disk file can be manipulated by the hypervisor and you shouldn’t really do anything with it. As a matter of fact, you can rename this off line avhdx file to vhdx, mount it and get to the data. Whether this virtual disk is fixed or dynamically expanding doesn’t matter.

You can create on in the GUI where it’s just a new option in the New Virtual Hard Disk Wizard.

Or via PowerShell in the way you’re used to with the only difference being that you specify vhds as the virtual disk extension.

In both cases both vhds and avhdx are created for you, you do not need to specify this.

You just add it to all nodes of the guest cluster by selecting a “Shared Drive” to add to a SCSI controller …

… browsing to the vhds , selecting it and applying the settings to the virtual machine. Do this for all guest cluster nodes

Naturally PowerShell is your friend, simple and efficient.

Rules & Restrictions

As before shared virtual disk files have to be attached to a vSCSI controller in the virtual machines that access it and it needs to be stored on a CSV. Both block level storage or a SMB 3 file share on a Scale Out File Server will do for this purpose. If you don’t store the shared VHDX or VHD Set on a CSV you’ll get an error.

Sure for lab purposes you can use an non high available SMB 3 share “simulating” a real SOFS share but that’s only good for your lab or laptop.

The virtual machines will see this shared VHDX as shared storage and as such it can be used as cluster storage. This is an awesome concept as it does away with iSCSI or virtual FC to the virtual machines in an attempt to get shared storage when SMB 3 via SOFS is not an option for some reason. Shared VHDX introduces operational ease as it avoids the complexities and drawbacks of not using virtual disks with iSCSI or vFC.

In Windows Server 2012 R2 we did miss some capabilities and features we have come to love and leverage with virtual hard disks in Hyper-V. The reason for this was the complexity involved in coordinating such storage actions across all the virtual machines accessing it. These virtual machines might be running on different hosts and, potentially the shared VHDX could reside on different CSVs. The big four limitations that proved to be show stopper for some use cases are in my personal order of importance:

  1. No host level backup
  2. No on line dynamic resize
  3. No storage live migration
  4. No checkpoints
  5. No Hyper-V Replica support

I’m happy to report most of these limitations have been taken care of in Windows Server 2016. We can do host level backups. We can online resize a shared VHDX and we have support for Hyper-V replica.

Currently in 2016 TPv4 storage live migration and checkpoints (both production and standard checkpoints) are still missing in action but who knows what Microsoft is working on or has planned. To the best of my knowledge they have a pretty good understanding of what’s needed, what should have priority and what needs to be planned in. We’ll see.

Other good news is that shared VHDX works with the new storage resiliency feature in Windows Server 2016. See Virtual Machine Storage Resiliency in Windows Server 2016 for more information. Due to the nature of clustering when a virtual machine loses access to a shared VHDX the workload (role) will move to another guest cluster node that still has access to the shared VHDX. Naturally if the cause of the storage outage is host cluster wide (the storage fabric or storage array is toast) this will not help, but other than that it provides for a good experience. The virtual machine guest cluster node that has lost storage doesn’t go into critical pause but keeps polling to see if it regains access to the shared VHDX. When it does it’s reattached and that VM becomes a happy fully functional node again.

It also supports the new Storage Qos Policies in Windows Server 2016, which is something I’ve found during testing.

Thanks for reading!