Optimizing Backups: PowerShell Script To Move All Virtual Machines On A Cluster Shared Volume To The Node Owing That CSV

When you are optimizing the number of snapshots to be taken for backups or are dealing with storage vendors software that leveraged their hardware VSS provider you some times encounter some requirements that are at odds with virtual machine mobility and dynamic optimization.

For example when backing up multiple virtual machines leveraging a single CSV snapshot you’ll find that:

  • Some SAN vendor software requires that the virtual machines in that job are owned by the same host or the backup will fail.
  • Backup software can also require that all virtual machines are running on the same node when you want them to be be protected using a single CSV snap shot. The better ones don’t let the backup job fail, they just create multiple snapshots when needed but that’s less efficient and potentially makes you run into issues with your hardware VSS provider.

image

VEEAM B&R v8 in action … 8 SQL Server VMs with multiple disks on the same CSV being backed up by a single hardware VSS writer snapshot (DELL Compellent 6.5.20 / Replay Manager 7.5) and an off host proxy Organizing & orchestrating backups requires some effort, but can lead to great results.

Normally when designing your cluster you balance things out a well as you can. That helps out to reduce the needs for constant dynamic optimizations. You also make sure that if at all possible you keep all files related to a single VM together on the same CSV.

Naturally you’ll have drift. If not you have a very stable environment of are not leveraging the capabilities of your Hyper-V cluster. Mobility, dynamic optimization, high to continuous availability are what we want and we don’t block that to serve the backups. We try to help out to backups as much a possible however. A good design does this.

If you’re not running a backup every 15 minutes in a very dynamic environment you can deal with this by live migrating resources to where they need to be in order to optimize backups.

Here’s a little PowerShell snippet that will live migrate all virtual machines on the same CSV to the owner node of that CSV. You can run this as a script prior to the backups starting or you can run it as a weekly scheduled task to prevent the drift from the ideal situation for your backups becoming to huge requiring more VSS snapshots or even failing backups. The exact approach depends on the storage vendors and/or backup software you use in combination with the needs and capabilities of your environment.

cls

$Cluster = Get-Cluster
$AllCSV = Get-ClusterSharedVolume -Cluster $Cluster

ForEach ($CSV in $AllCSV)
{
    write-output "$($CSV.Name) is owned by $($CSV.OWnernode.Name)"
    
    #We grab the friendly name of the CSV
    $CSVVolumeInfo = $CSV | Select -Expand SharedVolumeInfo
    $CSVPath = ($CSVVolumeInfo).FriendlyVolumeName

    #We deal with the \ being and escape character for string parsing.
    $FixedCSVPath = $CSVPath -replace '\\', '\\'

    #We grab all VMs that who's owner node is different from the CSV we're working with
    #From those we grab the ones that are located on the CSV we're working with
      $VMsToMove = Get-ClusterGroup | ? {($_.GroupType –eq 'VirtualMachine') -and ( $_.OwnerNode -ne $CSV.OWnernode.Name)} | Get-VM | Where-object {($_.path -match $FixedCSVPath)} 
     
    ForEach ($VM in $VMsToMove)

    {
        write-output "`tThe VM $($VM.Name) located on $CSVPath is not running on host $($CSV.OwnerNode.Name) who owns that CSV"
        write-output "`tbut on $($VM.Computername). It will be live migrated."
        #Live migrate that VM off to the Node that owns the CSV it resides on
        Move-ClusterVirtualMachineRole -Name $VM.Name -MigrationType Live -Node $CSV.OWnernode.Name
    }

Now there is a lot more to discuss, i.e. what and how to optimize for virtual machines that are clustered. For optimal redundancy you’ll have those running on different nodes and CSVs. But even beyond that, you might have the clustered VMs running on different cluster, which is the failure domain.  But I get the remark my blogs are wordy and verbose so … that’s for another time Winking smile

DELL Compellent Hardware VSS Provider & Commvault on Windows Server 2012 Hyper-V nodes – Volume Shadow Copy Service error: Unexpected error querying for the IVssWriterCallback interface. hr = 0x80070005, Access is denied.

As you know by now I’ve been building a high throughput, large volume Disk2Disk backup solution and that has been rather successful. At optimal speed we get 2TB/Hour per backup media server. As we currently have two, we can get to 4TB/hour at maximum throughput Smile. Currently that is, we’ll see if more is possible. The solution, which I’ll blog about later, is based on the Dell Compellent hardware VSS provider (Dell Compellent Replay Manager 6.2.0.9), Windows Server 2012, CommVault 9.0 R2 and PowerVault storage a the target disks and as it’s working now is saving us many hundred of thousands of Euros compared to dedicated D2D backup appliances or solutions.

The entire process has been a fairly smooth one, which was a relief as decent hardware VSS providers are not easy to come by, based on our experience and that of many colleagues. So, once again the DELL Compellent choice is turning out to be a good one. We did have to fix one small issue along the way.

Whilst using the DELL Compellent Hardware VSS Provider with Commvault Simpana 9.0 R2  to do host level back of the virtual machines on Windows Server 2012 Hyper-V cluster nodes. We ran into a small issue. The backup run fine but we saw this error being logged:

Volume Shadow Copy Service error: Unexpected error querying for the IVssWriterCallback interface.  hr = 0x80070005, Access is denied.
. This is often caused by incorrect security settings in either the writer or requestor process.

Operation:
   Gathering Writer Data

Context:
   Writer Class Id: {e8132975-6f93-4464-a53e-1050253ae220}
   Writer Name: System Writer
   Writer Instance ID: {3f4965d8-10ac-411b-bf6d-6a607f237775}

image

The description found here was not helpful in resolving this. This error is found all over the internet with just about any backup product. The possible causes and solutions that are suggested are as follows:

Change the Language for non-Unicode programs to English (United States)

This is not a solution in our case, which would have surprised me if it had been.

image

Eliminate the error condition by adding the access permissions for the domain account that the Compellent Replay Manager Service for Microsoft Servers (VSS) runs under to COM Security of the affected server

As the domain account used for this service is a member of the local administrators group that already has the permissions this would also have surprised me but we tested it anyway. It turns out this is not the cause either.

image

But for your information this is how it’s done:

You can eliminate the error condition by adding the access permissions for the Network Service account to the COM Security of the affected server. To add the access permissions for the Network Service account, do the following:

  • From the Start Menu, select Run
    The Run dialog opens.In the Open field, input dcomcnfg and click OK.
    The Component Services dialog opens.
  • Expand Component Services, Computers, and My Computer.
    Right-click My Computer and click Properties on the pop-up menu.
  • The My Computer Properties dialog opens.
  • Click the COM Security tab.
    Under Access Permission click Edit Default.
  • The Access Permissions dialog opens.
  • From the Access Permissions dialog, add the DOMAINcompellentreplayservice account with Local Access & Remote access allowed (cluser => not just a local host).
  • Close all open dialogs.
  • Restart the computer.

The real cause & solution

According to Microsoft support this issue occurs when using a 3rd party backup program that utilizes Windows VSS (Volume Shadow Service) and has its own requestor. This is indeed the case here. “It looks like the requestor (the backup application) does not allow system writer to call back into their process and hence generates the error in the application log.” This sounds very plausible Winking smile

The fix:

  1. The following example grants access to the "DOMAINcompellentreplayservice" account.
  2. Click on Start, type regedit in the search box.
  3. On the Registry Editor window, navigate to: KEY_LOCAL_MACHINE>SYSTEM>CurrentControlSet>Services>VSS>VssAccessControl
  4. Add a DWORD value with the name: "DOMAINcompellentreplayservice" and set the value to “1”.

image

image

image

And voila: the error has gone when running backups Open-mouthed smile

image

I assume no responsibility if you do this in your environment but I can say that all this works perfectly in our CommVault Simpana 9.0 R2 setup whist backing up the virtual machines on our Hyper-V cluster nodes at the host level using the DELL Compellent hardware VSS provider. And yes this is Windows Server 2012, careful testing, planning helps when being an early adaptor.