Set the Hyper-V volume-specific settings in Veeam with PowerShell


When adding Hyper-V servers to Veeam and configuring them, you can set the Hyper-V volume-specific settings with PowerShell or in the GUI.

  1. Select what VSS provider to use (Windows native VSS or a Hardware VSS provider)
  2. Configure the maximum number of concurrent snapshots to allow for the volume

I will show how to set the Hyper-V volume-specific settings in Veeam with PowerShell. But first, let’s remind ourselves of what these settings are used for.

The first one is easy. You will use the Windows native VSS provider unless you have a hardware VSS provider installed and configured. These come from your storage array vendor. Hardware VSS providers are only available for volumes that are provided by that storage array. If you don’t set them manually, Veeam scans your host and picks the best option. It does so based on the type of volume and whether a hardware VSS provider is available or not.
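If you want to see which providers Veeam has detected on a host before deciding, the Veeam PowerShell cmdlets can list them. Below is a minimal sketch, assuming the Veeam PowerShell module or snap-in is loaded and reusing the lab cluster name from the script further down; adjust the names to your own environment.

#List the VSS providers Veeam has detected on each node of the cluster
$Cluster = Get-VBRServer -Name W2K19-LAB.datawisetech.corp -Type HvCluster
$ClusterNodes = Get-VBRServer -Type HvServer | Where-Object ParentId -eq $Cluster.Id
Foreach ($ClusterNode in $ClusterNodes) {
    #Without -Name, Get-VBRHvVssProvider returns all providers found on that host
    Get-VBRHvVssProvider -Server $ClusterNode.Name |
        Select-Object @{Name = 'Host'; Expression = { $ClusterNode.Name } }, Name
}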

The second option’s meaning depends on the version of Windows and also on whether you leverage a hardware VSS provider or not. You see, the value of the maximum number of concurrent snapshots doesn’t always result in the behavior you might expect.

Let’s look at the documentation

I invite you to read the Veeam documentation on this subject. Below you will find an excerpt with my annotations.

Follow the link for each option to learn more in the online Veeam documentation.

  1. For Microsoft Hyper-V 2012 R2 and earlier, the default is set to simultaneously store 4 snapshots of one volume. To change this number, specify the Max snapshots value. It is not recommended that you increase the number of snapshots for slow storage. Many snapshots existing at the same time may cause VM processing failures.
  2. For Microsoft Hyper-V Server 2016 and later, you can simultaneously store 4 VM checkpoints on one volume. To change this number, specify the Max snapshots value. Note that this limitation works only for (recovery) checkpoints created during Veeam Backup & Replication data protection tasks. When you still use a host VSS provider in your backup process (with a SAN hardware VSS provider, combined with off-host Hyper-V proxies), this acts like before: it will not limit the number of concurrent VM backup jobs. That only happens when Hyper-V recovery checkpoints are the only thing in play. This means that for an S2D or Azure Stack HCI solution, for example, you will need to increase this value if you want more than 4 VMs backed up simultaneously on that volume, no matter how many concurrent tasks you set on your Hyper-V hosts and repositories. By the way, remember that a task does not equal a VM but a disk per VM / backup file per VM. In a simple example with nothing else in play, this means that 16 tasks can be 4 VMs if those VMs all happen to have 4 disks, etc.
The default setting for maximum concurrent snapshots is 4
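Before changing anything, you can verify this default per volume with the same cmdlets used in the script below. A minimal sketch, assuming the Veeam PowerShell module is loaded; the host name is hypothetical and the exact property names can differ slightly between VBR versions, so I simply dump everything:

#Show the current volume-specific settings (MaxSnapshots defaults to 4) for one Hyper-V host
$HvHost = Get-VBRServer -Name "MyHyperVHost" -Type HvServer
Get-VBRHvServerVolume -Server $HvHost.Name | Format-List *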

Now that we have that out of the way: I find it tedious to do all this in the GUI, especially so in larger environments and during testing in the lab or prior to taking a solution into production. There can be many hosts and even more volumes to configure. This is why I set the Hyper-V volume-specific settings (and other configurations) in Veeam with PowerShell.

How to set the Hyper-V volume-specific settings in Veeam with PowerShell

So here I will share how to do this in PowerShell. It is not very difficult. The snippet below is the crux of what you need to integrate into your own scripts. I grab all the volumes on all the nodes of a cluster and set the MaxSnapshots value to 8. Run a Hyper-V backup job against those CSVs with 10 single-disk VMs and you’ll see we can now have up to 8 VMs being backed up concurrently instead of 4.

I am also showing how to set the VSS provider. Warning: PowerShell will let you set a wrong provider. The GUI protects against that, so pay attention here.

#Grab the Cluster whose nodes volumes we want to configure
$Cluster = Get-VBRServer -Name W2K19-LAB.datawisetech.corp -Type HvCluster

#Grab the correct Hyper-V hosts based on the ParentId (the cluster they belong to)
$ClusterNodes = Get-VBRServer -Type HvServer | Where-Object ParentId -eq $Cluster.Id

Foreach ($ClusterNode in $ClusterNodes) {
    $ServerVolumes = Get-VBRHvServerVolume -Server $ClusterNode.Name
    $Provider = Get-VBRHvVssProvider -Server $ClusterNode.Name -Name "Microsoft CSV Shadow Copy Provider"
    Foreach ($Volume in $ServerVolumes) {
        if ($Volume.Type -eq "CSV") {

            Set-VBRHvServerVolume -Volume $Volume -MaxSnapshots 8 -VSSProvider $Provider
        }
    }
}
Only the CSV volumes have had their maximum concurrent snapshots value increased to 8.

Conclusion

I have shown you how to set the Hyper-V volume-specific settings in Veeam with PowerShell (VSS provider / max concurrent snapshots).

The max concurrent snapshots value is not the only setting determining how many VMs you can back up concurrently in one job, but it is an important one to know about when leveraging recovery checkpoints. You also need to mind max concurrent tasks.

Every virtual disk being backed up counts as a task. So a virtual machine with 3 disks will consume 3 tasks out of the max concurrent tasks you have set on the backup proxy. Don’t go overboard. Count cores when determining how to set these values. Also, remember that taking it easy to speed things up is a rule in backups. There is no speed gained by trying to do more than your cores can handle, or, when you have plenty of cores, by depleting the IOPS on your storage.

I will show you how to configure those with PowerShell in future blog posts.

Troubleshooting 100% stalled Veeam backup jobs

Introduction

Recently I got to diagnose a really interesting Veeam Backup & Replication symptom. Imagine you have a backup environment that runs smoothly all week long, but then, suddenly, running backup jobs stall. New jobs that start do not make an ounce of progress either. It is as if the state of every job is frozen in time. Let’s investigate and dive into troubleshooting 100% stalled Veeam backup jobs.

That morning, the backup jobs had not made an ounce of progress since the night before, and they never would. You can leave this for days; sometimes a job in between does seem to work properly, but most often not, so the job queue builds up.

Troubleshooting 100% stalled Veeam backup jobs

When looking at the stalled jobs, nothing in the Veeam GUI indicates an error. Looking at the Windows event logs, we see no warning, error, or critical messages. All seems fine. As this Veeam environment uses ReFS on Storage Spaces we are a bit wary. While the bugs that caused slowdowns have been fixed, we are still alert to potential issues. The difference with the known (and fixed) ReFS issues is that this is no slowdown. No sir, the Veeam backup jobs have literally frozen in time, but everything else seems to be functional.

Another symptom of this issue is that the synthetic full backups complete perfectly well, but they nonetheless finish with an error message due to a timeout. This has no effect on the synthetic backup result (they are usable), but it is disconcerting to see an issue with this.

On top of that, data copies into the ReFS volumes work just fine and at an excellent speed. Via performance monitor, we can see that the rotation of full regions from mirror to parity is also working well once the mirror tier has reached a specified capacity level.

Time to dive into the Veeam logs I would say.

Veeam backup job log

So the next stop is the Veeam logs themselves. While those can seem a little intimidating, they are very useful to scroll through. And sure enough, we find the following in the backup log of one of the stalled jobs.

This goes on all through the night …

For hours on end … it goes on that way.

Virtual machine task logs 1

When we look at the task log of a virtual machine that is still at 0%, we see the same reflected there. Note that nothing happens between 22:46 and 05:30; that’s when I disabled and enabled the vNIC of the preferred networks in the VBR virtual machine and it all sprang back to life.

Notice the total standstill from 22:46 to 05:30 …
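For reference, bouncing those vNICs inside the VBR virtual machine is nothing more than a disable/enable cycle of the relevant adapters. A minimal sketch with the NetAdapter cmdlets; the adapter name filter is purely hypothetical, so match it to your own naming convention:

#Bounce the preferred network vNICs inside the VBR virtual machine
$PreferredNics = Get-NetAdapter -Name "Backup*"   #hypothetical name filter for the preferred network vNICs
$PreferredNics | Disable-NetAdapter -Confirm:$false
Start-Sleep -Seconds 5
$PreferredNics | Enable-NetAdapter -Confirm:$false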

So it is clear we have a network issue of some sort. We checked the repository servers and the Hyper-V cluster but there everything is just fine. So where is it?

Virtual machine task logs 2

We dive into the task log of one of the virtual machines that is being backed up and is hanging at 88%. There we see one reconnect attempt after another to the repository IP (over the preferred network as defined in VBR). That also happens all night long, until we reset the VBR virtual machine’s preferred network vNICs. In the log snippet below, notice the following:

Error    A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond (System.Net.Sockets.SocketException)

From the logs we deduce that the network error appears to be on the VBR virtual machine itself. This is confirmed by the fact that bouncing the vNICs of the preferred network (10.10.110.x is the preferred network subnet) on the VBR virtual machine kicks the jobs back into action. So what is the issue? We start checking the network configurations and settings: the switch ports, pNICs, vNICs, vSwitches, etc. to find out what’s going on. As it seems to work for days or a week before the issue shows up, we suspect a jumbo frame issue, so we start there.

The solution

While checking the configuration, we make sure jumbo frames are enabled on the vNIC and on the pNICs of the vSwitch’s NIC team. That’s when we notice the jumbo frame setting is missing from those pNICs. So we set it again.
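To verify and restore the jumbo frame setting you can use the NetAdapter advanced property cmdlets. A minimal sketch; the pNIC name and the 9014 value are assumptions, so check what your NIC vendor and network expect:

#Check which adapters have jumbo frames configured
Get-NetAdapterAdvancedProperty -Name * -RegistryKeyword "*JumboPacket" |
    Select-Object Name, DisplayName, DisplayValue

#Re-enable jumbo frames on a pNIC of the vSwitch NIC team (value and display text vary per driver)
Set-NetAdapterAdvancedProperty -Name "pNIC01" -RegistryKeyword "*JumboPacket" -RegistryValue 9014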

From the VBR virtual machine we run some ping tests. The default works fine.


When we test with jumbo frames, however, we notice something. The ping tests do not complain that the packets need to be fragmented while the “do not fragment” option is set (“Packet needs to be fragmented but DF set.”). Instead, they just say “Request timed out.” This indicates an issue right here: jumbo frames are set, but they do not work.

The requests time out; it does not complain about the jumbo frames, so we have another issue here than just the jumbo frame settings.
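The jumbo frame ping test itself is simple enough to repeat anywhere along the path. A quick sketch; the target IP is hypothetical, and 8972 bytes of ICMP payload plus 28 bytes of headers adds up to a 9000-byte packet:

#Test jumbo frames end to end with the "do not fragment" flag set
ping 10.10.110.2 -f -l 8972
#A healthy jumbo path returns replies. "Request timed out." means the frames are being dropped,
#while "Packet needs to be fragmented but DF set." would point at an MTU that is simply too small.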

As the requests time out and the ping test does not complain about the jumbo frames, we have another issue here than just the jumbo frame settings. It smells like a firmware and/or driver issue. So we dive a bit further. That’s when I notice the driver for the relevant pNICs (Broadcom) is the inbox Windows driver. That’s no good. The inbox drivers only exist to let you go out and fetch the vendor’s driver and firmware when needed, as a courtesy so to speak. We grab the vendor driver and firmware and copy those to the hosts that require an update. In this case, that is the nodes where the VBR virtual machine can run. The firmware update requires a reboot. When the host is up again and the VBR virtual machine is running, I test again.
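As an aside, spotting such inbox drivers is quick from PowerShell; a small sketch, where a driver provider of “Microsoft” instead of the NIC vendor is the giveaway (property names can vary slightly between Windows versions):

#List the driver details of all physical NICs to spot inbox drivers
Get-NetAdapter -Physical | Sort-Object Name |
    Select-Object Name, InterfaceDescription, DriverProvider, DriverVersionString, DriverDate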

Bingo, now a ping test succeeds.

Success

What happened?

So did we really forget to update the drivers? Did we walk out of the offices to go into lockdown for the Corona crisis and forget about it? In the end, it turned out they did run the updates for the physical hosts. But for some reason, the Broadcom firmware and drivers did not get updated properly. However, that failed update seems to have also removed the jumbo frame settings from the pNICs that are used for the virtual switch. After fixing both of these, we have not seen the issue return.

Remarks

The preferred networks do not absolutely have to be present on the VBR server itself. Defined, yes; present, no. But it speeds up backup job initialization a lot when they are present on the VBR server, and Veeam also indicates to do so in their documentation.

Why jumbo frames? Ah well, the networks we use for the preferred networks are end-to-end jumbo frame enabled. So we maintain this into the VBR server. We might get away with not setting jumbo frames on the VBR server, but we want to be consistent.

Conclusion

It pays to make sure you have all settings correctly configured and are running the latest and greatest known good firmware. But that should have been the case here, and it all worked so well for quite a while before the backup jobs stalled. The issue can lie in the details, and sometimes things are not what you assume they are. Always verify, and verify again.

I hope this helps someone out there if they are ever troubleshooting 100% stalled Veeam backup jobs. If you need help, reach out in the comments. There are a lot of very experienced and respected people around in my network that can help. Maybe even I can lend a hand and learn something along the way.

Register for VeeamON 2020 Virtual

VeeamON 2020 is on in a few days!

I advise you to Register for VeeamON 2020 Virtual asap. Corona and COVID-19 have pushed VeeamON 2020 to a virtual format, but it will rock anyway; Veeam cannot be stopped by a mere global pandemic. While we are all sad we cannot meet up in real life, it also has its advantages. Everyone can join in free of charge from the comfort of their (home) office. That’s right. Free, bar the time you invest. A lot of people saw the value in that investment, because last week a tweet went out that over 22K people had registered already!

The event takes place on the 17th and 18th of June 2020, so you can still register and attend. Simply click here or on the image below.

Register for VeeamON 2020 Virtual.

You will find plenty of information on the website, but trust me, I have attended VeeamON before and it is worthwhile. They have their management, technical architects, and experts available to you. The speakers are masters of their craft, and you get the opportunity to interact with them. For that purpose, they will host interactive chat rooms, live expert sessions, etc.

So join, learn and share on how to manage and protect your data on-premises as well as in hybrid and cloud environments in the years to come. Veeam evolves with the times and plans to help you move along. You can learn how by attending VeeamON 2020.

Use cases for devnodeclean.exe


So what are the use cases for devnodeclean.exe, and what is it? Windows creates an entry in the registry for every new device that is connected. That also goes for storage devices, including VSS shadow copies.

When you create a lot of VSS snapshots, both software (Windows) and hardware (SAN) based ones, that get mounted and unmounted, this creates a lot of registry entries. Normally these should get cleaned up by the process creating them. Microsoft can take care of their own use cases, but they cannot do this for 3rd party software, as Windows cannot know the intent of that software. Hence you might end up with a registry SYSTEM hive that starts to bloat. When that hive gets big enough, you will notice slowdowns in shutdowns and restarts. These slowdowns can become very long and even lead to failure to boot Windows.
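A quick way to keep an eye on this is to check the size of the SYSTEM hive file itself; a hive that keeps growing on a host doing lots of snapshot mount/unmount cycles is a warning sign. A minimal sketch:

#Report the size of the SYSTEM registry hive in MB (run elevated)
Get-Item "$env:SystemRoot\System32\config\SYSTEM" |
    Select-Object FullName, @{Name = 'SizeMB'; Expression = { [math]::Round($_.Length / 1MB, 1) } }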

This can happen with SAN hardware VSS provider backup software or with a backup solution that integrates with SAN hardware VSS providers. Mind you it is not limited to hardware VSS providers. It can also happen with software VSS providers. Microsoft actually had this as a bug with Hyper-V backups a long time ago. A hotfix fixed the issue by removing the registry entries the backups created.

But not all software will do this. Not even today. The better software does, but even Veeam only provided this option in VBR 9.5 Update 4. Mind you, Veeam is only responsible for what they control via storage integrations. When you leverage an off-host proxy with Hyper-V Veeam collaborates with the hardware VSS provider but does not orchestrate the transportable snapshots itself. So in that case the clean up is the responsibility of the SAN vendor’s software.

Another use case I have is file servers on a SAN being backed up and protected via hardware VSS snapshots with the SAN vendor’s software. That also leads to registry bloat.

I never had any issues, as I clean up the phantom registry entries preemptively. Veeam actually published a KB article on this as well before they fixed it in their code.

Still, if you need to clean up existing phantom registry entries, you will need to use a tool called devnodeclean.exe.

Preventing registry bloat

When the software responsible doesn’t prevent registry bloat you will have to clean up the phantom registry entries in another way. Doing this manually is tedious and not practical. So, let’s forget about that option.

You can write your own code or script to take care of this issue. Cool if you can but realize you need to be very careful what you delete in the registry. Unless you really know your way around the depths of storage-related entries in the registry I suggest using a different approach, which I’ll discuss next.

Another solution is to use the Microsoft-provided tool devnodeclean.exe. This tool is Microsoft’s version of its example code, which you can find here: How to remove registry information for devices that will never be used again

You can download that tool here. Extract it and grab the .exe that fits your OS architecture, x86 or x64. I usually put it in the subfolder Bin under C:\SysAdmin\Tools\DevNodeClean\, where I also create a subfolder named Logs. Remember you need to run this with elevated permissions. Running devnodeclean.exe /n lists the entries it would remove, while running it without a switch actually removes the entries (see the example below). It works with Windows Server 2012 (R2), 2016, and 2019. It can take a while if you have many thousands of entries.
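To make that concrete, this is what a dry run versus an actual cleanup looks like from an elevated prompt, using the folder layout mentioned above:

#List the phantom entries that would be removed, without touching the registry
& 'C:\SysAdmin\Tools\DevNodeClean\Bin\DevNodeClean.exe' /n
#Run it without a switch to actually remove the phantom entries
& 'C:\SysAdmin\Tools\DevNodeClean\Bin\DevNodeClean.exe'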


While you can run the tool manually in “one-off” situations, normally you’ll want to run it automatically and regularly. For that, I use a PowerShell script that logs its actions, and I use Task Scheduler to run it every day or week; that depends on the workload on that host. A sketch of registering such a task follows after the sample code below.

Sample Code

Below is some sample code to get you started.

$TimeStamp = $(((get-date)).ToString("yyyyMMddTHHmmss"))
$PathToDevNodeClean = 'C:\SysAdmin\Tools\DevNodeClean'
Start-Transcript -Path "$PathToDevNodeClean\Logs\DevNodeCleanLog-$TimeStamp.txt"
Write-Output "$(Get-Date) : Starting registry cleanup of phantom VSS entries"
Invoke-Expression "$PathToDevNodeClean\Bin\DevNodeClean.exe"
Write-Output "$(Get-Date) : Cleaning up old log files"

$DaysToRetain = 0
$DateTime = ((Get-Date).AddDays(-$DaysToRetain))
$AllLogFilesInDevNodeClean = Get-ChildItem -Path "$PathToDevNodeClean\Logs" -Filter "DevNodeCleanLog-*.txt" -Force -File | Where-Object { $_.CreationTime -lt $DateTime }

foreach ($File in $AllLogFilesInDevNodeClean) {
    $FileName = $File.FullName
    $TimeStamp = Get-Date
    try {
        Remove-Item -Path $FileName -ErrorAction Stop
        Write-Output "$TimeStamp > Deleting file $FileName because it was created before $DateTime"
    } 
    catch {
        Write-Output "$TimeStamp > Failed to delete $FileName It is probably in use" 
        Write-Output $TimeStamp $_.Exception.Message
    }
    finally {      
    }
}
Stop-Transcript 
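To schedule the script above, here is a minimal sketch with the ScheduledTasks cmdlets; the script path (Clean-DevNodes.ps1) and the weekly Sunday schedule are assumptions, so adjust them to the workload on that host:

#Register the cleanup script as a weekly scheduled task running elevated as SYSTEM
$Action = New-ScheduledTaskAction -Execute 'powershell.exe' -Argument '-NoProfile -ExecutionPolicy Bypass -File "C:\SysAdmin\Tools\DevNodeClean\Clean-DevNodes.ps1"'
$Trigger = New-ScheduledTaskTrigger -Weekly -DaysOfWeek Sunday -At 06:00
$Principal = New-ScheduledTaskPrincipal -UserId 'SYSTEM' -LogonType ServiceAccount -RunLevel Highest
Register-ScheduledTask -TaskName 'DevNodeClean cleanup' -Action $Action -Trigger $Trigger -Principal $Principal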

Good luck with your devnodeclean.exe adventures. As with any sample code, big boy rules apply: use it at your own risk and test it before letting it loose on your production systems.

This is just one example of how my long-time experience with Windows storage and backups prevents problems in environments I manage or design. If you need help or have a question, reach out and we’ll try to help.