Veeam Backup & Replication leverages SMB Multichannel

Introduction

Is it true that Veeam Backup & Replication leverages SMB Multichannel? That is a question that I was asked recently. The answer is yes, when you have a backup design and configuration that allows for this. If that’s the case it will even happen automatically when possible. That’s how SMB 3 works. That means it’s a good idea to pay attention to the network design so that you’re not surprised by the route your backup traffic flows. Mind you that this could be a good surprise, but you might want to plan for it.

I’ll share a quick lab setup where SMB 3 Multichannel kicks in. Please don’t consider this a reference guide for your backup architectural design but as a demo of how SMB multichannel can be leveraged to your advantage.

Proving Veeam Backup & Replication leverages SMB Multichannel

Here’s a figure of a quick lab setup I threw together.

clip_image002

There are a couple of significant things to note here when it comes to the automatic selection of the best possible network path.

SMB 3 Multichannel picks the best solution based on its logic. You can read more about that here. I’ve included the figure with the overview below.

clip_image003

The figure nicely show the capabilities of the NIC situation. To select the best possible network path SMB 3 uses the following logic:

1. RDMA capable NICs (rNICs) are preferred and chosen first. rNICs combine the highest throughput, the lowest latency and bring CPU offloading. on the processor when pushing through large amounts of data.

2. RSS capable NICs: NIcs with Receive Side Scaling (RSS) improve scalability by not being limited to core zero on the server. Configured correctly RSS offers the second-best capabilities.

3. The speed of the NICs is the 3rd evaluation criteria: a 10 Gbps NIC offers way more throughput than a 1 Gbps NIC.

Following this logic it is clear that Multichannel will select our 2 RDMA capable 10Gbps NICs over the management LBFO interface which does not support RDMA and while supporting RSS can only deliver 2Gbps throughput at best. That’s exactly what you see in the screenshot below.

image

Conclusion

So yes, Veeam Backup & Replication leverages SMB Multichannel! Please note that this did not require us to set SMB 3 Multichannel constraints or a preferred network for backups in Veeam Backup & Replication. It’s possible to do so when needed but ideally you design your solution to have no need for this and let automatic detection chose the best network path correctly. This is the case in our little lab setup. The backup traffic flows over 10.1.0.0/16 network even when our Veeam Backup & Replication VM, the Hyper-V host and the backup target have 10.10.0.0/16 as their management subnet. That’s the one they exist on the Active Directory domain they belong to for standard functionality. But as both the source and the target can be reached via 2*10Gbps RDMA capable NICs on the 10.1.0.0/16 subnet SMB3 will select those according to its selection criteria. No intervention needed.

SMB Direct Support

Now that we have shown that Veeam Backup & Replication backups in certain configurations can and will leverage SMB Multichannel to your benefit another question pops up. Can and does Veeam Backup & Replication leverage SMB Direct? The answer to that is also, yes. If SMB Direct is correctly configured on all the hosts and switches their networks paths in between it will. Multichannel is the mechanism used to detect SMB Direct capabilities, so if multichannel works and sees SMB Direct is possible it will leverage that. That’s why when SMB Direct or RDMA is enabled on your NICs it’s important that it is configured correctly throughout the entire network path used. Badly configured SMB Direct leads to very bad experiences.

Now think about that. High throughput, low latency and CPU offloading, minimizing the CPU impact on your Hyper-V hosts, SOFS nodes, S2D nodes and backup targets. Not bad at all, especially not since you’re probably already implementing SMB Direct in many of these deployments. It’s certainly something that could and should be considered when design solutions or optimizing existing ones.

More SMB3 and Windows Server 2016 Goodness

When you put your SMB3 file share continuously available on a Windows 2012 (R2) or Windows Server 2016 cluster (it doesn’t need to be on a CSV disk) you’ll gain high availability trough transparent failover with SMB3 and except for a short pause your backups will keep running even when the backup target node reboots or crashes after the File Server role has failed over. Now, start combing that with ReFSv3 in Windows Server 2016 and the Veeam Backup & Replication v9.5 support of this and you can see a lot of potential here to optimize many aspects of your backup design delivering effective and efficient solutions.

Things to investigate further

One question that pops up in my mind is what happens if we configure a preferred back-up network in Veeam Backup & Replication. Will this affect the operation of SMB multichannel at all? By that I means would enabling a preferred network in Veeam prevent multichannel from using more than one NIC?

I my opinion it should allow for multiple scenarios actually. When you have equally capable NICs that are on different subnets you might want to make sure it uses only one. After all, Veeam uses the subnet to configure a preferred path, or multiple subnets for that matter. Now multichannel will kick in with multiple equally capable NICs whether they are on the same subnet or not and if they are on the same subnet you might want them both to be leveraged even when setting a preferred path in Veeam. Remember that 1 IP / NIC is used to set up an SMB session and then it detects capabilities available, i.e. multiple paths, SMB Direct, RSS, speed, within 1 or across multiple subnets.

I’ll leave the combination of Veeam Backup & Replication and SMB multichannel for a future blog post.

Tips on using Convert-VMGeneration.ps1 with Windows Server 2016

Introduction

Recently I was involved in getting a bunch of “holy cow” virtual machines updated/migrated to be future ready (shielded VMs, see Guarded fabric and shielded VMs overview).

That means they have to be on Windows 2012 R2 as the guest OS minimally .For us anyway, we’re not falling behind the curve OS wise. That’s the current legacy OS in the environment. Preferably they need to be at Windows Server 2016. This is has been taken care of and 40% of the virtual machines is already running Windows Server 2016 for the Guest OS, the remainder is at Windows Server 2012 R2 and those are moving to Windows Server 2016, where useful and possible, at a steady pace.,

When deploying new virtual machines the default is to use generation 2 virtual machines. Any remaining virtual machines that cannot be replaced need to be converted to generation 2. For that we routinely use the great script provided by Microsoft’s John Howard (see Hyper-V generation 2 virtual machines – part 10)  We’ll share some tips on using Convert-VMGeneration.ps1 with Windows Server 2016, which is an OS / Hyper-V version later than what the script was written for and tested against.

Tips on using Convert-VMGeneration.ps1 with Windows Server 2016

During the use of this script we came across a couple of new situations for us. One of those were Window Server 2016 virtual machines that are still generation 1 and reside on either a Windows Server 2012 (R2) or Windows Server 2016 host. Another were virtual machines with Windows Server 2012 R2 or Windows Server 2016 as a guest OS that already live on Windows Server 2016 and are still generation 1 and have either already been converted to or installed on a virtual machine version 8 or not (still at 5). All these can be death with successfully.

Situation 1

Running the script on a Windows Server 2016 Host. This throws an error reporting that the was only tested with PS version 4.

clip_image002

This is easily dealt with by using the -noPSVersionCheck switch, it even tells you to do so in the error message. I have found no issues in doing so.

.\Convert-VMGeneration.ps1 -VMName “MyVM” -path “C:\ClusterStorage\Volume1\ConvertedMyVM” -NoPSVersionCheck

Situation 2

Running the script against a generation 1 virtual machine with a Windows Server 2016 guest OS required a little adaptation of the script as it has an issue with detecting the guest OS version as supported. This is due to the fact that in the script the check is done against string values and they generate a logical “bug” when the doing.

clip_image004

Checking if a string of value 7 -lt 6 will evaluate correctly but doing the same with 10 doesn’t, that’s false. An error message is show that the “Source OS must be version 6.2 (Windows 8/Windows Server 2012) or later”. Well is most certainly is, but the 10 in 10.0.14393.206 is not seen as greater or equal to six.

We fixed by converting the 1st and 2nd part (for good measure) of the OS version string to an integer before the check happens. That fixes it for us.

We’ll demonstrate this in a code snippet to run on a Windows Server 2016 host.

$SourceNTDLL = "C:\windows\system32\ntdll.dll"

$script:ProgressPoint = 651

$SourceOSVersion = ([System.Diagnostics.FileVersionInfo]::GetVersionInfo($SourceNTDLL).FileVersion)

$script:ProgressPoint = 652

$SourceProductName = ([System.Diagnostics.FileVersionInfo]::GetVersionInfo($SourceNTDLL).ProductName)

$SourceOSVersionParts = $SourceOSVersion.split(".")

if ($SourceOSVersionParts[0]-lt 6) { Write-Host "Source OS must be version 6.2 (Windows 8/Windows Server 2012) or later." }

if (($SourceOSVersionParts[0] -eq 6) -and ($SourceOSVersionParts[1] -lt 2)) {Write-Host "Source OS must be version 6.2 (Windows 8/Windows Server 2012) or later." }

This will give you the massage that the “Source OS must be version 6.2 (Windows 8/Windows Server 2012) or later”. So, we cast the $SourceOSVersionParts[X] variables to an integer to overcome this.

$SourceNTDLL = "C:\windows\system32\ntdll.dll"

$script:ProgressPoint = 651

$SourceOSVersion = ([System.Diagnostics.FileVersionInfo]::GetVersionInfo($SourceNTDLL).FileVersion)

$script:ProgressPoint = 652

$SourceProductName = ([System.Diagnostics.FileVersionInfo]::GetVersionInfo($SourceNTDLL).ProductName)

$SourceOSVersionParts = $SourceOSVersion.split(".")

#Cast the OS version parts to an integer

$OSVersionPart1 =[INT]$SourceOSVersionParts[0]

$OSVersionPart2 =[INT]$SourceOSVersionParts[1]

if ($OSVersionPart1 -lt 6) { Write-Host -ForegroundColor Green "Source OS must be version 6.2 (Windows 8/Windows Server 2012) or later." }

if (($OSVersionPart1 -eq 6) -and ($OSVersionPart2 -lt 2)) { CleanUp "Source OS must be version 6.2 (Windows 8/Windows Server 2012) or later." }

Do this and it evaluates correctly now so your script will run. That’s the only adaption we had to make in the script to make it run with a Windows Server 2016 guest OS.

Situation 3

My virtual machine is already a version 8 VM but still a generation 1 virtual machine. That’s not a problem at all. As long as you deal with situation 1 and 2, it will convert correctly.

Conclusion

If you’re prepping legacy virtual machines that need to be moved into a modern private cloud or on premises deployment you might need to convert them to generation 2 in order to take full advantage of the capabilities of the current Hyper-V platform (i.e. Shielded VMs). To do so you’ll be fine as long as they are running Windows Server 2012 (R2) as a guest OS on a Windows 2012 R2 host. If not, some creativity is all you need to get things going. Upgrade the guest OS if needed and fix the script if you encounter the situations as we described above. Sure, we have to herd virtual machines as cattle and avoiding holy cows VMs is important. But they do still exist and if they provide valuable services and we can’t let this hold us back from moving ahead. By proceeding like we did we prevented just that and avoided upsetting too many processes and people in the existing situation, let alone hindering them in the execution of their job. We still arrived at a situation where the virtual machines can be hosted as shielded virtual machines. Good luck!

Cluster Operating System Rolling Upgrade Leaves Traces

Introduction

When you perform a cluster OS rolling upgrade of Windows Server 2012 R2 cluster to a Windows Server 2016 Cluster you’ll have two options.

1. You evict the nodes, one after the other, perform a clean OS install and join them to the existing cluster.

2. You do an in-place OS upgrade of the nodes (no need to evict the nodes, you can if you want to). I tested this and blogged about it in In Place upgrades of cluster nodes to Windows Server 2016  

Both of these give you the benefits that you can keep your workloads (Hyper-V, SOFS, SQL Server) running and you don’t have to create a new cluster to do so. The moment you have Windows Server 2016 Nodes added to an existing Windows Server 2012 R2 cluster you are running in Mixed mode. Until all your nodes have been upgraded to Windows Server 2016 will remain running in mixed mode.

Illustration showing the three stages of a cluster OS rolling upgrade: all nodes Windows Server 2012 R2, mixed-OS mode, and all nodes Windows Server 2016

When there are only Windows Server 2016 nodes you can decide to also upgrade the cluster functional level.  This enables all the new capabilities in Windows Server 2016 Failover Clustering and also means you cannot go back to a Windows 2012 R2 cluster anymore. So, only take this step after a final validation of all drivers and firmware to make sure you don’t need to go back and you’re ready to fully commit to a fully functional Windows Server 2016 Failover Cluster.

A cluster operating system rolling upgrade does leave some traces, but that’s OK. Let’s take a look. 

This is what a get-cluster against Windows Server 2016 that was upgraded from Windows Server 2012 R2 looks like.

image

As you can see the cluster functional level is 8 and not 9 yet. This means that we have not yet run the Update-ClusterFunctionalLevel command on this cluster yet. Which still allows us to roll back all the way to a cluster running only Windows 2012 R2 nodes. The ClusterUpgradeVersion has a value of 3.

We now execute the Update-ClusterFunctionalLevel command and take a look at Get-Cluster again.

image

As you can see we are now at cluster functional level 9 which enables all the capabilities offered by Windows Server 2016 Failover Clustering. The cluster Upgrade version is 8. That’s the previous cluster functional level we were at before we executed Update-ClusterFunctionalLevel.

Note that both properties ClusterFunctionalLevel and ClusterUpgradeVersion are only available with Windows Server 2016. You will not find it on a Windows Server 2012 R2 or lower cluster. If you run this command from Windows Server 2016 against a Windows Server 2012 R2 cluster both properties will be empty. If you run it on a Windows Server 2012 R2 host against Windows Server 2012 R2 or lower and even a Windows Server 2016 cluster these properties are not even there. The commandlet is older on those OS versions and didn’t know about these properties yet.

What about if you create a brand-new cluster, perhaps even on freshly installed windows Server 2016 Nodes? What does ClusterUpgradeVersion have as a value then? Well it’s also 8. In the end, there is no difference between an in-place upgrade Windows Server 2016 cluster and a cleanly created one. So where are those traces?

Cluster Operating System Rolling Upgrade Leaves Traces

What gives a rolling upgrade away is that in the registry, under the HKLM\Cluster the OS and OSVersion values are not updated (purple in the picture below). This is a benign artifact and I don’t know if this if on purpose or not.  I have changed them to Windows Server 2016 Datacenter as an experiment and I have not found any issues by doing so. Now, please don’t take this as recommendation to do so. The smartest and safest thing is to leave it alone. These are not used, so don’t worry about them.

image

But even if you would change those values a cluster resulting from of a cluster operating system rolling upgrade still has other ways of telling it was not born as a Window Server 2016 Cluster.

Under HKLM\Cluster (and Cluster.0) you’ll find the value CusterFunctionaLevel that does not exist on a cleanly installed Windows Server 2016 Cluster (green in the picture above). As you can see this is a Window Server 2016 cluster running at functional level 9.

There is even an extra key OperatingVersion under HKLM\Cluster that you will not find on a cleanly installed cluster either. It also has a Mixed Mode value under that key which indicates whether the cluster is still running in mixed mode or not.

image

Here is a screenshot of newly installed/created Windows Server 2016 cluster. No ClusterFunctionalLevel value, the OS and OSVersion Values are correct and there is no OperatingVersion key to be found.

image

What if you don’t like traces?

First of all, these traces are harmless. One thing you can do if you want to weed out all traces of a rolling upgrade (as far as the cluster is concerned) is to destroy the cluster and create one with the same CNO (and IP address if that was a fixed one). This might all be a bit more involved when it comes to CSV naming and other existing resources but then these remnants will be gone in a supported way. Now this does defeat one of the main purposes of this feature: no down time. The operating system itself might also contain traces if you did in-place OS upgrades but the cluster will not. Just adapting OS/OSVersion, ClusterFunctionalLevel and deleting the key OperatingVersion from HKLM\Cluster (and HKLM\Cluster.0) are not supported actions and messing around in the cluster registry keys can lead to problems, so don’t! The advice is to just leave it all alone. Microsoft developed cluster operating system rolling upgrade the way they did for a reason and by leaving things as Microsoft has set or left them will make sure you are always in a fully supported condition. So, use it if it fits the circumstances & you comply with all the prerequisites. Look at these traces a flag of honor, not a smudge on your shining armor. When I see these artifacts, I see people who have used this feature to their own benefits. Well done I say.

Learn more about the Cluster OS Rolling Upgrade process

Next to my blogs like First experiences with a rolling cluster upgrade of a lab Hyper-V Cluster (Technical Preview) and In Place upgrades of cluster nodes to Windows Server 2016 there are many resources out there by fellow blogger and Microsoft. A great video on the subject is Introducing Cluster OS Rolling Upgrades in Windows Server 2016 with Rob Hindman, who actually works on this feature and knows it inside out.

An important thing to keep in mind is that this can be automated using PowerShell or by leveraging SCVMM for orchestration for example. 3rd party tools could also support this and help you automate this process in order to scale it when needed.

Finally, the official documentation can be found here Cluster operating system rolling upgrade

Hyper-V Amigos Showcast Episode 14: RemoteFX & DDA

Carsten and I dove into our labs and played around with RemoteFX and Discrete Device Assignment in Windows Server 2016 Hyper-V and RDS. This resulted in the Hyper-V Amigos Showcast Episode 14: RemoteFX & DDA.

Some background on RemoteFX & DDA

I’ve discussed the new capabilities in previous blog posts such as https://blog.workinghardinit.work/?s=DDA&submit=Search  and RemoteFX and vGPU Improvements in Windows Server 2016 Hyper-V. But here the Hyper-V Amigos talk about it for your benefit and enjoyment. I for one know we had a ton of fun. Microsoft only VDI solutions are really taking off both on-premises and in Azure in cost conscious environments that still need good performance. I think we’ll see an uptake of such deployments as Microsoft has made some decisions and added some features to make this more feasible.

Hyper-V Amigos Showcast Episode 14: RemoteFX & DDA.

Click this link or the image below to watch Hyper-V Amigos Showcast Episode 14: RemoteFX & DDA

image

There’s a bit of a learning curve associated with using DDA in Windows Server 2016. You’ll have to get acquainted with how to do it and put it to the test in labs and POCs. Do this before you even start thinking about designing production ready solutions. Having a good understanding on how it works and behaves is paramount to success.

Enjoy!