SOFS / SMB 3 Offers Best VM Resiliency Experience

I have blogged about virtual machine resiliency in Windows Server 2016 Failover Clustering before, in Testing Virtual Machine Compute Resiliency in Windows Server 2016.

Those tests and demos were done with block-level storage: CSV on Fibre Channel, iSCSI or shared SAS. Today we’ll look at the experience when you’re running your VMs on a continuously available file share on a Scale-Out File Server (SOFS). This configuration offers the best possible experience.

Why? Well, when the cluster node is in Isolated mode this has no impact on the SOFS share, as that is a resource external to the Hyper-V cluster. In other words, it remains online. This means that the VMs keep running, even if they have lost their high availability while the node is Isolated. After all, there is nothing wrong with Hyper-V itself. With block-level CSV storage you lose access to the storage, as that is a cluster resource and the node got isolated. That’s why the VMs go into a paused-critical state during a transient failure with block-level storage, but they don’t when you’re using SOFS.

The virtual machine compute resiliency feature in action shows you that the VMs survive a transient failure without issues. Your services need never know something was up. Even when the transient failure is recurring, that doesn’t mean it will cause downtime. The node will be quarantined and, if it comes back up, the workload will be live migrated away.

You can watch a video of this in action here on Vimeo:

The quarantine threshold and duration, as well as the resiliency period, can be tweaked to suit your environment and get the best possible results.
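
As a rough sketch of where those knobs live: in Windows Server 2016 they are exposed as cluster common properties. The values below are, to my understanding, the defaults. They are shown only to illustrate the syntax, not as recommendations, so verify them on your own build before changing anything.

```powershell
# Inspect the current VM compute resiliency and quarantine settings.
Get-Cluster | Format-List ResiliencyLevel, ResiliencyDefaultPeriod, QuarantineThreshold, QuarantineDuration

# Seconds a node may stay Isolated before its VMs are failed over (assumed default: 240).
(Get-Cluster).ResiliencyDefaultPeriod = 240

# Number of failures within an hour before a node is quarantined (assumed default: 3).
(Get-Cluster).QuarantineThreshold = 3

# Seconds a quarantined node is kept out of the cluster (assumed default: 7200).
(Get-Cluster).QuarantineDuration = 7200
```

Run this from a cluster node or a management host with the FailoverClusters module available.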

SMB 3 for the win! This is yet one more convincing argument to start looking into SOFS and leveraging the capabilities of SMB 3. Remember that you can run a SOFS cluster against your existing shared storage to get started, if you can get the IOPS/latency you require. But also look into Storage Spaces, especially Storage Spaces Direct, which avoids some of the drawbacks SANs have in such a scenario. It is high time for storage vendors to really scale out, implement SMB 3 well and completely, and keep the great added-value features they already have in their offering. It’s this or becoming yet a bit more irrelevant in today’s storage scene in the Microsoft ecosystem.

Simplified SMB Multichannel and Multi-NIC Cluster Networks

One of the seemingly small feature enhancements in Windows Server 2016 Failover Clustering is simplified SMB multichannel and multi-NIC cluster networks. In Windows Server 2016, failover clustering now recognizes and uses multiple NICs on the same subnet for cluster networking (cluster and client access).

Why was this introduced?

The growth in the capabilities of the hardware (compute, memory, storage and networking) meant that failover clustering had to leverage this capability more easily and for more use cases than before. Talking about SMB, that is now used not “only” for CSV and live migration but also for Storage Spaces Direct and Storage Replica.

  • It gives us better utilization of the network capabilities and throughput with Storage Spaces Direct, CSV, SQL, Storage Replica etc.
  • Failover clustering now works with multichannel like any other workload, without the extra requirement of needing multiple subnets. This is more important than it seemed to me at first: in many environments getting another VLAN and/or an extra subnet is a hurdle. Well, that hurdle is gone.
  • For IPv6 link-local subnets it just works; these are auto-configured as cluster-only networks.
  • The cluster validation wizard won’t nag about it anymore and knows it’s a valid failover cluster configuration.
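
To check what the cluster made of multiple NICs on one subnet, something along these lines should do. This is a minimal sketch that only uses the standard failover clustering and SMB cmdlets; the `-SmbInstance CSV` parameter is, as far as I know, available from Windows Server 2016 onwards.

```powershell
# List the cluster networks with their subnet and role.
Get-ClusterNetwork | Format-Table Name, Address, AddressMask, Role, State -AutoSize

# Show which NIC on which node ended up in which cluster network.
Get-ClusterNetworkInterface | Format-Table Name, Node, Network, State -AutoSize

# On a node: SMB multichannel connections for regular SMB traffic ...
Get-SmbMultichannelConnection

# ... and for the cluster's own CSV traffic.
Get-SmbMultichannelConnection -SmbInstance CSV
```

Several NICs in the same subnet should show up under a single cluster network, with multiple multichannel connections between the nodes.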

See it in action!

You can find a quick demo of simplified SMB multichannel and multi-NIC cluster networks on my Vimeo channel here.

In this video I demo 2 features. One is new, and that is virtual machine compute resiliency. The other is an improved feature: simplified SMB multichannel and multi-NIC cluster networks. The multichannel demo is the first part of the video. Yes, it’s with RDMA RoCEv2. You know I just have to do SMB Direct when I can!

You can read more about simplified SMB multichannel and multi-NIC cluster networks on TechNet here. Happy reading!

Shared VHDX In Windows 2016: VHDS and the backing storage file

Introduction into the VHD Set

I have talked about the VHD Set, with a VHDS file and an AVHDX backing storage file, in Windows Server 2016 in a previous blog post, A first look at shared virtual disks in Windows Server 2016. One of the questions I saw pass by a couple of times is whether this is still a “normal VHDX” or a new type of virtual disk. Well, the VHDS file is nothing but a small file containing some metadata to coordinate disk actions amongst the guest cluster nodes accessing the shared virtual disk. The avhdx file associated with that VHDS file is an automatically managed, dynamically expanding or fixed virtual disk. How do I know this? Well, I tested it.

There is nothing preventing you from copying or moving the avhdx file of a VHD Set that is not in use. You can rename the extension from avhdx to vhdx. You can attach it to another VM or mount it in the host and get to the data. In essence this is a vhdx file. The “a” in avhdx stands for automatic. This means that an avhdx is under the control of the hypervisor and you’re not supposed to manipulate it yourself but let the hypervisor handle it for you. But as you can see for yourself if you try the above, you can get to the data if that’s the only option left. Normally you should just leave it alone. It does, however, serve as proof that the VHD Set uses a standard virtual disk (VHDX) file.

I’ll demonstrate this with an example below.

Fun with a backing storage file in a VHD Set

Shut down all the nodes of the guest cluster so that the VHD Set files are not in use. We then rename the virtual disk’s extension from avhdx to vhdx.

You can then mount it on the host.

And after mounting the VHDX we can see the content of the virtual disk we put there when it was a CSV in that guest cluster.

We add some files while this vhdx is mounted on the host.

We dismount the disk and rename the virtual disk back to an avhdx extension.

We boot the nodes of the guest cluster and have a look at the data on the CSV. Bingo!

I’m NOT advocating you do this as a standard operating procedure. This is a demo to show you that the backing storage files are normal VHDX files that are managed by the hypervisor and as such get the avhdx extension (automatic vhdx) to indicate that you should not manipulate them under normal circumstances. But in a pinch, it is a normal virtual disk, so you can get to it with all options and tools at your disposal if needed.
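
For those who prefer a script over the GUI, the same walk-through could look roughly like this. It is a sketch under assumptions: the path and file name are hypothetical placeholders, and it must only be run while every guest cluster node using the VHD Set is shut down.

```powershell
# Hypothetical path to the backing file of a VHD Set; adjust to your environment.
$avhdx = 'C:\ClusterStorage\Volume1\GuestCluster\SharedDisk.avhdx'
$vhdx  = [System.IO.Path]::ChangeExtension($avhdx, '.vhdx')

# Rename the backing file so it is treated as a regular VHDX.
Rename-Item -Path $avhdx -NewName (Split-Path -Path $vhdx -Leaf)

# Mount it on the host and list the volumes it contains.
Mount-VHD -Path $vhdx -Passthru | Get-Disk | Get-Partition | Get-Volume

# ... read or add files on the mounted volume here ...

# Dismount and restore the original extension before booting the guest cluster nodes.
Dismount-VHD -Path $vhdx
Rename-Item -Path $vhdx -NewName (Split-Path -Path $avhdx -Leaf)
```

The VHDS file itself is left untouched. Only the backing file is renamed, and it gets its original name back at the end.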

Maximum bandwidth in Hyper-V storage QoS policies

Introduction

In a previous blog post, Hyper-V Storage QoS in Windows Server 2016 Works on SOFS and on LUNs/CSV, I discussed Storage QoS policies in Windows Server 2016. I also demonstrated this in a lab setup at VEEAMON 2015 in one of my talks at the Microsoft presentation area. It’s one of those features where a home lab will do the job. There is no need for special storage hardware. It’s all in-box functionality. Cool!

Maximum bandwidth in Hyper-V storage QoS policies

Now that was in the Technical Preview 2 and 3 era, where it all revolved around minimum and maximum IOPS. In Windows Server 2016 Technical Preview 4 we got some new features with regard to storage QoS policies. One of those is that we can now also set the maximum bandwidth on a policy using the parameter MaximumIOBandwidth. This parameter, which is set in bytes per second, determines the maximum bandwidth that any flow assigned to the policy is allowed to consume.

We use that policy ID to assign the policy to the 2 shared virtual disks of our guest cluster nodes. You’ll need to do this for all of the guest cluster nodes.

You can copy the PoSh demo script below:


#Create a storage QoS policy
$DemoVMPolicy = New-StorageQosPolicy -Name DemoVMPolicy -PolicyType MultiInstance `
-MinimumIops 250 -MaximumIops 500 -MaximumIOBandwidth 100MB

#Look at our storage QoS policy
Get-StorageQosPolicy -name DemoVMPolicy

#Grab our policy ID
$DemoVMPolicy = (get-StorageQosPolicy -Name DemoVMPolicy).PolicyId 
$DemoVMPolicy 


#Look at our VM's policy settings before and after assigning a storage policy.
#We assign the storage policy to the 2 shared virtual disks
#that are located at locations 1 and 2 on SCSI controller 0

Get-VM -Name GuestClusterNode1 | Get-VMHardDiskDrive |
ft Path,MinimumIOPS, MaximumIOPS, MaximumIOBandwidth, QoSPolicyID -AutoSize

Get-VM -Name GuestClusterNode1 | Get-VMHardDiskDrive | Where-Object {$_.controllerlocation -ge 1}|
Set-VMHardDiskDrive  -QoSPolicyID $DemoVMPolicy

Get-VM -Name GuestClusterNode1 | Get-VMHardDiskDrive | 
ft Path, MinimumIOPS, MaximumIOPS, MaximumIOBandwidth, QoSPolicyID -AutoSize

You can use MaximumIOBandwidth by itself or you can combine it with the maximum IOPS setting. When both of these parameters are set in a storage QoS policy, they are both active. The one that is reached first by a flow assigned to this policy will be the limiting factor in the I/O of that flow.

As an example, let’s say you specify 500 IOPS and 100 MB/s of bandwidth as maxima. If your workload hits 500 IOPS but only consumes 58 MB/s, it’s the IOPS limit that is throttling the flow.
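
To get a feel for which limit kicks in first, here is a back-of-the-envelope sketch. It is a simplification: the function name `Get-LimitingFactor` is made up for this illustration, it assumes every I/O has the same size, and it counts every I/O as exactly one IOPS. Storage QoS actually normalizes I/O sizes when counting IOPS, which this ignores.

```powershell
# Hypothetical helper: which cap does a flow with a fixed I/O size hit first?
function Get-LimitingFactor {
    param(
        [double]$MaximumIops,          # IOPS cap of the policy
        [double]$MaximumIOBandwidth,   # bandwidth cap in bytes per second
        [double]$IoSizeBytes           # assumed constant I/O size in bytes
    )
    # Number of I/Os per second at which the bandwidth cap would be reached.
    $iopsAtBandwidthCap = $MaximumIOBandwidth / $IoSizeBytes
    if ($iopsAtBandwidthCap -lt $MaximumIops) { 'MaximumIOBandwidth' } else { 'MaximumIops' }
}

# 64 KB I/Os: the bandwidth cap would allow 1600 IOPS, so the 500 IOPS cap is hit first.
Get-LimitingFactor -MaximumIops 500 -MaximumIOBandwidth 100MB -IoSizeBytes 64KB   # MaximumIops

# 1 MB I/Os: the bandwidth cap is reached at 100 IOPS, well before the IOPS cap.
Get-LimitingFactor -MaximumIops 500 -MaximumIOBandwidth 100MB -IoSizeBytes 1MB    # MaximumIOBandwidth
```

In other words, under these assumptions small I/Os tend to run into the IOPS cap and large I/Os into the bandwidth cap.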