Hyper-V Storage QoS in Windows Server 2016 Works on SOFS and on LUNs/CSVs

Introduction

I addressed storage QoS in Windows Server 2012 R2 at length in a coupe of blog posts quite a while ago:

I love the capability and I use it in real life. I also discussed where we were still lacking features and capabilities. I address the fact that there is no multiple host QoS, there is no cluster wide QoS and there is no storage wide QoS in Windows. On top of that, if there is QoS in the storage array (not many have that) most of the time this has no knowledge of Hyper-V, the cluster and the virtual machines. There is one well know exception and that GridStore, possible the only storage vendor that doesn’t treat Hyper-V as a second class citizen.

Any decent storage QoS that not only provides maximums but also minimums, does this via policies and is cluster and hypervisor even virtual machine aware. It needs to be easy to implement and mange. This is not a very common feature. And if it’s exists it’s is tied to the storage vendor, most of the time a startup or challenger.

Windows Server 2016

In Windows Server 2016 they are taking a giant step for all mankind in addressing these issues. At least in my humble opinion. You can read more here:

Basically  Microsoft enables us to define IOPs management policies for virtual machines based on virtual hard disks and  IOPs reserves and limits. These are shared by a group of virtual machines / virtual hard disks.  We can have better resource allocation between VMs, or groups of VMs. These could be high priority VMs or VMs belonging to an platinum customer /tenant. Storage QoS enhances what we already have since Windows Server 2012 R2.  It enables us to monitor and enforce performance thresholds via policies on groups of VMs or individual VMs.

Great for SLA’s but also to make sure a run away VM that’s doing way to much IO doesn’t negatively impact the other VMs and customers on the cluster. They did this via via a Centralized Policy Controller. Microsoft Research really delivered here I would dare say. A a public cloud provider they must have invested a lot in this capability.

At Ignite 2015 there was a great session by Senthil Rajaram and Jose Barreto on this subject. Watch it for some more details.

What caught my eye after  attending and watching sessions, talking to MSFT at the boot was the following marked in red.

image

So not enabled by default on non SOFS storage but can you enable it on your block level CSV Hyper-V cluster? There is a lot of focus on Microsoft providing Storage QoS for SOFS. Which ties into the “common knowledge” that virtualization and LUNs are a bad idea, you need file share and insights into the files of the virtual machines to put intelligence into the hypervisor or storage system right? Well perhaps no! I Windows Server 2016 there is now also the ability to provide it to any block level storage you use for Hyper-V. Yes your low end iSCSI SAN or your high End 16Gbps FC SAN … as long as it’s leveraging CVS (and you should!). Yes, this is what they state in an awesome interview with my Fellow Hyper-V MVP Carsten Rachfahl at Ignite 2015.

Videointerview with Jose and Senthil Storage QoS Thumb2

Senthil and Jose look happy and proud. They should be.  I’m happy and proud of them actually as to me this is huge. This information is also in the TechNet guide Storage Quality of Service in Windows Server Technical Preview

Storage QoS supports two deployment scenarios:

Hyper-V using a Scale-Out File Server This scenario requires both of the following:

  • Storage cluster that is a Scale-Out File Server clusterCompute cluster that has least one server with the Hyper-V role enabled.
  • For Storage QoS, the Failover Cluster is required on Storage, but optional on Compute. All servers (used for both Storage and Compute) must be running Windows Server Technical Preview.

Hyper-V using Cluster Shared Volumes. This scenario requires both of the following:

  • Compute cluster with the Hyper-V role enabled
  • Hyper-V using Cluster Shared Volumes (CSV) for storage

Failover Cluster is required. All servers must be running Windows Server Technical Preview.

So let’s have a quick go following the TechNet guide on a lab cluster leveraging CSV over FC with a Dell Compellent.image

Which give me running Storage QoS Resource

image

And I can play with my new PoSh Commands … Get-StorageFlowQos, Get-StorageQosPolicy and Get-StorageQosVolume …

image

The guide is full of commands, examples and tips. Go play with it. It’s great stuff Smile. I’ll blog more as I experiment.

Here’s my test VMs doing absolutely nothing, bar one on which I’m generating traffic. Even without a policy set it shows the IOPS the VM is responsible for on the storage node.image

Yu can dive into this command and get details about what virtual disk on what volume are contributing to the this per storage node.

image

More later no doubt but here I just wanted to share this as to me this is very important! You can have the cookie of your choice and eat it to! So the storage can be:

  1. SOFS provided (with PCI RAID, Shared SAS, FC, FCoE, ISCI storage as backend storage) that doesn’t matter. In this case Hyper-V nodes can be clustered or stand alone
  2. The storage can be any other block level storage: iSCSI/FC/FCoE it doesn’t matter as long as you use CSVs. So yes this is clustered only. That Storage QoS Resource has to run somewhere.

You know that saying that you can’t do storage QoS on a LUN as they can’t be tweaked to the individual VM and virtual hard disks? Well, that’s been busted as myth it seems.

What’s left? Well if you have SOFS against a SAN or block level storage you cannot know if the storage is being used for other workloads that are not Hyper-V, policies are not cross cluster and stand alone hosts are a no go without SOFS. The cluster is a requirement for this to work with non SOFS Hyper-V deployments.  Also this has no deep knowledge or what’s happening inside of your storage array. So it knows how much IOPS you get, but it’s actually unaware of the total IOPS capability of the entire storage system or controller congestion etc. Is that a big show stopper? No. The focus here is on QoS for virtualization. The storage arrays storage behavior is always in flux anyway. It’s unpredictable by nature. Storage QoS is dynamic and it looks pretty darn promising to me! People this is just great. Really great and it’s very unique as far as I can say. Microsoft, you guys rock.

Testing Virtual Machine Compute Resiliency in Windows Server 2016

No matter what high quality gear you use, how well you design your environment and how much redundancy you build in you will see transient failures in your environment at one point in time. In combination with the push to ever more commodity hardware and the increased use of converged deployments leveraging Ethernet transient failures have become more frequent occurrence then they used to be.

Failover clustering by tradition reacts very “assertive” to failures in order to provide high to continuous availability to our virtual machines. That’s great, we want it to do that, but this binary approach comes at a cost under certain conditions. When reacting too fast and too proactively to transient failures we actually can get  less high or continuous availability in certain scenarios than if the cluster would just have evaluated the situation a bit more cautiously. It’s for this reason that Microsoft introduced increased “Virtual Machine Compute Resiliency” to deal with intra-cluster communication failures in a Windows Server 2016 cluster.

I have helped out a number of fellow MVPs over the past 6 months with this new feature and I dove back into my lab notes to blog about this and help you out with your own testing. The early work was done with Technical Preview v1. In that release it was disabled by default (the value for cluster property “ResiliencyDefaultPeriod”  was set to 0) and the keyword “Default” was used in cluster property “resiliencylevel” for the what is now called ‘IsolateOnSpecialHeartbeat’ and is no longer the default at installation. If that doesn’t confuse you yet, I’ll find another reason to tell you to move to technical preview v2. In TPv2 Virtual Machine Compute Resiliency is enabled and configured by default but in TPv1 you had to enable and configure it yourself. I  advise you to stop testing with v1 and move to v2 and future technical preview release in order for you to test with the most recent bits and functionality.

Investigating the feature configuration

When testing new features in Windows Server Technical Preview Hyper-V you’re on your own once in a while as much is not documented yet. Playing around with PowerShell helps you discover stuff. A  Get-Cluster  | fl * teaches us all kinds of cool stuff such as these new cluster properties:

ResiliencyDefaultPeriod
QuarantineDuration
ResiliencyLevel

Here’s a screenshot of Windows Server 2016TPv1 (Please stop using this version and move to TPv2!)

image

Now when you’re running Windows Server 2016TP v2 this feature has been enabled by default (ResilienceyDefaultPeriod has been filled out as well as QuarantineDuration) and the resiliency level has been set to “AlwaysIsolate”.

image

After some lab work with this I figured out what we need to know to make VM Compute Resiliency to work in our labs:

  • Make sure your cluster functional level is running at version 9
  • Make sure your VMs are at version 6.X
  • Make sure the Operating systems of the VM is Windows Server Technical Preview v2 (Again move away from TPv1)
  • Enable Isolation/Quarantine via PowerShell:

(get-cluster).resiliencylevel
(get-cluster).resiliencylevel = ‘AlwaysIsolate’ or 2
(get-cluster).resiliencylevel
(get-cluster).resiliencylevel = ‘IsolateOnSpecialHeartbeat’  or  1
(get-cluster).resiliencylevel

Please note that all nodes need to be on line to make this change in the technical preview. I got the two accepted values by trial and error and the blog by Subhasish Bhattacharya confirms these are the only 2 ones.

  • Set the timings to some not too high and not too low value to play in the lab without having to wait to long before it’s back to normal (the values I use in my current Technical Preview lab environment are not a recommendation whatsoever, they only facilitate my testing and learning, this has nothing to do with any production environment) . For lab testing I chose:

(get-cluster).ResiliencyDefaultPeriod = 60  Note that setting this to 0 reverts you back to pre Windows Server 2016 behavior and actually disables this feature. The default is 240 seconds

(get-cluster).QuarantineDuration = 300 The default is 7200 seconds, but I’m way to impatient in my lab for that so I set the quarantine duration lower as I want to see the results of my experiments fast, but beware of just messing with this duration in production without thinking about it. Just saying!

Testing the feature and its behaviour

Then you’re ready to start abusing your cluster to demo Isolation mode & quarantine. I basically crash the Cluster service on one of the nodes in the cluster.  Note that cleanly stopping the service is not good enough, it will nicely drain that node for you. which is not what we want to see. Crash it of force stop it via stop-process -name clussvc –Force.

So what do we see happen:

    • The node on which we crashed the cluster server experiences a “transient” intra-cluster communication failure. This node is placed into an Isolated state and removed from its active cluster membership.

image

  • The VMs running at version 6.2 go into Unmonitored state. The other ones just fail over. Unmonitored means you that the cluster is no longer actively managing the VM but you can still look at the condition of the VM via PowerShell or Hyper-V manager. image

image

image

Based on the type of storage you’re using for your VMs the story is different:

  1. File Storage backed (SMB3/SOFS): The VM continues to run in the Online state. This is possible because the SMB share itself has no dependency on the Hyper-V cluster. Pretty cool!
  2. Block Storage backed (FC / FCoE / iSCSI / Shared SAS / PCI RAID)): The VMs go to Running-Critical and then placed in the Paused Critical state. As you have a intra-cluster communication failure (in our case losing the cluster service) the isolated node no longer has access to the Cluster Shared Volumes in the cluster and this is the only option there is.

image

  • If the isolated node doesn’t recover from this presumed transient failure it will, after the time specified in ResiliencyDefaultPeriod (default of 4 minutes : 240 s) go into a down state. The VMs fail over to another node in the cluster. Normally during this experiment the cluster service will come back on line automatically.
  • If a node, does recover but goes into isolated 3 times within 1 hour, it is placed into a Quarantine state for the time specified in QuarantineDuration (default two hours or 7200 s) . The VMS running on this node are drained to another node in the cluster. So if you crash that service repeatedly (3 times within an hour) the Hyper-V Node will go into  “Quarantine” status for the time specified (in our lab 5 minutes as we set it to 300 s). The VMs will be live migrated off even if the node is up and running when the cluster service comes up again.

You might notice that this screenshot is a different lab cluster. Yes, it’s a TPv1 cluster as for some reason the Live Migration part on Quarantine is broken on my TPv2 lab. It’s a clean install, completely green field. Probably a bug.image

It’s the frequency of failures that determines that the node goes into quarantine for the amount of time specified. That’s a clear sign for you to investigate and make sure things are OK. The node is no longer allowed to join the cluster for a fixed time period (default: 2 hours)­. The reason for this is to prevent “flapping nodes” from negatively impacting other nodes and the overall cluster health. There is also a fixed (not configurable as far as I know) amount if nodes that can be quarantined at any give time: 20% or only one node can be quarantined (whatever comes first, in the case of a 2, 3 or for node cluster it’s one node max that can be in quarantine).

If you want to get a quarantined node out of quarantine immediately you can rejoin it to the cluster via a single PowerShell command: Start-ClusterNode –CQ  (CQ = Clear Quarantine). Handy in the lab or in real live when things have been fixed and you want that node back in action asap.

Conclusion

Now this sounds pretty good doesn’t it? And it is. Especially if you’re running you’re running your VMs on a SOFS share. Then the VMs will remain online during the Isolation / Unmonitored phase but when you have “traditional” block level storage they won’t. They’ll go in mode as the in that design you have lost access to the CSV. Now, if you ever needed yet another reason to move to a Scale Out File Server & SMB 3 to deliver storage for your VMs I have just given you one! Hey storage vendors … how is that full SMB 3 feature stack coming on your storage arrays? Or do you really just want us to abstract you away behind a Windows SOFS cluster?

Subhasish Bhattacharya Has blogged about this as well here. It’s a feature we’ll test at length to get a grip on the behavior so we know how the cluster nodes will behave under certain conditions. Trust, but verify is my mantra and it’s way better to figure out how a feature behaves in the lab than having to figure it out when you see it for the very first time in production based on assumptions. Just saying.

E2EVC 2015 Berlin SMB Direct Slide Deck

I attended and presented at E2EVC 2015 in Berlin from June 12th to June 14th. The networking was a blast. No “marchitecure” bull shit or vendor fairy tales what so ever and lots of very open discussions on the realities we’re seeing and facing in virtualization and cloud. Most account managers and esoteric presales would die a painful (but fast) death in this environment.

image

One session was with my Hyper-V Amigo buddy Carsten Rachfahl and was pure demo extravaganza, so no slides. My own session was “SMB Direct – The Secret Decoder Ring” and was an attempt to position this technology what by looking at the why and where followed by the how by who and when.

image

I hope a lot of people had at least a better understanding of SMB Direct, RDMA and DCB. The second aim was to take away the fear many people have of this tech by showcasing it in short demos. Time constraints where a challenge so it was not a 200 level session.

Please download the presentation here if interested.

Enjoy. If you have any concerns or questions, ask, and I’ll try to answer.

Presenting at ITProceed 2015 & E2EVC 2015 Berlin on SMB Direct

You cannot afford to ignore SMB3 and it’s capabilities related to storage traffic such as multichannel, RDMA and encryption. SMB Direct over RoCE seems to have a bright future as it continuous to evolve and improve in Windows Server 2106. The need for DCB (PFC and optionally ETS) intimidates some people. But it should not.

I’ll be putting SMB Direct & RoCE into perspective at ITPROCEED | Welcome to THE IT Pro Event of the year! and #E2EVC E2EVC 2015 Berlin, June 12-14, 2015 Berlin, Germany, sharing experiences, tips and demos!  Come see PFC & ETS in action and learn what it can do for you. Storage vendors should most certainly consider supporting all features of SMB 3 natively as a competitive advantage. So Join me for the talk “SMB Direct – The Secret Decoder Ring”.

All these talks are at extremely affordable community driven events to make sure you can attend. The sessions are given by speakers who do this for the community (speakers and attendees do this in their own time and pay for their our own travel/expenses) and who work with these technology in real life and provide feedback to vendors on the issues or opportunities we see. This makes the sessions very interesting and anything but marketing, slide ware or sales pitches. See you there!