FREE WHITE PAPER: Configuring a VEEAM Off Host Backup Proxy Server for backing up a Windows Server 2012 R2 Hyper-V cluster with a DELL Compellent SAN (Fiber Channel)

Whilst I’m attending TechEd North America 2014, being able to learn and network again with the community at large I think this is a good moment to share. So here’s a little contribution to that community: it’s a white paper on How to configure a VEEAM Off Host Backup Proxy server for backing up a Windows Server 2012 R2 Hyper-V cluster with a DELL Compellent SAN (Fiber Channel).

VEEAM Back & Replication is currently under and extensive test before we make the decision. So far it is going (very) well. And no, VEEAM or DELL did not sponsor this. It’s sharing with the community. A prosperous, successful community makes my professional live better to!

I have to applaud VEEAM for allowing such easy access to their software for trials, to their engineers for assistance and to their support forum and resources even without yet being a paying customer. This is how it should be: vendors having faith in their products both in quality and ease of use. It’s a refreshing experience as some vendors don’t want you to get your hands on new versions of their products even as a existing paying customer “because due to its complexity we might get the wrong impression”. It’s even near impossible with some to get a test license for the lab of the version you currently use with some of them. Not so with VEEAM and that’s great.

I hope you enjoy it. As you might realize I don’t have this kind of infrastructure in my home lab so some of the screenshots have been edited / blurred. I’m sure you can live with that. Otherwise feel free to provide me with the gear in a paid for data center.

DELL Enterprise Forum EMEA 2014 in Frankfurt

As you might have noticed on Twitter I was in Frankfurt last week to attend DELL Enterprise Forum EMEA 2014. It was a great conference and very worthwhile going to. It was a week of multi way communication between vendor, marketing, engineering, partners and customers. I learned a lot. And I gave a lot of feedback. As a Dell TechCenter Rockstar and a Microsoft MVP in Hyper-V I can build bridges to make sure both worlds understand each other better and we, the customers get their needs served better.

Dell Enterprise Forum EMEA 2014 - Frankfurt

I’m happy I managed to go and I have some people to thank for me being able to grab this opportunity:

  • I cleared the time with my employer. This is great, this is a win win situation and I invested weekend time & extra hours for both my employer and myself.
  • I got an invite for the customer storage council where we learned a lot and got ample of opportunity to give honest and constructive feedback directly to the people that need to hear it! Awesome.
  • The DELL TechCenter Rockstar program invited me very generously to come over at zero cost for the Enterprise Forum. Which is great and helped my employer  and myself out. So, thank you so much for helping me attend. Does this color my judgment? 100%  pure objectivity does not exist but the ones who know me also know I communicate openly and directly. Look, I’ve never written positive reviews for money or kickbacks. I do not have sponsoring on my blog, even if that could help pay for conferences, travel expenses or lab equipment. Some say I should but for now I don’t. I speak my mind and I have been a long term DELL customer for some very good reasons. They deliver the best value for money with great support in a better way and model than others out there. I was sharing this info way before I became a Rockstar and they know that I tell the good, the bad and the ugly. They can handle it and know how to leverage feedback better than many out there.
  • Stijn Depril ( @sdepril, http://www.stijnsthoughts.be/), Technical Datacenter Sales at RealDolmen gave me a ride to Frankfurt and back home. Very nice of him and a big thank you for doing so.  He didn’t have to and I’m not a customer of them. Thank buddy, I appreciate it and it was interesting ton learn the partners view on things during the drive there and back. Techies will always be checking out gear …

Dell Enterprise Forum EMEA 2014 - Frankfurt

What did all this result in? Loads of discussion, learning and sharing about storage, networking, compute, cloud, futures and community in IT. It was an 18 hour per day technology fest in a very nice and well arranged fashion.

I was able to meet up with community members, twitter buddies, DELL Employees and peers from all over EMEA and share experiences, learn together, talk shop, provide feedback and left with a better understanding of the complexities and realities they deal with on their side.

Dell Enterprise Forum EMEA 2014 - Frankfurt

It has been time very well spent. I applaud DELL to make their engineers and product managers available for this event. I thank them for allowing us this amount of access to their brains from breakfast till the moment we say goodnight after a night cap. Well done, thank you for listening and I hope to continue the discussion. It’s great to be a DELL TechCenter Rockstar and work in this industry during this interesting times. To all the people I met again or for the first time, it was a great week of many interesting conversations!

For some more pictures and movies visit the Dell Enterprise Forum EMEA 2014 from Germany photo album on Flickr

Where Does Storage QoS Live In Windows Server 2012 R2 Hyper-V

Back to basics to explain where storage QoS lives and how it works

In Windows Server 2012 R2 Hyper-V (and earlier) we have Hyper-V components called Virtualization Service Provider (VSP) and Virtualization Service Clients (VSC). In combination with the VMBUS the VSP and VSC components are what make virtualization perform well on Hyper-V.The Stor VSP/VSC are were the maximum IOPS functionality lives, aka as QoS Limit.

In a hosted hypervisor like Virtual PC or in a bare metal hypervisor without any “enlightment” the operating system inside a virtual machine is blissfully unaware of the fact it virtualized. Basically it sends hardware access requests using native drivers, but the requests are received by the virtual layer that intercepts them on behalf of the host OS by emulating hardware devices. This comes at a cost, namely performance, latency and losing device specific functionality.

In Hyper-V Microsoft provides the Integration Services (IS) for virtual machines running on Hyper-V which, in combination with the VMBus, avoids this overhead. So you should ways use them where and when possible. Two of the components in the IS are VSP and VSC. They are responsible for the communication between the Host OS or Parent Partition (where the VSP lives) and the Guest OS or Child Partition (where the VSC lives).

image

There are 4 VSP & VSC components: Network, Video, HID and Storage. As you probably guessed we’re interested in the storage VSP & VSC (storVSP.sys & storvsc.sys) for the discussion at hand. While the Stor VSP lives in the host OS and the Stor VSC in the guest OS of every VM running on the host they communicate over the VMBus we mentioned and is designed to make communications as fast as possible (it’s a communication protocol that runs in memory, i.e. it’s very fast).

image

The Minimum IOPS, also known as the Reserve is set per virtual disk but the threshold alerts for it are generated by the VHDMP. This is the VHD/VHDX parser and dependency property provider and this know all about the VHD/VHDX format with in itself is again a file on storage (DAS, CSV, SMB 3.0 File Share). This also happens to be where the Storage IO Balancer lives with which it collaborates, more on that below. You now see why QoS is not available for pass-through disk or iSCSI/FC storage in a VM, it requires a VHDX and is implemented at the virtual disk layer.

The QoS Limit (Maximum IOPS) is set at the virtual disk level via the Stor VSC and the Qos Limiter lives in the Stor VSP.

image

So what do we know:

QoS Limit (Maximum IOPS) and QoS reserve (Minimum IOPS) are implemented at the virtual disk layer. So per VHDX in a particular VM.  It’s not available yet for shared VHDX, whether on the same host or not.

Unlike QoS Limit (Maximum IOPS), which is a hard cap, QoS reserve (Minimum IOPS) is a best effort not a hard minimum. It’s used to warn us, not as an enforcement. This works at the host level, where it will detect whether the VHDX can get get the minimum IOPS configured or not and can generate alerts if this happens. This tied to the QoS IO Balancer which is improved in R2 but it will still only spreads IOPS across multiple VMs on the same host, making sure they all get a fair share.

The key point here is that this process doesn’t work across multiple hosts in a cluster, over multiple clusters and stand alone member servers that might all be attached to the same storage system. Meaning that on shared, multi purpose storage we might have an issue. What if some VMs in a dedicated 4 node Hyper-V cluster dedicated to SQL Server virtualization is eating away all the IOPS. QoS IO Balancer will give each SQL Server VM a fair share of the IOPS but only within its host in that cluster. But if a VM on another host is consuming all IOPS, that’s out of it’s scope  That’s where the max cap comes to the rescue (at the virtual disk level) if you need it. Nice but not perfect. You can see now why the storage QoS minimum is implemented at the VHDMP layer, as this which is where the IO Balancer also lives. The fairness that the IO Balancer gives you a better change that the minimal reserve might be met and if it doesn’t you’ll get notified (you need to listen an react, I hope that’s obvious).

Also don’t forget that if you still have other physical servers that run file services, SQL Server or some data crunching apps you will find that those are blissfully ignorant of your QoS IO Balancer at the Hyper-V host level and of your QoS at the Hyper-V virtual disk level.

There is no multiple host QoS, there is no cluster wide QoS and there is no storage wide QoS in Windows. Perhaps you have some QoS your SAN but most of the time this has no knowledge of Hyper-V, the cluster and the virtual machines.

So the above this gives you an idea where does Microsoft might focus it’s attention in regards to storage IOPS  management (there are many more storage capabilities on my wish list) in vNext.

Any other options available today?

Other options are storage that is smart and has knowledge about the workload. This is nice but that means that it will come at a cost. For the moment GridStore with it’s virtual controller seems to be one of the better ones out there. Now I have heard people say Microsoft doesn’t get it and they’re doing do a bad job, but I do not agree. I have spoken to many people in the community and at MSFT and they have stated, even publicly, on stage, that they will keep investing in storage feature to enhance it in the versions to come. Take a look here at TechEd 2013 Session  MDC-B345: Windows Server 2012 Hyper-V Storage Performance.

Why would I like Microsoft to keep improving storage

When talking to storage vendors serving our needs, I always have some feedback. A lot of the advanced storage features don’t always work well in real life, especially if you combine a few. Don’t believe me? Talk to some experienced Windows engineers about the sorry state of many hardware VSS providers. Or how federation across storage systems falls apart the moment you combine it with application consistent snapshots or put a real heavy load on it. Not to cool when you paid for all those licenses which are tuned into “lab only” toys. Yes sometimes as a Windows user you feel like a second class citizen in storage land. A lot of storage systems are still very much a silo. Attempts to do storage federation without a hit on performance, making it load balance across SAN building blocks whilst making all the advanced features that have knowledge of the OS and hypervisor work reliably are not moving as fast as the race for ever more IOPS.

Sure I love the notion of 2 million IOPS, especially if you can get them with random write/read IO at super low latencies Smile. But there are other, sometimes more urgent needs and those seem to fall between the cracks as the storage vendors compete with each other and forget about the needs of their customers. If some storage vendors would shut up long enough to listen to customers they might be less surprised as to why those customers are interested in Storage Spaces.

So it would be kind of nice if Microsoft can work on this an include more evolved storage QoS capabilities in the box. I also like that approach for other reasons. Basically we will do everything we can with what Windows offers us inbox. It’s cost effective as long as you keep the KISS principle in mind and design it consciously. I assure you that often too much money is spent on 3rd party software because people don’t leverage what they have in box and drop the 20/80 rule. We do and you get the best TCO/ROI for our licenses possible. We don’t spend extra money on licenses, integration and support of third party products so we can spend it where it matters the most. It also makes upgrades easier as the complexity and the number of dependencies are lower on pure in box solution.On top of that we minimalize the distinct possibility that one or more 3rd party products will hold us hostage in an older infrastructure because they don’t support new versions of Windows fast, good and complete enough for us to upgrade.

How To Monitor Storage QoS Minimum IOPS & Identify VM & The Virtual Hard Disk In Windows Server 2012 R2 Hyper-V

At TechEd 2013 John Matthew & Liang Yang presented following session  MDC-B345: Windows Server 2012 Hyper-V Storage Performance. That was a great one. During this they demonstrated the use of WMI to monitor storage alerts related to Storage QoS in Windows Server 2012 R2. We’re going to go further, dive in a bit deeper and show you how to identify the virtual hard disk and the virtual machine.

One important thing in all this is that we need to have the reserve or minimum IOPS not being met, so we run IOMeter to make sure that’s the case. That way the events we need will be generated. It’s a bit of a tedious exercise.

So we start with a wmi notification query, this demonstrates that notifications are sent when the minimum IOPS cannot be met. The query is simply:

select * from Msvm_StorageAlert

image

instance of Msvm_StorageAlert
{
    AlertingManagedElement = "\\TESTHOST01\root\virtualization\v2:Msvm_ResourcePool.InstanceID="Microsoft:70BB60D2-A9D3-46AA-B654-3DE53004B4F8"";
    AlertType = 3;
    Description = "Hyper-V Storage Alert";
    EventTime = "20140109114302.000000-000";
    IndicationTime = "20140109114302.000000-000";
    Message = "The ‘Primordial’ Hard Disk Image pool has degraded Quality of Service. One or more Virtual Hard Disks allocated from the pool is not reporting sufficient throughput as specified by the IOPSReservation property in its Resource Allocation Setting Data.";
    MessageArguments = {"Primordial"};
    MessageID = "32930";
    OwningEntity = "Microsoft-Windows-Hyper-V-VMMS";
    PerceivedSeverity = 3;
    ProbableCause = 50;
    ProbableCauseDescription = "One or more VHDs allocated from the pool (identified by value of AlertingManagedElement property) is experiencing insufficient throughput and is not able to meet its configured IOPSReservation.";
    SystemCreationClassName = "Msvm_ComputerSystem";
    SystemName = "TESTHOST01";
    TIME_CREATED = "130337413826727692";
};

That’s great, but what virtual hard disk of what VM is causing this? That’s the question we’ll dive into in this blog. Let’s go. On MSDN docs on Msvm_StorageAlert class we read:

Remarks

The Hyper-V WMI provider won’t raise events for individual virtual disks to avoid flooding clients with events in case of large scale malfunctions of the underlying storage systems.

When a client receives an Msvm_StorageAlert event, if the value of the ProbableCause property is 50 (“Storage Capacity Problem“), the client can discover which virtual disks are operating outside their QoS policy by using one of these procedures:

Query all the Msvm_LogicalDisk instances that were allocated from the resource pool for which the event was generated. These Msvm_LogicalDisk instances are associated to the resource pool via the Msvm_ElementAllocatedFromPool association.
Filter the result list by selecting instances whose OperationalStatus contains “Insufficient Throughput”.

So I query  (NOT a notification query!) the Msvm_ElementAllocatedFromPool class, click through on a result and select Show MOF.

image

image

 

Let’s look at that MOF …In yellow is the GUID of our VM ID. Hey cool!

instance of Msvm_ElementAllocatedFromPool
{
    Antecedent = "\\TESTHOST01\root\virtualization\v2:Msvm_ProcessorPool.InstanceID="Microsoft:B637F347-6A0E-4DEC-AF52-BD70CB80A21D"";
    Dependent = "\\TESTHOST01\root\virtualization\v2:Msvm_Processor.CreationClassName="Msvm_Processor",DeviceID="Microsoft:b637f346-6a0e-4dec-af52-bd70cb80a21d\\6",SystemCreationClassName="Msvm_ComputerSystem",SystemName="
96CD7F7E-0C0A-42FE-96CB-B5550D937F27"";
};

Now we want to find the virtual hard disk in question! So let’s do what the docs says and query Msvn_LogicalDisk based on the VM GUID we find the relates results …

image

Look we got OperationalStatus 32788 which means InsufficientThroughput, cool we’re on the right track … now we need to find what virtual disk of our VM  that is. Well in the above MOF we find the device ID:     DeviceID = "Microsoft:5F6D764F-1BD4-4C5D-B473-32A974FB1CA2\\L"

Well if we then do a query for Msvm_StorageAllocationSettingData we find two entrties for our VM GUID (it has two disks) and by looking at the value InstanceID that contains the above DeviceID we find the virtual hard disk info we needed to identify the one not getting the minimum IOPS.

image

HostResource = {"C:\ClusterStorage\Volume5\DidierTest01\Virtual Hard Disks\DidierTest01Disk02.vhdx"};
HostResourceBlockSize = NULL;
InstanceID = "Microsoft:96CD7F7E-0C0A-42FE-96CB-B5550D937F27\5F6D764F-1BD4-4C5D-B473-32A974FB1CA2\\L";

Are you tired yet? Do you realize you need to do this while the disk IOPS is not being met to see the events. This is no way to it in production. Not on a dozen servers, let alone on a couple of hundred to thousands or more hosts is it? All the above did was give us some insight on where and how. But using wbemtest.exe to diver deeper into wmi notifications/events isn’t really handy in real life. Tools will need to be developed to deal with this larger deployments. The can be provided by your storage vendor, your VAR, integrator or by yourself if you’re a large enough shop to make private cloud viable or if you are the cloud provider Smile.

To give you an idea on how this can be done there is some demo code on MSDN over here and I have that compiled for demo purposes.

We have 4 VMs running on the host.  One of them is being hammered by IOMeter while it’s minimum IOPS have been set to an number it cannot possibly get. We launch StorageQos to monitor our Hyper-V host.

image

Just let it run and when the notification event that the minimum IOPS cannot be delivered on the storage this monitor will query WMI further to tell us what virtual disk or disks are  involved.  Cool huh! A good naming convention can help identify the VM and this tools works remotely against node so you can launch one for each node of the cluster. Time to fire up Visual Studio 2013 me thinks or go and chat to a good dev you might know to take this somewhere, some prefer this sort of work to the modern day version of CRUD apps any day. If you buy monitoring tools you might want them to have this capability.

While this is just demo code, it gives you an idea of how tools and solutions can be developed & build to monitor the Minimum IOPS part of Storage QoS in Windows Server 2012 R2. Hope you found this useful!