How To Monitor Storage QoS Minimum IOPS & Identify VM & The Virtual Hard Disk In Windows Server 2012 R2 Hyper-V

At TechEd 2013 John Matthew & Liang Yang presented following session  MDC-B345: Windows Server 2012 Hyper-V Storage Performance. That was a great one. During this they demonstrated the use of WMI to monitor storage alerts related to Storage QoS in Windows Server 2012 R2. We’re going to go further, dive in a bit deeper and show you how to identify the virtual hard disk and the virtual machine.

One important thing in all this is that we need to have the reserve or minimum IOPS not being met, so we run IOMeter to make sure that’s the case. That way the events we need will be generated. It’s a bit of a tedious exercise.

So we start with a wmi notification query, this demonstrates that notifications are sent when the minimum IOPS cannot be met. The query is simply:

select * from Msvm_StorageAlert

image

instance of Msvm_StorageAlert
{
    AlertingManagedElement = "\\TESTHOST01\root\virtualization\v2:Msvm_ResourcePool.InstanceID="Microsoft:70BB60D2-A9D3-46AA-B654-3DE53004B4F8"";
    AlertType = 3;
    Description = "Hyper-V Storage Alert";
    EventTime = "20140109114302.000000-000";
    IndicationTime = "20140109114302.000000-000";
    Message = "The ‘Primordial’ Hard Disk Image pool has degraded Quality of Service. One or more Virtual Hard Disks allocated from the pool is not reporting sufficient throughput as specified by the IOPSReservation property in its Resource Allocation Setting Data.";
    MessageArguments = {"Primordial"};
    MessageID = "32930";
    OwningEntity = "Microsoft-Windows-Hyper-V-VMMS";
    PerceivedSeverity = 3;
    ProbableCause = 50;
    ProbableCauseDescription = "One or more VHDs allocated from the pool (identified by value of AlertingManagedElement property) is experiencing insufficient throughput and is not able to meet its configured IOPSReservation.";
    SystemCreationClassName = "Msvm_ComputerSystem";
    SystemName = "TESTHOST01";
    TIME_CREATED = "130337413826727692";
};

That’s great, but what virtual hard disk of what VM is causing this? That’s the question we’ll dive into in this blog. Let’s go. On MSDN docs on Msvm_StorageAlert class we read:

Remarks

The Hyper-V WMI provider won’t raise events for individual virtual disks to avoid flooding clients with events in case of large scale malfunctions of the underlying storage systems.

When a client receives an Msvm_StorageAlert event, if the value of the ProbableCause property is 50 (“Storage Capacity Problem“), the client can discover which virtual disks are operating outside their QoS policy by using one of these procedures:

Query all the Msvm_LogicalDisk instances that were allocated from the resource pool for which the event was generated. These Msvm_LogicalDisk instances are associated to the resource pool via the Msvm_ElementAllocatedFromPool association.
Filter the result list by selecting instances whose OperationalStatus contains “Insufficient Throughput”.

So I query  (NOT a notification query!) the Msvm_ElementAllocatedFromPool class, click through on a result and select Show MOF.

image

image

 

Let’s look at that MOF …In yellow is the GUID of our VM ID. Hey cool!

instance of Msvm_ElementAllocatedFromPool
{
    Antecedent = "\\TESTHOST01\root\virtualization\v2:Msvm_ProcessorPool.InstanceID="Microsoft:B637F347-6A0E-4DEC-AF52-BD70CB80A21D"";
    Dependent = "\\TESTHOST01\root\virtualization\v2:Msvm_Processor.CreationClassName="Msvm_Processor",DeviceID="Microsoft:b637f346-6a0e-4dec-af52-bd70cb80a21d\\6",SystemCreationClassName="Msvm_ComputerSystem",SystemName="
96CD7F7E-0C0A-42FE-96CB-B5550D937F27"";
};

Now we want to find the virtual hard disk in question! So let’s do what the docs says and query Msvn_LogicalDisk based on the VM GUID we find the relates results …

image

Look we got OperationalStatus 32788 which means InsufficientThroughput, cool we’re on the right track … now we need to find what virtual disk of our VM  that is. Well in the above MOF we find the device ID:     DeviceID = "Microsoft:5F6D764F-1BD4-4C5D-B473-32A974FB1CA2\\L"

Well if we then do a query for Msvm_StorageAllocationSettingData we find two entrties for our VM GUID (it has two disks) and by looking at the value InstanceID that contains the above DeviceID we find the virtual hard disk info we needed to identify the one not getting the minimum IOPS.

image

HostResource = {"C:\ClusterStorage\Volume5\DidierTest01\Virtual Hard Disks\DidierTest01Disk02.vhdx"};
HostResourceBlockSize = NULL;
InstanceID = "Microsoft:96CD7F7E-0C0A-42FE-96CB-B5550D937F27\5F6D764F-1BD4-4C5D-B473-32A974FB1CA2\\L";

Are you tired yet? Do you realize you need to do this while the disk IOPS is not being met to see the events. This is no way to it in production. Not on a dozen servers, let alone on a couple of hundred to thousands or more hosts is it? All the above did was give us some insight on where and how. But using wbemtest.exe to diver deeper into wmi notifications/events isn’t really handy in real life. Tools will need to be developed to deal with this larger deployments. The can be provided by your storage vendor, your VAR, integrator or by yourself if you’re a large enough shop to make private cloud viable or if you are the cloud provider Smile.

To give you an idea on how this can be done there is some demo code on MSDN over here and I have that compiled for demo purposes.

We have 4 VMs running on the host.  One of them is being hammered by IOMeter while it’s minimum IOPS have been set to an number it cannot possibly get. We launch StorageQos to monitor our Hyper-V host.

image

Just let it run and when the notification event that the minimum IOPS cannot be delivered on the storage this monitor will query WMI further to tell us what virtual disk or disks are  involved.  Cool huh! A good naming convention can help identify the VM and this tools works remotely against node so you can launch one for each node of the cluster. Time to fire up Visual Studio 2013 me thinks or go and chat to a good dev you might know to take this somewhere, some prefer this sort of work to the modern day version of CRUD apps any day. If you buy monitoring tools you might want them to have this capability.

While this is just demo code, it gives you an idea of how tools and solutions can be developed & build to monitor the Minimum IOPS part of Storage QoS in Windows Server 2012 R2. Hope you found this useful!

Windows Server 2012 R2 Cluster Reset Recent Events With PowerShell

I blogged before about the fact that since Windows Server 2012  we have the ability to reset the recent events shown so that the state of the cluster is squeaky clean with not warnings or errors. You can read up on this here. Windows Server 2012 Cluster Reset Recent Events Feature.

You can also do this in PowerShell like in the example below:

#Connect to cluster & get current RecentEventsResetTime value
$MyCluster = Get-CLuster -name "W2K12R2RTM"
$MyCluster.RecentEventsResetTime

#Reset recent events
$MyCluster.RecentEventsResetTime = get-date
$MyCluster.RecentEventsResetTime

As you may notice, the RecentEventsResetTime is displayed in UTC when read form the cluster after connecting to it. Right after you set it it displays the time respectful of the time zone you’re in right until you connect to the cluster again. We demonstrate this in the 2 screenshots below (I’m at GMT+1).

image

image

This comes in handy when writing test, comparison & demo scripts. Often you do things with the network that causes network connectivity to be lost when the NIC gets reset (disabled/enabled) and such. Also when something fails as part of the demo or tests scripts it’s nice to start the rerun or the next part of the demo/test with a clean cluster GUI when you’re showcasing stuff. Unfortunately an already GUI doesn’t refresh these setting if the reset is not done in the GUI. So you need to open a new one. For scripting you don’t have this issue. EDIT: In Windows 2012 R2 you can use the $MyCluster.Update() to reflect the new value of RecentEventsResetTime in UTC without having to reconnect to the cluster. In Windows Server 2012 this Update method isn’t available but it seems to happen automatic.

Linux Integration Services Version 3.5 for Hyper-V Available For Download

Yesterday, December 19th 2013, Microsoft made the Linux Integration Services Version 3.5 for Hyper-V available for download.

The Linux Integration Services (LIS) package downloaded from Microsoft  is meant to deliver support older Linux distros. In the most recent Linux distros the KVP component is to be included, as are the other Hyper-V related drivers. In these distros these drivers and components are to be part of the upstream Linux kernel, and as such are included in Linux distros releases. So you should not need this download if you run these newer distros that has the LIS built-in. The list of supported distros is slowly growing.

image

If you are running (or need to run) older versions of Linux in your VMs and leverage the 100% fully featured Hyper-v Server 2012 R2 that is also 100% free of charge this is your way to leverage all those features. The aim is that you’re never a left behind when running Hyper-V (within the limits of supportability, DOS 6.0, NT 4.0 or Windows 2000 is not an acceptable OS today).

In Microsoft speak:

Hyper-V supports both emulated (“legacy”) and Hyper-V-specific (“synthetic”) devices for Linux virtual machines. When a Linux virtual machine is running with emulated devices, no additional software is required to be installed. However, emulated devices do not provide high performance and cannot leverage the rich virtual machine management infrastructure that the Hyper-V technology offers.

To make full use of all benefits that Hyper-V provides, it is best to use Hyper-V-
specific devices for Linux. The collection of drivers that are required to run Hyper-V-specific devices is known as Linux Integration Services (LIS).
 
For certain older Linux distributions, Microsoft provides an ISO file containing installable LIS drivers for Linux virtual machines. For newer Linux distributions, LIS is built into the Linux operating system, and no separate download or installation is required. This guide discusses the installation and functionality of LIS drivers on older Linux distributions.

For some extra info an tips see Enabling Linux Support on Windows Server 2012 R2 Hyper-V

Failed Live Migrations with Event ID 21502 Planned virtual machine creation failed for virtual machine ‘VM Name’: An existing connection was forcibly closed by the remote host. (0x80072746) Caused By Wrong Jumbo Frame Settings

OK so Live Migration fails and you get the following error in the System even log with event id 21502:

image

Planned virtual machine creation failed for virtual machine ‘DidierTest01’: An existing connection was forcibly closed by the remote host. (0x80072746). (Virtual Machine ID 41EF2DB-0C0A-12FE-25CB-C3330D937F27).

Failed to receive data for a Virtual Machine migration: An existing connection was forcibly closed by the remote host. (0x80072746).

There are some threads on the TechNet forums on this like here http://social.technet.microsoft.com/Forums/en-US/805466e8-f874-4851-953f-59cdbd4f3d9f/windows-2012-hyperv-live-migration-failed-with-an-existing-connection-was-forcibly-closed-by-the and some blog post pointing to TCP/IP Chimney settings causing this but those causes stem back to the Windows Server 2003 / 2008 era.

In the Hyper-V event log Microsoft-Windows-Hyper-V-VMMS-Admin you also see a series of entries related to the failed live migration point to the same issue: image

  
Log Name:      Microsoft-Windows-Hyper-V-VMMS-Admin
Source:        Microsoft-Windows-Hyper-V-VMMS
Date:          10/8/2013 10:06:15 AM
Event ID:      20413
Task Category: None
Level:         Information
Keywords:     
User:          SYSTEM
Computer:      SRV1.BLOG.COM
Description:
The Virtual Machine Management service initiated the live migration of virtual machine  ‘DidierTest01’ to destination host ‘SRV2’ (VMID 41EF2DB-0C0A-12FE-25CB-C3330D937F27).
 
Log Name:      Microsoft-Windows-Hyper-V-VMMS-Admin
Source:        Microsoft-Windows-Hyper-V-VMMS
Date:          10/8/2013 10:06:26 AM
Event ID:      22038
Task Category: None
Level:         Error
Keywords:     
User:          SYSTEM
Computer:      SRV1.BLOG.COM
Description:
Failed to send data for a Virtual Machine migration: An existing connection was forcibly closed by the remote host. (0x80072746).
 
Log Name:      Microsoft-Windows-Hyper-V-VMMS-Admin
Source:        Microsoft-Windows-Hyper-V-VMMS
Date:          10/8/2013 10:06:26 AM
Event ID:      21018
Task Category: None
Level:         Error
Keywords:     
User:          SYSTEM
Computer:      SRV1.BLOG.COM
Description:
Planned virtual machine creation failed for virtual machine ‘DidierTest01’: An existing connection was forcibly closed by the remote host. (0x80072746). (Virtual Machine ID 41EF2DB-0C0A-12FE-25CB-C3330D937F27).
 
Log Name:      Microsoft-Windows-Hyper-V-VMMS-Admin
Source:        Microsoft-Windows-Hyper-V-VMMS
Date:          10/8/2013 10:06:26 AM
Event ID:      22040
Task Category: None
Level:         Error
Keywords:     
User:          SYSTEM
Computer:      SRV1.BLOG.COM
Description:
Failed to receive data for a Virtual Machine migration: An existing connection was forcibly closed by the remote host. (0x80072746).
Log Name:      Microsoft-Windows-Hyper-V-VMMS-Admin
Source:        Microsoft-Windows-Hyper-V-VMMS
Date:          10/8/2013 10:06:26 AM
Event ID:      21024
Task Category: None
Level:         Error
Keywords:     
User:          SYSTEM
Computer:      srv1.blog.com
Description:
Virtual machine migration operation for ‘DidierTest01’ failed at migration source ‘SRV1’. (Virtual machine ID 41EF2DB-0C0A-12FE-25CB-C3330D937F27)

There is something wrong with the network and if all checks out on your cluster & hosts it’s time to look beyond that. Well as it turns out it was the Jumbo Frame setting on the CSV and LM NICs.

Those servers had been connected to a couple of DELL Force10  S4810 switches. These can handle an MTU size up to 12000. And that’s how they are configured. The Mellanox NICs allow for MTU Sizes up to 9614 in their Jumbo Frame property.  Now super sized jumbo frames are all cool until you attach the network cables to another switch like a PowerConnect 8132 that has a max MTU size of 9216. That moment your network won’t do what it’s supposed to and you see errors like those above. If you test via an SMB share things seem OK & standard pings don’t show the issue. But some ping tests with different mtu sizes & the –f (do no fragment) switch will unmask the issue soon. Setting the Jumbo Frame size on the CSV & LM NICs to 9014 resolved the issue.

Now if on the server side everything matches up but not on the switches you’ll also get an event id 21502 but with a different error message:

Event ID: 21502 The Virtual Machine Management Service failed to establish a connection for a Virtual machine migration with host XXXX. A connection attempt failed because the connected party did not properly respond after a period of time, or the established connection failed because connected host has failed to respond (0X8007274C)

image

This is the same message you’ll get for a known cause of shared nothing live migration failing as described in this blog post by Microsoft Shared Nothing Migration fails (0x8007274C).

So there you go. Keep an eye on those Jumbo Frame setting especially in a mixed switch environment. They all have their own capabilities, rules & peculiarities. Make sure to test end to end and you’ll be just fine.