How To Measure IOPS Of A Virtual Machine With Resource Metering And MeasureVM

The first time we used the Storage QoS capabilities in Windows Server 2012 R2 it was done in a trial and error fashion. We knew that it was the new VM causing the disruption and kind of dropped the Maximum IOPS to a level that was acceptable.  We also ran some PerfMon stats & looked at the IOPS on the HBA going the host. It was all a bit tedious and convoluted.  Discussing this with Senthil Rajaram, who’s heavily involved with anything storage at Microsoft he educated me on how to get it done fast & easy.

Fast & easy insight into virtual machine IOPS.

The fast and easy way to get a quick feel for what IOPS a VM is generating has become available via resource metering and Measure-VM. In Windows Server 2012 R2 we have new storage metrics we can use for that, it’s not just cool for charge back or show back Smile.

So what did we get extra  in Windows Server 2012 R2? Well, some new storage metrics per virtual disk

  1. Average Normalized IOPS (Averaged over 20s)
  2. Average latency (Averaged over 20s)
  3. Aggregate Data Written (between start and stop metric command)
  4. Aggregate Data Read (between start and stop metric command)

Well that sounds exactly like what we need!

How to use this when you want to do storage QoS on a virtual machine’s virtual disk or disks

All we need to do is turn on resource metering for the VMs of interest. The below command run in an elevated PowerShell console will enable it for all VMs on a host.image

We now run measure-VM DidierTest01 | fl and see that we have no values yet for the properties . Since we haven’t generated any IOPS yes this is normal.image

So we now run IOMeter to generate some IOPSimage

and than run measure-VM DidierTest01 | fl again. We see that the properties have risen.image

It’s normal that the AggregatedAverageNormalizedIOPS and AggregatedAverageLatency are the averages measured over a period of 20 seconds at the moment of sampling. The value  AggregatedDiskDataRead and AggregatedDiskDataWritten are the averages since we started counting (since we ran Enable-VMResourceMetering for that VM ), it’s a running sum, so it’s normal that the average is lower initially than we expected as the VM was idle between enabling resource metering and generating some IOPS.

All we need to do is keep the VM idle wait 30 seconds so and when we run again measure-VM DidierTest01 | fl again we see the following?image

While the values AggregatedAverageNormalizedIOPS and AggregatedAverageLatency are the value reflecting a 20s average that’s collected at measuring time and as such drop to zero over time. The values for AggregatedDiskDataRead and AggregatedDiskDataWritten are a running sum. They stay the same until we disable or reset resource metering.

Let’s generate some extra IO, after which we wait a while (> 20 seconds) before we run measure-VM DidierTest01 | fl again and get updated information. We confirm see that indeed AggregatedDiskDataRead and AggregatedDiskDataWritten is a running sum and that AggregatedAverageNormalizedIOPS and AggregatedAverageLatency have dropped to 0 again.

image

Anyway, it’s clear to you that the sampled value of AggregatedAverageNormalizedIOPS is what you’re interested in when trying to get a feel for the value you need to set in order to limit a virtual hard disk to an acceptable number of normalized IOPS.

But wait, that’s aggregated! I have SQL Server VMs with 4 virtual hard disks. How do I know what hard disk is generating what IOPS? The docs say the metrics are per virtual hard disk, right?! I need to know if it’s the virtual hard disk with TempDB or the one with the LOGS causing the IO issue.

Well the info is there but it requires a few more lines of PowerShell:

cls
$VMName  = "Didiertest01" 
enable-VMresourcemetering -VMName $VMName 
$VMReport = measure-VM $VMName 
$DiskInfo = $VMReport.HardDiskMetrics
write-Host "IOPS info VM $VMName" -ForegroundColor Green
$count = 1
foreach ($Disk in $DiskInfo)
{
Write-Host "Virtual hard disk $count information" -ForegroundColor cyan
$Disk.VirtualHardDisk | fl  *
Write-Host "Normalized IOPS for this virtual hard disk" -ForegroundColor cyan
$Disk
$count = $Count +1 
}

Resulting in following output:

image

Hope this helps! Windows Server 2012 R2 make life as a virtualization admin easier with nice tools like this at our disposal.

Storage Quality of Service (QoS) In Windows Server 2012 R2

In Windows Server 2012 R2 Hyper-V we have the ability to set  quality-of-service (QoS) options for a virtual machine at the virtual disk level. There is no QoS (yet) for shared VHDX, so it’s a per individual VM, per virtual hard disk associated with that virtual machine setting for now.

What can we do?

  • Limit – Maximum IOPS
  • Reserve – Minimum IOPS threshold alerts
  • Measure – New Storage attributes in VM Metrics

image

Limit

Storage QoS allows you to specify maximum input/output operations per second (IOPS) value for a virtual hard disk associated with virtual machine. This puts a limit on what a virtual disk can use. This means that one or more VMs cannot steal away all IOPS from the others (perhaps even belonging to separate customers). So this is an automatic hard cap.

Reserve

We can also set a minimum IOPS value. This is often referred to as the reserve. This is not hard minimum. Here’s a worth of warning, unless you hopelessly overprovision your physical storage capabilities (ruling out disk, controller issues, HBA problems & other risks that impact deliverable IOPS) and dedicate it to a single Hyper-V host with a single VM (ruling out the unknown) you cannot ever guarantee IOPS. It’s best effort. It might fail but than events will alert you that things are going south. We will be notified when the IOPS to a specified virtual hard disk is below that reserve you specified?that is needed for its optimal performance.  We’ll talk more about this in another blog post.

Measure

The virtual machine metrics infrastructure have been extended with storage related attributes so we can monitor the performance (and so charge or show back).  To do this they use what they call “normalized IOPS” where every 8 K of data is counted as one I/O. This is how the values are measured and set. So it’s just for that purpose alone.

  • One 4K I/O = 1 Normalized I/O
  • One 8K I/O = 1 Normalized I/O
  • One 10K I/O = 2 Normalized I/Os
  • One 16K I/O = 2 Normalized I/Os
  • One 20K I/O = 3 Normalized I/Os

A Little Scenario

We take IO Meter and we put it inside 2 virtual machines. These virtual machine reside on a Hyper-V Cluster that is leveraging shared storage on a SAN. Let’s say you have a VM that requires 45000 IOPS at times and as long as it can get that when needed all is well.

All is well until one day a project that goes into production has not been designed/written with storage IOPS (real needs & effects) in mind. So while they have no issue the application behaves as a scrounging hog eating a humongous size of the IOPS the storage can deliver.

Now, you do some investigation (pays to be buddies with a good developer and own the entire infrastructure stack) and notice that they don’t need those IOPS as they:

  1. Can do more intelligent data retrieval slashing IOPS in half.
  2. They waste 75% of the time in several suboptimal algorithms for sorting & parsing data anyway.
  3. The number of users isn’t that high and the impact of reducing storage IOPS is non existent due to (2).

All valid excuses to take the IOPS away …You think let’s ask the PM to deal with this. They might, they might not, and if they do it might take time. But while that remains to be seen, you have a critical solution that serves many customers who’re losing real money because of that drop in IOPS has become an issue with the application. So what’s the easiest thing to do? Cap that IOPS hog! Here the video on how you deal with this on Vimeo: http://vimeo.com/82728497

image

Now let’s enable QoS as in the screenshot below. We give it a best effort 2000 IOPS minimum and a hard maximum of 3000 IOPS.

image

The moment you click “Apply” it kicks in! You can do this live, not service interruption/ system downtime is needed.

image

I actually have a hard cap of 50000 on the business critical app as well just to make sure the other VMs don’t get starved. Remember that minimum is a soft reserve. You get warned but it can’t give what potentially isn’t available. After all, as always, it’s technology, not magic.

In a next blog we’ll discuss QoS a bit more and what’s in play with storage IO management in Hyper-V, what the limitations are and as such we get an idea what Microsoft should pay attention to vNext.

Remarks

Well doing this for a 24 node Hyper-V cluster with 500 VMs could be a bit of challenge.

Windows Server 2012 R2 Cluster Reset Recent Events With PowerShell

I blogged before about the fact that since Windows Server 2012  we have the ability to reset the recent events shown so that the state of the cluster is squeaky clean with not warnings or errors. You can read up on this here. Windows Server 2012 Cluster Reset Recent Events Feature.

You can also do this in PowerShell like in the example below:

#Connect to cluster & get current RecentEventsResetTime value
$MyCluster = Get-CLuster -name "W2K12R2RTM"
$MyCluster.RecentEventsResetTime

#Reset recent events
$MyCluster.RecentEventsResetTime = get-date
$MyCluster.RecentEventsResetTime

As you may notice, the RecentEventsResetTime is displayed in UTC when read form the cluster after connecting to it. Right after you set it it displays the time respectful of the time zone you’re in right until you connect to the cluster again. We demonstrate this in the 2 screenshots below (I’m at GMT+1).

image

image

This comes in handy when writing test, comparison & demo scripts. Often you do things with the network that causes network connectivity to be lost when the NIC gets reset (disabled/enabled) and such. Also when something fails as part of the demo or tests scripts it’s nice to start the rerun or the next part of the demo/test with a clean cluster GUI when you’re showcasing stuff. Unfortunately an already GUI doesn’t refresh these setting if the reset is not done in the GUI. So you need to open a new one. For scripting you don’t have this issue. EDIT: In Windows 2012 R2 you can use the $MyCluster.Update() to reflect the new value of RecentEventsResetTime in UTC without having to reconnect to the cluster. In Windows Server 2012 this Update method isn’t available but it seems to happen automatic.

Don’t Let Live Migration Settings Behavior Confuse You

Configuring live migration settings on a cluster

In the cluster under Networks, Live Migration Settings you can select what networks are available for live migration and their order of preference.image

image

Basically this setting determines what NICs can be used. It also determines and in what order of preference the available networks can be used by Live Migration. It does not determine bandwidth aggregation or failover. All it does is provide the order in which the redundant networks will be used. It’s up to the cluster service NETFT, Multichannel or SMB Direct to provide the bandwidth aggregation if possible As you can see we use LM/CSV over SMB and as our two NICS are RDMA capable 10Gbps NIC, multichannel will discover RDMA capabilities & leverage SMB Direct if it can be establish otherwise it will just stick with multichannel. If you would  team that NIC shows up as just one network. Also not that if you lose a NIC during live migration it might fail for some VMs under certain scenarios, but you cluster nodes will maintain the capability & recover. The  names of the network reflect this: LM1+CSV2 & CSV1+LM2 will be used both but if for some reason multichannel goes completely south the names reflect the metrics of these networks. The lowest is CSV1+LM2 and the second lowest is LM1+CSV2, reflect on how NETFT will select to use which automatically based on the metrics. I.e. It’s “self documentation” for human consumption.

image

Sometimes you might get surprising results. An example. If you’ve selected SMB for Live Migration and you have selected only one of the NICs here.image Still when you look at perform you might see both being used! What’s happening is that multichannel will kick in (and use two or more similar NICs when it finds them and if applicable move to RDMA.

image

So here we select SMB for the live migration type and the two equally capable 10GBps NICs available for live migration it will use them, even if you selected only one of them in the cluster network settings for live migration.

Why is that? Well, there is still another location where live migration settings are defined and that is in Hyper-V Manager. Take a look at the screenshot below.

image

The Incoming live migration settings is set to “Use any available network for live migration”. If you have this on it will still leverage both as when one is used multichannel drag the other one into action anyway, no matter what you set in network settings for live migration in the Cluster GUI (it set to use only one and dimmed out).

Do note that on Hyper-V Manager the settings for live migrations specify “Incoming live migrations”. That leads us to believe that it’s the target, the node where the VMs are migrated to that determines what NICs get used. But let’s put this to the test.

Testing A Hyper-V cluster with two nodes – Node A and Node B

On the cluster network settings you select only one network.

image

On  Hyper-V cluster Node A you have configured the following for live migration via Hyper-V Manager to “Use these IP addresses for live migration”. You cannot add or remove networks, the networks used are defined by the cluster.

image

On  Hyper-V cluster Node B you have configured the following for live migration via Hyper-V Manager to “Use any available network for live migration”.

As we now kick of a live migration from node A to node B we’ll see both NICs being used. Why well because Node B is the target and Node B has the setting “Use any available network for live migration”. So why then only these 2, why not pick up any other suitable NICs? Well we’ve configured the live migration on both nodes to use SMB. As this cluster is RDMA capable that means it will leverage multichannel/SMB direct. The auto configuration will select the best, equally capable NICs for this and that’s these two in our scenario. Remember the capabilities of the NICs have to match. So no mixtures of 1 * 1Gbps and 1 *10Gbps or 1 * multichannel and 1 SMB Direct.

The confusion really sets in that even if live migrate from Node B to A it will also use both NICs! Hum, that is “Incoming Live Migrations” is not always “correct” it seems, not when using SMB as a performance option at least. Apparently multichannel will kick in in both directions.

If you set both to Node A and Node B to “Use these IP addresses for live migration” and leave the cluster network setting with only one network it does only use one, even with SMB as a performance option for live migration. This is expected.

Note: I had one interesting hiccup while testing this configuration: when doing the latter one of the VMs failed in live migration of the entire host. I ran it again and that one VM still used both networks. All others went well during host migration with just one being used. That was a bit of a huh Confused smile moment and it sure tripped me up & kept me busy for a while. I blame RDMA and the position of the planets & constellations.

Things aren’t always what they seem at first and it’s good to keep that in mind. The moment you think you got if figured out, you’re wrong Winking smile. So look again & investigate.