Hyper-V UNMAP Does Work With SAN Snapshots And Checkpoints But Not Always As You First Expect

Recently I was asked to look at why UNMAP was not working predictably in a Windows Server 2012 R2 Hyper-V environment. No, this is not a horror story about bugs or bad storage solutions. Fortunately, once the horror option was off the table I had a pretty good idea what the cause might be.

SAN snapshots are in play

As it turned out, everything was indeed working just fine. The unexpected behavior, which made it seem that UNMAP wasn’t working well or at least not at the moments they expected it to, was caused by the SAN snapshots. Once you know how this works you’ll find that UNMAP does indeed work predictably.

Snapshots on SANs are used for automatic data tiering, data protection and various other use cases. As long as those snapshots live, and with them the data they hold, UNMAP/TRIM will not free up space on the SAN for thinly provisioned LUNs. This is logical: the data is still stored on the SAN for those snapshots, so hard deleting it from the VM or host has no impact on the storage the SAN uses until those snapshots are deleted or expire. Only what happens in the active portion is directly impacted.

An example

  • Take a VM with a dynamically expanding VHDX that’s empty and mapped to drive letter D. Note the file size of the VHDX and the space consumed on the thinly provisioned SAN LUN where it resides.
  • Create 30GB of data in that dynamically expanding virtual hard disk of the virtual machine. Watch the dynamically expanding VHDX grow in size, just like the space consumed on the SAN.
  • Create a SAN snapshot.
  • Shift + Delete that 30GB of data from the dynamically expanding virtual hard disk in the virtual machine.
  • Run Optimize-Volume -DriveLetter D -ReTrim to force UNMAP and watch the space consumed by the LUN on the SAN: it remains +/- the same.
  • Shut down the VM and look at the size of the dynamic VHDX file. It shrinks to the size before you copied the data into it.
  • Boot the VM again and copy 30GB of data to the dynamically expanding VHDX in the VM again.
  • See the size of the VHDX grow and notice that the space consumed on the SAN for that LUN goes up as well.
  • Shift + Delete that 30GB of data from the dynamically expanding virtual hard disk in the virtual machine.
  • Run Optimize-Volume -DriveLetter D -ReTrim to force UNMAP and watch the space consumed by the LUN on the SAN: it drops, as the data you deleted is in the active part of your LUN (the second 30GB you copied). It will not drop any more than this, as the data kept safe in the frozen snapshot of the LUN remains there (the first 30GB you copied).
  • When you expire/delete that snapshot on the SAN you’ll see the space consumed on the thinly provisioned SAN LUN drop to the initial size of this exercise.
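The steps above can be mimicked with a small toy model. This is only a sketch under simplifying assumptions: one block stands for 1GB, the LUN has at most one snapshot, and the class and method names (`ThinLun`, `unmap`, `consumed`) are made up for illustration and do not correspond to any SAN vendor's API.

```python
class ThinLun:
    """Toy model of a thinly provisioned LUN with one optional frozen snapshot."""

    def __init__(self):
        self.active = set()    # blocks referenced by the active (live) view
        self.snapshot = set()  # blocks pinned by the frozen snapshot

    def write(self, blocks):
        self.active |= set(blocks)

    def take_snapshot(self):
        # The snapshot freezes whatever the active view references right now.
        self.snapshot = set(self.active)

    def unmap(self, blocks):
        # UNMAP only releases blocks from the active view.
        self.active -= set(blocks)

    def delete_snapshot(self):
        self.snapshot = set()

    def consumed(self):
        # A block consumes space as long as anything still references it.
        return len(self.active | self.snapshot)


lun = ThinLun()
history = []                      # consumed GB after each step

first = range(0, 30)              # the first 30GB of data
second = range(30, 60)            # the second 30GB of data

lun.write(first)
lun.take_snapshot()
history.append(lun.consumed())    # after writing and snapshotting

lun.unmap(first)                  # delete + retrim of the first 30GB
history.append(lun.consumed())    # unchanged: the snapshot pins the data

lun.write(second)
history.append(lun.consumed())    # grows by the second 30GB

lun.unmap(second)                 # delete + retrim of the second 30GB
history.append(lun.consumed())    # drops, but only by the active part

lun.delete_snapshot()
history.append(lun.consumed())    # back to the starting point

print(history)
```

The model ignores metadata overhead and the way real arrays place new writes, but it shows why the first retrim seems to do nothing while the second one does.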

I hope this example gave you some insight into the behavior.

Conclusion

So people who have snapshot-based automatic data tiering, data protection etc. active in their Hyper-V environment and don’t see any results at all should check those snapshot schedules & lifetimes. When you take them into consideration you’ll see that UNMAP does work predictably, albeit in a “delayed” fashion.

The same goes for Hyper-V checkpoints (formerly known as snapshots). When you create a checkpoint the VHDX is kept and you are writing to an AVHDX (differencing disk), meaning that any UNMAP activity will only reflect on data in the active AVHDX file and not in the “frozen” parent file.

Windows Server 2012 VHDX Thin Provisioning Benefits Explored

Thin Provisioning With Hyper-V

Windows Server 2012 provides thin provisioning at the virtual layer via the VHDX file format. It also provides it at the physical storage layer when your storage supports it. For the latter, don’t forget that this also means Storage Spaces! So even in environments where budgets are really tight you can leverage this on the physical storage now. It’s not just for the feature-rich SAN owners anymore.

Even if you use a storage subsystem that does not support thin provisioning at the physical layer you will benefit from this mechanism when you use dynamic VHDX files. Not only will these grow less, but during shutdown they shrink by the size of the empty blocks. Pretty cool! I do however see a potential risk of increased fragmentation. This has a negative impact on performance and needs defragmentation to remediate, which also has an impact on IO performance. How much of a concern this is depends on your environment and needs. We’ll also have to see in real life whether the performance improvements dynamic VHDX files got with Windows Server 2012 are enough to entice more people to use them. You have proponents and naysayers. I’m selective and let the circumstances and needs/requirements decide.

Thin Provisioning at the Virtual Layer

You can take a look at the TechEd 2012 session VIR301 by Senthil Rajaram to see how VHD versus VHDX behaves in regards to thin provisioning. I will not repeat all of this here. What I am going to do is look at some other situations.

Important note: You get this UNMAP feature automatically in Windows. There’s no need to manually run the Optimize-Volume command we’ll use in the scenarios below. It’s run automatically for us when the standard Defrag scheduled task runs or during the NTFS checkpointing mechanism that sends the info down every 5 minutes. So these will normally take care of all that. But as the defrag “only” runs every week by default, you might want to tweak it or create your own scheduled task in your environment if needed. In demos and labs we’re rather impatient geeks, so even the 5 minute interval of the checkpointing mechanism is too long and we run “Optimize-Volume -DriveLetter X -ReTrim” to get immediate gratification while testing. In real life it’s a zero touch feature; you don’t need to babysit it.

Fixed VHDX versus Dynamic VHDX

With a fixed VHDX there is no shrink on shutdown, and this optimization does nothing for the file size. The only benefit here is that the UNMAP can be passed to the physical storage, where it can help if that storage supports it. At the virtual layer it doesn’t matter for a fixed size VHDX disk.

Dynamic VHDX Disk

You’ll profit from the savings in storage when the dynamically expanding VHDX file doesn’t need to grow as much. This reduces the overhead of expanding the disk, which is a performance benefit, and it even helps your non thin provisioning capable storage go further.

Watch Senthil’s presentation (from around minute 20) to see the benefits in action. With VHDX, if you “shift delete” the files inside the VM, then run “Optimize-Volume -DriveLetter X -ReTrim” or the defrag job and then copy new files, you’ll see that there is no additional file growth as long as you don’t exceed the current size of the VHDX. If you don’t do this both the VHD and VHDX file will grow.
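To make the difference concrete, here is a rough toy model of a dynamically expanding disk. It is a sketch, not how the VHDX format is implemented: the names (`DynamicDisk`, `file_size_after_rewrite`) are invented for illustration, one unit stands for one 3.3 GB ISO, and it assumes that without a retrim the host never learns which blocks the guest has freed, so new data lands in newly allocated blocks.

```python
class DynamicDisk:
    """Toy model of a dynamically expanding virtual disk file."""

    def __init__(self):
        self.file_blocks = 0  # blocks allocated in the file on the host
        self.reusable = 0     # allocated blocks the guest has unmapped

    def write(self, n):
        # Reuse unmapped blocks first; only grow the file for the remainder.
        reuse = min(n, self.reusable)
        self.reusable -= reuse
        self.file_blocks += n - reuse

    def delete(self, n, retrim):
        # Only an UNMAP (retrim) tells the disk that these blocks are free.
        if retrim:
            self.reusable += n


def file_size_after_rewrite(retrim):
    """Write 3 units, delete them, write 3 new units; return the file size."""
    disk = DynamicDisk()
    disk.write(3)
    disk.delete(3, retrim=retrim)
    disk.write(3)
    return disk.file_blocks


print("with retrim:   ", file_size_after_rewrite(True))
print("without retrim:", file_size_after_rewrite(False))
```

With the retrim the second copy fits in the blocks that were already allocated; without it the file keeps growing.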

But there is another potential benefit that makes this important. Even with the block sizes that have been increased to reduce the overhead of growing dynamic VHDX files, we still have to deal with fragmentation of the VHDX files on the storage where they live. The better empty blocks are reused, the less the dynamic files will have to grow. This means you’ll have less opportunity for fragmentation. Whether this compensates for the potential of more fragmentation due to the shrinking when they are shut down I don’t know. Whether all the performance improvements for dynamic disks are good enough will depend on your environment and needs. Defragmentation can help mitigate this, but IO performance suffers during the defragmentation process. Do it or, better, schedule it wisely!

Virtual SCSI controller attached versus virtual IDE controller attached

What about a guest (boot) VHDX disk attached to an IDE controller? I see a lot of one-disk virtual machines out there, so it would be a pity if it didn’t work for those and only for the ones that have extra vSCSI disks attached. So let’s test this.

image

Below you see the disk size of the VHD and VHDX files and what type of controller they are attached to. As you can see, they had one or two 3.3 GB ISO files copied to them, which were then “shift deleted”. The size of the VHD(X) files reflects the amount of data that they stored.

image

Now, after running the defrag job or executing “Optimize-Volume -DriveLetter X -ReTrim” inside the VM, you’ll see the results below once you shut down the VM.

image

So as it turns out, the thin provisioning benefits work with IDE attached VHDX files as well! Yes, inside a Windows Server 2012 virtual machine you get the UNMAP support with IDE attached VHDX disks too. Think of hosting companies with many thousands of single disk virtual machines who can leverage this as well. This is something you might not expect after having watched the video, as there they only talk about virtual SCSI/FC controllers.

Conclusion

Tests like these are a bit artificial but they do demonstrate how the technology works. In real life it will translate into efficiencies over time, based on the data creation and deletion in your VHDX files. Think about hundreds or thousands of virtual machines in your environment leveraging this mechanism. Over time, on that scale, the amount of storage consumed will be reduced, which results in better economies. Now leverage that together with thin provisioning support in Storage Spaces and you see that there are some very interesting scenarios to investigate. Somehow it’s starting to look like you can have your cookie and eat it too. You don’t need an expensive SAN to get these efficiencies at the physical storage layer, but if you have one and used to have to mess around with sdelete or agents, it’s easy to see the benefit you get from this there as well.

Some SAN Storage Fun

At the end of this day I was doing some basic IO tests on some LUNs on one of the new Compellent SANs. It’s amazing what 10 SSDs can achieve … We can still beat them in certain scenarios but it takes 15 times more disks. But that’s not what this blog is about. This is about goofing off after 20:00 following another long day in another very long week. It’s about kicking the tires of Windows and the SAN now that we can.

For fun I created a 300TB LUN on a DELL Compellent, thin provisioned of course, as I only have 250 TB.

I then mounted it to a Windows 2008 R2 test server.

image

The documented limit of a volume in Windows 2008 R2 is 256TB when you use a 64K allocation unit size. So I tested this limit by trying to format the entire LUN and create a 300TB simple volume. I brought it online, initialized it as a GPT disk, created a simple volume with an allocation unit size of 64K and, well, that failed with the following error:

[image: failed format of the 300TB volume]

There is nothing unexpected about this. This has to do with the maximum NTFS volume size supported on a GPT disk. It depends on the cluster size that is selected at the time of formatting. NTFS is currently limited to 2^32-1 allocation units. This yields a 256TB volume, using 64k clusters. However, this has only been tested to 16TB, or 17,592,186,040,320 bytes, using 4K cluster size. You can read up on this in Frequently asked questions about the GUID Partitioning Table disk architecture. The table below shows the NTFS limits based on cluster size.
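The arithmetic behind these limits is easy to check. The snippet below is a small sketch that multiplies the 2^32-1 allocation unit limit by a few common cluster sizes; the function name is made up for illustration and sizes are treated as binary units (1 TB = 2^40 bytes).

```python
MAX_CLUSTERS = 2**32 - 1  # NTFS allocation unit limit mentioned above


def max_ntfs_volume_bytes(cluster_size_bytes):
    """Largest NTFS volume for a given cluster size, in bytes."""
    return MAX_CLUSTERS * cluster_size_bytes


for kib in (4, 8, 16, 32, 64):
    size = max_ntfs_volume_bytes(kib * 1024)
    print(f"{kib:>2}K clusters: {size:,} bytes (about {size / 2**40:.0f} TB)")

# The 300TB LUN from this test is larger than the 64K cluster limit,
# which is why formatting it as a single volume fails.
lun_bytes = 300 * 2**40
print(lun_bytes > max_ntfs_volume_bytes(64 * 1024))
```

For 4K clusters this gives the 17,592,186,040,320 bytes quoted above, and for 64K clusters it lands just under 256TB.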

image

This was the first time I had the opportunity to test these limits. I formatted part of that LUN to a size close to the limit and then formatted the remainder as a second simple volume.

image

I still need to get a Windows Server 2012 test server hooked up to the SAN to see if anything has changed there. One thing is for sure, you could put at least 3 64TB VHDX files on a single volume in Windows. Not too shabby. It’s more than enough to get just about any backup software into problems. Be warned, MSFT tested and guarantees performance & behavior up to 64TB in Windows Server 2012, but beyond that you’d better do your own due diligence.
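Why 3 and not 4? A quick back-of-the-envelope check, assuming binary units and ignoring both file system overhead and the VHDX file's own metadata, shows that a maximum size volume with 64K clusters is just short of 4 × 64TB:

```python
TB = 2**40

max_volume = (2**32 - 1) * 64 * 1024  # largest NTFS volume with 64K clusters
vhdx_size = 64 * TB                   # maximum VHDX size

whole_files = max_volume // vhdx_size          # how many fit completely
shortfall = 4 * vhdx_size - max_volume         # bytes missing for a fourth

print(whole_files, shortfall)
```

The volume is exactly one 64K cluster short of holding a fourth 64TB file, so three is the practical answer.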

The next thing I’ll do when I have a Windows Server 2012 host hooked up is create a 64TB VHDX file and see if I can go beyond it before things break. Why? Well, because I can, and I want to take the new SAN and Windows Server 2012 for a ride to see what boundaries we can push. The SANs are just being set up so now is the time to do some testing.

TRIM/UNMAP Support in Windows Server 2012 & Hyper-V/VHDX

Introduction

I’m very excited about the TRIM/UNMAP support in Windows Server 2012 & Hyper-V with the VHDX file. Thin provisioning is a great technology, but there is more to it than just proactive provisioning ahead of time. It also provides a way to make sure storage allocation stays thin by reclaiming freed up space from a LUN. Until now this required either the use of sdelete on Windows or dd for the Linux crowd, or some disk defrag product like Raxco’s PerfectDisk. It’s interesting to note here that sdelete relies on the defrag APIs in Windows, so you can see how a defragmentation tool can pull off the same stunt. Take a look at Zero-fill Free Space and Thin-Provisioned Disks & Thin-Provisioned Environments for more information on this. Sometimes an agent is provided by the SAN vendor that takes care of this for you (Compellent) and I think NetApp even has plans to support it via a future ONTAP PowerShell toolkit for NTFS partitions inside the VHD (https://communities.netapp.com/community/netapp-blogs/msenviro/blog/2011/09/22/getting-ready-for-windows-server-8-part-i). Some cluster file system vendors like Veritas (Symantec) also offer this functionality.

A common “issue” people have with sdelete or the like is that it is rather slow, rather resource intensive and not automated unless you have scheduled tasks running on all your hosts to take care of that. Sdelete also has an issue with mount points: it can’t handle them. A trick is to use the now somewhat ancient SUBST command to assign a drive letter to the path of the mount point so that you can use sdelete. Another trick would be to script it yourself. Mind you, you can’t just create a big file in a script and delete it. That’s the same as deleting “normal” data and won’t do a thing for thin provisioning space reclamation. You really have to zero the space out. See (A PowerShell Alternative to SDelete) for more information on this. The script also deals with another annoying thing about sdelete, which is that it doesn’t leave any free space and thereby potentially endangers your operations or at least sets off all the alarms on the monitoring tools. With a home grown script you can force a free percentage to remain untouched.
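The idea of such a home grown script can be sketched as follows. This is not the PowerShell script referenced above but a minimal Python illustration of the same principle: write a temporary file full of zeros, stop while a reserved fraction of the volume is still free, then delete the file. The function names and the `max_bytes` safety cap are assumptions made for this example, and it assumes the target file system really writes the zeros out (no compression or sparse file handling getting in the way).

```python
import os
import shutil
import tempfile


def bytes_to_zero(free_bytes, total_bytes, reserve_fraction):
    """Bytes that may be zeroed while keeping reserve_fraction of the volume free."""
    reserve = int(total_bytes * reserve_fraction)
    return max(0, free_bytes - reserve)


def zero_free_space(directory, reserve_fraction=0.10,
                    chunk_size=1024 * 1024, max_bytes=None):
    """Fill free space under `directory` with zeros, then remove the fill file.

    Returns the number of zero bytes written.
    """
    usage = shutil.disk_usage(directory)
    target = bytes_to_zero(usage.free, usage.total, reserve_fraction)
    if max_bytes is not None:
        target = min(target, max_bytes)   # optional cap, handy for testing

    chunk = bytes(chunk_size)             # a buffer of zero bytes
    written = 0
    fd, path = tempfile.mkstemp(dir=directory, prefix="zerofill_")
    try:
        with os.fdopen(fd, "wb") as handle:
            while written < target:
                step = min(chunk_size, target - written)
                handle.write(chunk[:step])
                written += step
            handle.flush()
            os.fsync(handle.fileno())     # make sure the zeros hit the disk
    finally:
        os.remove(path)                   # always clean up the fill file
    return written
```

A call such as `zero_free_space("D:\\", reserve_fraction=0.10)` would zero the free space on D: while leaving roughly 10% of the volume untouched. Try it with a small `max_bytes` on a test volume first.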

TRIM/UNMAP

With Windows Server 2012 and Hyper-V VHDX we get what is described in the documentation as “Efficiency in representing data (also known as “trim”), which results in smaller file size and allows the underlying physical storage device to reclaim unused space. (Trim requires physical disks directly attached to a virtual machine or SCSI disks in the VM, and trim-compatible hardware.)” It also requires Windows Server 2012 on hosts & guests.

I was confused as to whether VHDX supports TRIM or UNMAP. TRIM is the specification for this functionality by Technical Committee T13, which handles all standards for ATA interfaces. UNMAP is the Technical Committee T10 specification for this and is the full equivalent of TRIM but for SCSI disks. UNMAP is used to remove physical blocks from the storage allocation in thinly provisioned Storage Area Networks. My understanding is that what is used on the physical storage depends on what storage it is (SSD/SAS/SATA/NL-SAS or a SAN with one or all of the above) and that for a VHDX it’s UNMAP (SCSI standard).

Basically VHDX disks report themselves as being “thin provision capable”. That means that any deletes as well as defrag operations in the guest will send “unmaps” down to the VHDX file. These are used to ensure that block allocations within the VHDX file are freed up for subsequent allocations, and the same requests are forwarded to the physical hardware, which can reuse the space for its thin provisioning purposes. Also see http://msdn.microsoft.com/en-us/library/hh848053(v=vs.85).aspx

So unmap makes its way down the stack from the guest Windows Server 2012 operating system, through the VHDX and the hypervisor, to the storage array. This means that a VHDX will only consume storage for data that is really stored & not for the entire size of the VHDX, even when it is a fixed one. You can see that not just the operating system but also the application/hypervisor that owns the file systems on which the VHDX lives needs to be TRIM/UNMAP aware to pull this off.

The good news here is that there is no more sdelete to run, scripts to write, or agents to install. It happens “automagically” and as ease of use is very important I for one welcome this! By the way, some SANs also provide the means to shrink LUNs, which can be useful if the space used by a volume is much lower than what is visible/available in Windows and you don’t want people to think you’re wasting space or that all that extra space is freely available to them.

To conclude I’ll be looking forward to playing around with this and I hope to blog on our experiences with this later in the year. Until Windows Server 2012 & VHDX specifications are RTM and fully public we are working on some assumptions. If you want to read up on the VHDX format you can download the specs here. It looks pretty feature complete.