You cannot shrink a VHDX file because you cannot shrink the volume on the virtual disk

Introduction

I have discussed the capability of resizing a VHDX online in this blog post: Online Resizing Of Hyper-V Virtual Disks Is Possible in Windows 2012 R2. It’s a good resource to learn how to do so successfully.

Despite this you might still run into issues. As mentioned in the above blog post, you need unallocated disk space at the end of the disk inside the virtual machine or you cannot shrink the VHDX at all. This situation is shown in the screenshot below.

clip_image001

In most cases this will call for you to shrink the volume size inside your virtual machine first, as all space might be allocated to the volume. For this article we’ve set up a lab virtual machine to recreate the issue. The virtual machine had the page file disabled initially. We copied lots of data into it and then created shadow copies. Only then did we create a 10GB fixed-size page file, to make sure it ended up somewhere in the beginning of the volume space. All of this was done to simulate a real world situation with a lot of data churn over time. We then shift-deleted the data. We now take a look at the disk where we need to shrink volume C in order to be able to shrink the virtual disk itself.

clip_image002

For the shrinking of a volume to succeed you need free space in that volume. But sometimes you cannot shrink the volume as much as you’d like, or not at all, despite the amount of free space you see in it, as in the figure below.

clip_image003

It seems we should be able to free up to 26GB. But when you try to shrink that volume you see this:

clip_image004

Only 11GB of available shrink space. Not quite what you’d expect based on the free space on the volume! We’ve seen this a couple of times before with virtual servers in real life. The reasons are actually well known, although more often associated with your PC at home than with virtualized servers. So how do we deal with this?
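If you want to check this without clicking through Disk Management, you can query the supported partition sizes from PowerShell inside the guest. This is just a sketch; the drive letter C: is the example volume from this lab.

```powershell
# Inside the guest: ask Windows how far the volume can actually be shrunk.
# SizeMax is the largest supported size (the current size when there is no
# unallocated space behind the volume), SizeMin the smallest size it can be
# shrunk to, given where the unmovable files sit.
$size = Get-PartitionSupportedSize -DriveLetter C

"Current size    : {0:N1} GB" -f ($size.SizeMax / 1GB)
"Minimum size    : {0:N1} GB" -f ($size.SizeMin / 1GB)
"Shrinkable space: {0:N1} GB" -f (($size.SizeMax - $size.SizeMin) / 1GB)
```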

Dealing with a volume with free space that cannot shrink

The issue at hand is most probably that you have files at the end of that volume on your virtual hard disk that prevent the volume from being shrunk. There are a couple of tips and tricks associated with getting this fixed.

Defragment the volume

As long as files are movable, fragmentation by itself should not prevent resizing a volume. But it never hurts to run a defragmentation beforehand, and it will create contiguous free space at the end of the volume that can be shrunk. What’s more important here is that defragmentation cannot move all files: some are unmovable. These files can have their fragments scattered all over the place and might prevent you from shrinking the volume.

On modern Windows operating systems defragmentation is part of the storage optimization maintenance job. It also runs UNMAP, which informs the virtual hard disk of free space due to data having been deleted.

clip_image006
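The scheduled optimization job takes care of this for you, but if you want to trigger a pass yourself during a maintenance window, it looks roughly like this (a sketch; the drive letter C: is just the lab example):

```powershell
# Run a defragmentation pass on the volume and show progress.
Optimize-Volume -DriveLetter C -Defrag -Verbose

# On a dynamically expanding virtual disk a retrim pass sends UNMAP,
# so the VHDX layer learns which blocks have been freed.
Optimize-Volume -DriveLetter C -ReTrim -Verbose
```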

That’s all good and it means that you don’t even need to run defragmentation manually. But how can we deal with these unmovable files?

There are free and commercial tools that can defragment unmovable files during a boot-time defragmentation run. They can even defragment and move system files that are otherwise impossible to move. A commercial tool can do offline defragmentation of your page file and other system files. By doing the defragmentation at boot time they can handle NTFS metadata files on the %systemdrive% volume (usually C:\) such as $MFTMirr, $LogFile, $Volume, $Bitmap, $Boot, and $BadClus:$Bad.

Not all unmovable files can be dealt with this way, however. You must realize that since Windows Vista the contents of the System Volume Information directory, where Windows stores System Restore Points (shadow copies), are completely off-limits to defragmentation software.

As with many things there are manual workarounds.

Remove any “previous versions” or restore points created by shadow copies

Space efficient as these shadow copies for data protection are, they can and do consume space on the disk you’re trying to shrink. As mentioned above, we cannot deal with them via defragmentation. Getting rid of them temporarily can help in this case. Just enable them again if needed when you’re done resizing the volume.

clip_image008
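From an elevated prompt inside the guest this can be scripted as well. A minimal sketch, again for the lab’s C: volume; remember that deleting shadow copies throws away the previous versions and restore points they back:

```powershell
# List the shadow copies that exist for the volume you want to shrink.
vssadmin list shadows /for=C:

# Delete them all for that volume (only when losing those previous
# versions / restore points is acceptable).
vssadmin delete shadows /for=C: /all
```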

Tip: you can relocate the shadow copies to a different disk. That’s worth considering when they grow large, both for space and for performance reasons.

Could the hibernation file cause issues?

We are discussing resizing a virtual hard disk, and in a virtual machine you won’t find a hiberfil.sys file. This only comes into play when shrinking a volume on physical hardware. Hibernation is not supported or even available inside a guest OS. You can see this if you try to enable it:

clip_image009
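If you would rather check from the command line than the GUI, trying to turn hibernation on inside the guest gives the same answer (a trivial sketch):

```powershell
# Inside a Hyper-V guest this fails: hibernation is not supported in the
# guest OS, so there is no hiberfil.sys to worry about when shrinking.
powercfg /hibernate on
```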

Disable the page file

The page file itself can become fragmented and it can reside completely or partially in a location on the disk that prevents the volume from being shrunk. While a page file is important to the operating system, you can disable it during a maintenance window to make sure it doesn’t block resizing of the virtual hard disk. Be aware that both disabling and re-enabling the page file require a reboot. So this does mean the online VHDX resize will cause downtime, but that’s not because it’s not supported; it’s because of the action you need to take here to be able to shrink the volume.
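If you’d rather script this than click through the System Properties dialogs shown below, here is a sketch using the WMI/CIM classes involved. It assumes the page file is currently automatically managed; either way, the change only takes effect after a reboot.

```powershell
# Stop Windows from automatically managing the page file ...
Get-CimInstance -ClassName Win32_ComputerSystem |
    Set-CimInstance -Property @{ AutomaticManagedPagefile = $false }

# ... and remove any explicitly configured page files.
Get-CimInstance -ClassName Win32_PageFileSetting | Remove-CimInstance

# A reboot is required before the page file is really gone.
Restart-Computer
```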

clip_image010

clip_image012

clip_image013

The little extra unallocated space left is taken care of by extending the disk a little. Done!

clip_image014

Don’t forget to turn the page file back on in the best possible configuration for your workload afterwards.

Some situations require even more drastic interventions

Another issue might be that there are multiple volumes on the virtual hard disk and the free space is not at the end of the disk, as in the screenshot below.

clip_image016

Unless you can delete volume H: and recreate it after the resize, restoring the data to the new volume which then sits at the end, right after volume F:, you’ll need to turn to 3rd party tools. Free open source tools like GParted will do the job nicely and I have used it extensively. I have a blog post on using it: Using Gparted to fix virtual disk resizing issues. You still want a backup or a copy of your VHDX before doing anything like that, just in case.

The results

In the example above, which is a lab setup, deleting the shadow copies and getting rid of the unfortunately located page file that prevented us from shrinking the volume further allowed us to shrink the volume by 23GB instead of 11GB. Not bad.

clip_image017

That gives us 23GB of unallocated space on the virtual disk.

clip_image018

We can now shrink the virtual hard disk by that amount!

clip_image019
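If you prefer PowerShell over the Edit Virtual Hard Disk wizard, the same shrink can be done from the Hyper-V host. A sketch; the path is a made-up example, and doing this online assumes the VHDX is attached to a SCSI controller, as described in the blog post referenced in the introduction.

```powershell
# On the Hyper-V host. Replace the path with your own VHDX.
$vhdx = 'D:\VMs\LabVM\LabVM-OS.vhdx'

# MinimumSize shows how small the VHDX can get, based on the unallocated
# space at the end of the virtual disk.
Get-VHD -Path $vhdx | Select-Object Path, Size, MinimumSize

# Shrink the VHDX as far as that unallocated space allows.
Resize-VHD -Path $vhdx -ToMinimumSize
```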

clip_image013[1]

The little extra unallocated space left is taken care of by extending the disk a little. Done!

clip_image014[1]
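Assuming that last step means extending the guest volume over the small piece of unallocated space that remains, this is what it looks like in PowerShell inside the virtual machine (a sketch for the lab’s C: volume):

```powershell
# Inside the guest: grow C: to the maximum supported size, absorbing the
# little bit of unallocated space that is left after shrinking the VHDX.
$max = (Get-PartitionSupportedSize -DriveLetter C).SizeMax
Resize-Partition -DriveLetter C -Size $max
```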

Don’t forget to turn the page file back on in the best possible configuration for your workload afterwards and re-enable shadow copies if needed.

A real world example

A real world example of this is when we needed to move 120GB of indexing files to a dedicated virtual disk because they were causing the OS volume, the C:\ drive, to run out of space. We could not and did not want to grow the virtual hard disk on which the guest OS volume was located. After we had moved the index we wanted to shrink the volume by about 120GB, leaving ample free space for the OS volume to function optimally, but we could not. We could gain a pitiful 2GB of space!

First we made sure the index data was shift-deleted and ran the optimizer to defragment the disk, but that did not help. We checked for shadow copies but there were none present. As this was a virtual server we did not have a hiberfil.sys file to worry about. In the end what did the trick for us was disabling the page file, rebooting the virtual machine, shrinking the volume and rebooting the virtual machine again.

Conclusion

You have seen how to address an issue where, despite having free space in a volume, you cannot shrink it, and as a result cannot shrink a VHDX file in size. That was blocking our real goal here, which was to shrink the virtual hard disk. While the latter is possible online, we cannot always mitigate the issues we encounter with shrinking a volume (by itself an online event) without downtime. Disabling or enabling the page file requires a reboot. Defragmentation can be done online most of the time, but not when it comes to NTFS metadata. Disabling and enabling shadow copies is an online process, however.

This is of course a prime example of what DevOps and cloud computing at scale discourage. That brave new world promotes treating your servers as cattle. When one is giving you an issue you don’t nurse it back to health but fire up the barbecue, as Jeffrey Snover would put it. That’s a great model if it applies to your environment. But before you do so I’d make sure that your server is not a holy cow instead of cattle. Many applications in the enterprise, even modern ones, you cannot just kill off. If you do, you’d better have great backups, but even those will not solve issues like the one we’ve addressed here. The backups are there to protect you when things go wrong with your interventions.

Do you need hard processor affinity in Hyper-V?

Do you need hard processor affinity in Hyper-V? Good question, but let’s set the context first. I tend to virtualize workloads that shock some people. Not because they are super huge solutions requiring petabytes of storage, 48TB of RAM, 256 cores and a million IOPS. Far from it. The shock often comes from people who still consider virtualization as something for lightweight infra services like DHCP, DNS, WSUS, print servers, or web services and websites. Some of these people even tried to virtualize other services like SharePoint, SQL, Exchange, etc. but they did not take into account that virtualization is not magic: you’ll need to provision adequate resources and design/manage your environment to do so successfully. So part of them got bitten. They concluded that performance requires physical deployments … and they want to see a material CPU so to speak.

CPU_closeup

When they see virtual machines with 12 to 16 vCPUs or > 100GB of memory they seem to think that even those workloads are bad candidates to virtualize, let alone even bigger ones. That’s not true by definition. As long as you make sure that you know why (cost/benefits/risks) and how to virtualize, it can work. You must provision and allocate the required resources. You must also have the right expertise in both virtualization (servers, storage, networking) and the applications involved (SQL, Exchange, 3rd party products, …) along with good operational processes.

You can really virtualize a lot when done right. My “virtual first” approach is a rule of thumb and exceptions do exist, even when I’m calling the shots. However, just like people quoting costs, latency, security and lock-in to question the suitability of Public Cloud versus on premises in “subjective” ways, they do so when it comes to virtualization as well. The discussion is often more about organizational issues, control, fear, politics, interests and money. Every hosting provider out there loves virtualization as it’s great for their TCO/ROI. But when it comes to Public Cloud they’re often less convinced. That “datacenter zero” concept isn’t that attractive to them, so we see Hybrid and Public Cloud offerings that might not be that good of an idea in some cases but fit their interests more. Have you noticed that there are no highly automated, optimized data centers anymore, only * clouds? There are valid use cases for hybrid and private clouds, but just like with virtualization, maybe we should let go of the personal/business interests, the fear, and false assumptions when advising customers. It all depends.

In this regard I have had several discussions now with people about the lack of hard processor affinity in Hyper-V. In their opinion this makes it unfit for high performance workloads. Sure, such cases do exist. These are, however, not the majority. As I’ve been having this discussion rather often in the past months I wrote an article on the subject that I’ve published in collaboration with StarWind Software: Need Hard Processor affinity for Hyper-V? The idea is to reach more people and share insights with the community. Full disclosure: I happen to know Anton Kolomyeytsev (CEO, CTO and Chief Architect at StarWind) professionally as a fellow MVP and I have great respect for his technical expertise, insights and experience. This made me agree to publish some content via their blog. Sharing opinions and ideas with as many people as possible only makes for better technologists everywhere.

WEBINAR: Troubleshooting Microsoft Hyper-V – 4 Tales from the Trenches

I’ve teamed up with Altaro as a guest in one of their Hyper-V webinars to discuss troubleshooting Microsoft Hyper-V. I’ll be joining my fellow Microsoft Cloud and Datacenter Management MVP Andy Syrewicze for this in a virtual cross-Atlantic team.

WEBINAR: Troubleshooting Microsoft Hyper-V - 4 Tales from the Trenches

We’ll give you some pointers on good practices and where to start when things go south. We’ll also add some real world examples to the mix to spice things up. As you can imagine these issues are often a lot more fun after the fact and after having solved them. If you work in IT long enough you know that one day (or night) trouble will come knocking and that the stress that comes with it can be gut-wrenching.

The good news is that it isn’t the end of the world. In a well designed and managed environment you can minimize downtime and you do not have to suffer through unrecoverable losses as long as you’re well prepared. We hope this webinar will help the attendees prevent those problems in their environment. If not, it will provide you with some insight on how to prepare for and handle them.

The webinar is on February 25th, 2016 at 4pm CET / 10am EST!

The time has been chosen to make it feasible for as many people across all time zones as possible to attend. So go on and register, it’s free, and both Andy and I are looking forward to sharing our troubleshooting tales from the trenches.

webinar-Join-button-troubleshooting-hyperv

Confusing Mellanox Windows PerfMon Counters

Introduction

So you start out doing SMB Direct. Maybe you’re doing RoCE; if so, there’s a good chance you’ll be using the excellent Mellanox cards. You studied hard, read a lot and put some real effort into setting it up. The SMB Direct / DCB configuration is how you think it should be and things are working as expected.

Curious as you are, you want to find out if you can see Priority Flow Control at work. Well, the easiest way to do so is by using the Windows Performance Monitor counters that Mellanox provides.

Confusing Mellanox Windows PerfMon Counters

So you take your first look at the Mellanox Adapter QoS PerfMon counters for the ConnectX series for SMB Direct (RDMA) traffic. When you want to see what’s happening in regard to pause frames that have been sent and received, and what pause duration was requested from the receiving hop (or received from the sending hop), you can get confused. The naming is a bit counterintuitive.

clip_image002

The Rcv Pause duration is not the duration requested by the pause frames the host received, but by the pause frames that host sent. Likewise, the Sent Pause duration is not the duration requested by the pause frames the host sent, but by the pause frames that host received.

clip_image004

So you might end up wondering why your host sends pause frames but you only see the Rcv Pause duration go up. Now you know why.
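You can read the same counters with Get-Counter instead of the PerfMon GUI. The exact counter set and counter names vary with the WinOF driver version, so the sketch below discovers them with a wildcard rather than hard-coding them.

```powershell
# Find the Mellanox QoS counter set and list its pause-related counters.
$set = Get-Counter -ListSet '*Mellanox*QoS*'
$pauseCounters = $set.Counter | Where-Object { $_ -match 'Pause' }
$pauseCounters

# Sample them for a few intervals. Keep the naming quirk in mind:
# "Rcv Pause duration" reflects pause frames this host sent,
# "Sent Pause duration" reflects pause frames this host received.
Get-Counter -Counter $pauseCounters -SampleInterval 2 -MaxSamples 5
```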

Now, there were plans to fix this in WinOF 4.95. The original release notes made mention of this, and that made me quite happy, as most people are confused enough when it comes to RDMA/RoCE/DCB configurations as it is.

A screenshot of the change in the original Mellanox WinOF VPI Release Notes revision 4.95

clip_image005

Unfortunately, this did not happen. It was removed in a newer version of these release notes. My guess is it could have been a breaking change of some sort if a lot of tooling or automation expects these counter names.

I still remember how puzzled I was, looking at counters that to me didn’t make sense, and the tedious labor of empirical testing it took to figure out that the wording was a bit “less than optimal”.

But look, once you know this you just need to keep it in mind. For now, we’ll have to live with some confusing Mellanox Windows PerfMon counter names. At least I hope I have saved you the confusion and time I went through when first starting with these Mellanox counters. Other than that I can only say that you should not be discouraged, as they have been and remain a great tool for checking RoCE DCB/PFC configurations.