Lightning Fast Fixed VHDX File Creation Speed With ReFS on Windows Server 2016

In this blog post we’re going to take a quick look at the lightning fast fixed VHDX file creation speed with ReFS on Windows Server 2016. We’ll compare it to creating fixed VHDX files on NTFS with a SAN that supports ODX. Both the NTFS and the ReFS volumes are CSV disks in a Hyper-V cluster, and all tests are run on the same node. The ODX capable SAN is a Dell Compellent with Storage Center 6.5.20.

We create a selection of fixed VHDX files (50GB, 100GB, 500GB and 1TB) on an NTFS volume on a Windows Server 2016 host. You can see the quite excellent file creation speeds with ODX.


These results are very good (Dell Compellent always did a great job implementing ODX), and creating a 1TB fixed VHDX consistently takes just over 5 seconds. Impressive by any standard, I would say! When we start using CSVs, creation times double for the larger VHDX sizes, but roughly 12 seconds for a 1TB disk is still very good. It makes little difference whether or not the node where the script runs owns the CSV.
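If you want to repeat the timing yourself, the sketch below calls the Hyper-V `New-VHD` cmdlet (with `-Fixed`) from Python and measures the wall-clock time per size. It is a minimal sketch, not the script used for the results above: the test folder is a hypothetical path, and the Hyper-V PowerShell module is assumed to be installed on the host.

```python
import ntpath
import subprocess
import time

# HYPOTHETICAL test folder: point it at the NTFS, ReFS or CSV volume under test.
TEST_DIR = r"C:\ClusterStorage\Volume1\VHDXTest"
# Sizes used in the test above: 50GB, 100GB, 500GB and 1TB.
SIZES_GB = [50, 100, 500, 1024]


def new_vhd_command(path: str, size_gb: int) -> list:
    """Build a PowerShell command line that creates one fixed VHDX with New-VHD."""
    # PowerShell expands the GB suffix itself, e.g. 50GB -> 53687091200 bytes.
    script = f"New-VHD -Path '{path}' -SizeBytes {size_gb}GB -Fixed | Out-Null"
    return ["powershell.exe", "-NoProfile", "-Command", script]


def time_fixed_vhdx(path: str, size_gb: int) -> float:
    """Create one fixed VHDX and return the wall-clock seconds it took."""
    start = time.perf_counter()
    subprocess.run(new_vhd_command(path, size_gb), check=True)
    return time.perf_counter() - start


def run_benchmark(test_dir: str = TEST_DIR) -> dict:
    """Time each size once; call this explicitly on a Hyper-V host."""
    results = {}
    for size_gb in SIZES_GB:
        vhdx = ntpath.join(test_dir, f"fixed_{size_gb}GB.vhdx")
        results[size_gb] = time_fixed_vhdx(vhdx, size_gb)
    return results
```

Because a new PowerShell process is started for every file, its start-up time is included in each measurement, so expect slightly higher numbers than when timing `New-VHD` with `Measure-Command` inside a single PowerShell session.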



Can things be more impressive? Let’s do the same exercise on a ReFS volume on a Windows Server 2016 host. Same server, same SAN with ODX enabled, but note that ReFS does not even support ODX, so it cannot be leveraged.


Regardless of their size, our fixed VHDX files are consistently created in just over 1 second. This is very impressive.


When we use a CSV LUN we still see the same impressive results. On CSV LUNs not owned by the node where we execute the test script, we see a creation time of 2 seconds for 1TB VHDX files. Still lightning fast.

If you do not have a SAN that supports ODX, you can see why ReFS might become a very attractive file system choice for your Hyper-V virtual machine data volumes in Windows Server 2016. I can also see why Microsoft mentions it as the preferred option for Storage Spaces Direct. Do note that ReFS does not support deduplication and/or UNMAP (and I don’t see dedupe support for virtual server workloads on the horizon yet either). If you move large amounts of data around, ODX does provide significant assistance with this. So with ReFS, go for a large SSD tier. Flash only, without deduplication or erasure coding, might be cost prohibitive, I’m afraid.

But don’t let this put you off ReFS. It has many benefits in combination with Storage Spaces, and these new VHDX operation capabilities just add to that. So for many environments with commodity-based storage, this has become an even more interesting choice.

More Tips On Dealing With Removing Short File Names When Migrating To an SMB3 Transparent Failover File Server Cluster

You might have read my blog posts on the capabilities and the process of migrating to a Transparent Failover File Server. If not, here they are:

These are a good read, with advice from real-world experience, and in this post I’ll offer some more tips. I’ve discussed the need to disable and get rid of short file names in my blog and offered other tips to prepare for your migration and get your file share LUNs in tip-top, modern shape. But what if you run into short file name issues where you can’t seem to get rid of them?

Well, here are four more things to check:

1) Get rid of the shadow copies used for Previous Versions

The reason you’d better get rid of them is that they can also contain short file names and overly long path or file names. We don’t want them to ruin the party, so we remove them all by disabling shadow copies on the LUNs to be copied. We can enable them again once the LUN is up and running in the new file cluster.
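The cleanup can also be scripted. The sketch below only builds the `vssadmin list shadows` and `vssadmin delete shadows /for=<volume> /all /quiet` command lines for each LUN; the volume letters are hypothetical, and you would run the commands (for example with `subprocess.run(cmd, check=True)`) from an elevated prompt. Disabling the shadow copy schedule itself is assumed to be done separately, for instance on the Shadow Copies tab of the volume properties.

```python
def shadow_copy_cleanup_commands(volumes: list) -> list:
    """Build vssadmin command lines that list, then delete, all shadow copies per volume.

    Deleting the existing copies is only half of the job: keep the shadow copy
    schedule disabled on these volumes until the migration is done.
    """
    commands = []
    for vol in volumes:
        spec = vol.rstrip("\\")  # vssadmin expects a volume like "D:"
        commands.append(["vssadmin", "list", "shadows", f"/for={spec}"])
        commands.append(["vssadmin", "delete", "shadows", f"/for={spec}", "/all", "/quiet"])
    return commands
```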

2) The logs indicate there are short file names you don’t have access to

If the NTFS permissions on the folder and file structure are OK, you should not have too many problems, bar some files being locked because they are in use. Rerunning the fsutil command just before migrating, with the Server service stopped, prevents any connectivity to and use of the file shares by people ignoring the request to log off or shut down their clients, or by automated jobs that would otherwise keep accessing them.
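That final pass can be sketched as a short list of command lines: stop the Server service (`net stop lanmanserver /y`), then run `fsutil 8dot3name strip` recursively with verbose output. This is a minimal sketch, assuming the share root and log file path you pass in; the optional `/t` switch is fsutil’s test mode, which reports what would be stripped without changing anything.

```python
from typing import Optional


def short_name_strip_plan(directory: str, log_file: Optional[str] = None,
                          test_only: bool = False) -> list:
    """Command lines for a final short-name strip pass right before the migration."""
    # Stop the Server service so nobody can keep files open via the shares;
    # /y also confirms stopping the services that depend on it.
    plan = [["net", "stop", "lanmanserver", "/y"]]
    strip = ["fsutil", "8dot3name", "strip"]
    if test_only:
        strip.append("/t")           # test mode: report only, change nothing
    strip.append("/s")               # recurse into subfolders
    if log_file:
        strip += ["/l", log_file]    # write the log to a file of our choosing
    strip.append("/v")               # verbose output
    strip.append(directory)
    plan.append(strip)
    return plan
```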

But you might still get indications in the log file(s) stating that you cannot remove certain file names.


There is the good old trick of running your command under SYSTEM. That does the job! It helps get rid of short file name instances on folders you normally don’t have access to. If SYSTEM has rights, you’ll be fine, whether it’s a system folder or not. To do this, the Sysinternals tools come in handy once again. You can launch a command prompt running under the NT AUTHORITY\SYSTEM account using psexec.exe by running the following from an elevated command prompt:

psexec -i -s cmd.exe

or

psexec -s cmd.exe


The -s switch runs the remote process in the System account, and -i lets it interact with the desktop. PsExec temporarily installs a service (PSEXESVC, running psexesvc.exe) on the remote computer (or locally, if that’s what you’re doing), which is removed when the app or process that’s running is closed. I hope it’s now obvious why you need an elevated command prompt to run this command.

Now, should you do this by default? Nope. Only when you need to, and, as always, have a realistic backup plan: a way to recover when things go south.

3) Antivirus sometimes prevents the removal of short file names

Disable antivirus; sometimes it holds a temporary entry in the registry for the file involved. At least that’s what I’ve seen as a transient issue in some of the large number of logs I gathered. Yeah, I ran fsutil a lot against large NTFS volumes. What can I say? Due diligence pays off!

4) Run ChkDsk

Just make sure the volume is healthy and no repairs are needed. If you’re migrating from an older file server, there might be outstanding issues, and a check disk on volumes with lots of files takes time. Some of the ones I’ve dealt with had more than 2 million files on a 2TB LUN, and it can take 24 hours. Fun when you have 10 LUNs :-/
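To plan the maintenance window, a back-of-the-envelope estimate helps. The sketch below scales the single data point above (about 24 hours for roughly 2 million files) linearly with the number of files. The linear scaling is an assumption for illustration only; real run times depend on hardware, load and the state of the volume.

```python
# Reference point from the text: ~2 million files on a 2TB LUN took ~24 hours.
REF_FILES = 2_000_000
REF_HOURS = 24.0


def chkdsk_hours(file_count: int) -> float:
    """Estimated CHKDSK run time in hours for one volume.

    ASSUMPTION: run time scales linearly with the number of files, based on
    the single reference run above.
    """
    return REF_HOURS * file_count / REF_FILES


def sequential_days(file_counts: list) -> float:
    """Total days when the LUNs are checked one after the other."""
    return sum(chkdsk_hours(n) for n in file_counts) / 24
```

Under that assumption, ten LUNs of 2 million files each, checked one after the other, add up to about 240 hours, or 10 days, which is why running checks on several LUNs in parallel (where the storage can take it) is worth considering.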

NTFS Permissions On A File Server From Hell Saved By SetACL.exe & SetACL Studio

Most IT people don’t have a warm and fuzzy feeling when NTFS permissions and “ACLing” are being discussed. While you can do great and very functional things with it, in reality, when dealing with file servers over time, “stuff” happens. Some of it is technical; most of it is what I’ll call “real life”. When it comes to file servers, real life, especially in a business environment, has very little respect, let alone consideration, for NTFS/ACL best practices. So we all end up dealing with the fallout of this phenomenon. If you haven’t, I could state you’re not a real sysadmin, but in reality I’m just envious of your avoidance skills.

You don’t want to fight NTFS/ACLs, but if it can’t be avoided you need the best possible knowledge about how it works and the best possible tools to get the job done (in that order).

If you have not heard of SetACL or DelProf2, you might also not have heard of uberAgent for Splunk, let alone of their creator, community rock star Helge Klein. If you’re new to the business I’ll forgive you, but if you’ve been around for a while you have to get to know these tools. His admin tools, both the free and the paid ones, are rock solid and come in extremely handy in day-to-day work. When the shit hits the fan, they are priceless.

Helge is an extremely knowledgeable, experienced, talented and creative IT professional and developer. I’ve met him a couple of times (at E2EVC, where he’s an appreciated speaker), and all I can say is that, on top of all that, he’s a great guy with a heart for the community.

Having the free SetACL.exe available for scripting NTFS permissions is a luxury I cannot do without anymore. On top of that, for a very low price you can buy SetACL Studio. This must be the most efficient GUI tool for managing NTFS permissions / ACLs I have ever come across.
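As an illustration of that scripting, the sketch below builds two SetACL.exe command lines: one that recursively sets the owner of a folder tree and one that adds a single ACE while leaving the existing entries alone. It assumes SetACL’s `-on`/`-ot`/`-actn` syntax with the `setowner` and `ace` actions; the folder, trustee and permission values are hypothetical, so check the SetACL documentation before running anything against a production share.

```python
def setacl_take_ownership(path: str, owner: str = "Administrators") -> list:
    """SetACL.exe call that recursively sets the owner of a folder tree."""
    return ["SetACL.exe", "-on", path, "-ot", "file",
            "-actn", "setowner", "-ownr", f"n:{owner}",
            "-rec", "cont_obj"]  # apply to subfolders (containers) and files (objects)


def setacl_grant(path: str, trustee: str, permission: str = "change") -> list:
    """SetACL.exe call that adds one ACE; existing ACEs on the folder stay in place."""
    return ["SetACL.exe", "-on", path, "-ot", "file",
            "-actn", "ace", "-ace", f"n:{trustee};p:{permission}"]
```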

Not long ago I was faced with an MBR to GPT LUN migration on a very large file server. It’s the proverbial file server from hell. We’ve all been there too many times, and even after more than 15 years we still cannot get people to listen and follow some best practices and, above all, the KISS principle. So you end up having to deal with the fallout of every political, organizational, process and technical mistake you can imagine when it comes to ACLs and NTFS permissions. So what did I reach for? SetACL.exe and SetACL Studio; these are my go-to tools for this.


Check out the web page to read up on what this tool can do for you. It’s very easy to use, intuitive and fast. It can manage ACLs on file systems, the registry, services, printers and even WMI. It helps you grant ownership and rights in an easy way without messing up the existing NTFS permissions. It works on both local and remote systems. Last but not least, it has an undo function; how cool is that?! Yup, an admin tool that lets you change your mind. Quite unique.

As an MVP I can get a license for free from Helge Klein, but I recommend that any IT pro or consultant buy this tool, as it makes a wonderful addition to anyone’s toolkit, saving countless hours, perhaps even days. It pays for itself within the first 15 minutes you use it.

Other useful tools in your toolkit are as it can handle the large (550-800 MB) log files RoboCopy can produce and some PowerShell scripting skills to parse these files.
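Parsing those RoboCopy logs does not require loading them into memory. The sketch below does it in Python rather than PowerShell, to keep the examples in one language: it streams the file line by line and counts errors per Windows error code. The line layout in the regular expression is an assumption based on typical RoboCopy output (a timestamp, `ERROR <code> (0x<hex>)`, then the action and path); adjust it to what your logs actually contain.

```python
import re
from collections import Counter
from typing import Iterable, Iterator, Tuple

# ASSUMPTION: RoboCopy error lines look like
#   2016/05/12 10:15:02 ERROR 5 (0x00000005) Copying File D:\Share\file.txt
# with the Windows error text ("Access is denied.") on the following line.
ERROR_LINE = re.compile(r"ERROR (\d+) \(0x([0-9A-Fa-f]{8})\) (.+)$")


def iter_errors(lines: Iterable[str]) -> Iterator[Tuple[int, str]]:
    """Yield (error code, action and path) for every RoboCopy ERROR line."""
    for line in lines:
        match = ERROR_LINE.search(line)
        if match:
            yield int(match.group(1)), match.group(3).strip()


def error_summary(log_path: str, encoding: str = "utf-8") -> Counter:
    """Count errors per code, streaming the file so huge logs never sit in memory.

    Use encoding="utf-16" for logs written with /UNILOG; errors="replace" keeps
    an odd non-UTF-8 byte in a /LOG file from aborting the run.
    """
    with open(log_path, encoding=encoding, errors="replace") as log:
        return Counter(code for code, _ in iter_errors(log))
```

Error 5 (access denied) entries are usually the ones that send you back to the ACLs with SetACL.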

Windows Server 2012 64TB Volumes And The New Check Disk Approach


In a previous post I mentioned the use of 64TB volumes in Windows Server 2012 in a supported scenario. That’s a lot of storage and data. There’s a cost side to all this, and having all that data on one volume also incurs some risk. Windows 2012 tries to address the cost issue with commodity storage, in combination with the excellent resilience of Storage Spaces, to reduce both cost and risk. Apart from introducing ReFS, Microsoft also did some work on NTFS to help with reliability. We already discussed the use of the flush command in Windows Server 2012 64TB NTFS Volumes and the Flush Command. Now we’ll look at the new approach for detecting and repairing corruptions in NTFS, which optimizes uptime through online repair and keeps offline repairs minimal and very short thanks to spot fixing.

On top of these improvements studying this process taught me two very interesting things:

  1. The snapshot size limit is also a reason why NTFS volumes are not bigger than 64TB. See the explanation below!
  2. Cluster Shared Volumes and CSVFS enable continuous availability even when spot fix is run! See below for more details.

So read on and find out why I’m not worried about the 50TB & 37TB LUNs we use for Disk2Disk backups.

Hitting the practical boundaries of check disk (CHKDSK)

While NTFS has been able to handle volumes up to 256TB in size, this was never used in real life, because most people don’t have (or need) that amount of storage and because the supported limit was 16TB. With Windows 2012 this has become 64TB. That’s just about future proof enough for the time being, I’d say. In real life, the practical volume size has been smaller than this for a number of reasons.
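Where does the 256TB figure come from? NTFS can address at most 2^32 - 1 clusters per volume, so the maximum volume size is that cluster count multiplied by the cluster size chosen at format time. The worked calculation below uses the 64KB maximum cluster size NTFS offered in the Windows Server 2012 era; with the default 4KB clusters, the same formula gives 16TB.

```python
# NTFS addresses at most 2**32 - 1 clusters per volume, so the maximum
# volume size follows from the cluster size chosen when formatting.
MAX_CLUSTERS = 2**32 - 1
TB = 2**40  # binary terabyte


def max_ntfs_volume_tb(cluster_bytes: int) -> float:
    """Largest NTFS volume in TB for a given cluster size in bytes."""
    return MAX_CLUSTERS * cluster_bytes / TB


for kb in (4, 64):
    print(f"{kb:>2}KB clusters: {max_ntfs_volume_tb(kb * 1024):.1f} TB")
```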

There is the limitation of basic disks, which is solved with GPT, but that has its own requirements. Then there are the storage arrays, on which the biggest LUN you can create varies from 2TB to 16TB, 50TB or more, depending on the type, brand and model. Another big concern was the potentially long CHKDSK execution time. Not that the volume size is the factor here; it’s the number of files on the volume that dictates how long CHKDSK will run. But volume size and number of files very often go hand in hand.

While Microsoft has reduced CHKDSK execution time with every Windows release since Windows 2000, the additional improvements possible with the existing approach had reached a practical limit. Faced with ever bigger volumes, huge numbers of files and ever more “Always On” services requiring very high to continuous availability, things needed to change drastically.

A vastly improved check disk (CHKDSK) for a new era

This change came through a new approach for detecting and repairing corruptions in NTFS that consists of:

  1. Enhanced detection and handling of corruptions in NTFS via on-line repair
  2. A changed CHKDSK execution model that separates the analysis and repair phases
  3. File system health monitored via Action Center and Server Manager

Enhanced NTFS Corruption Handling

NTFS now logs information on the nature of a detected corruption that cannot be repaired online. This is maintained in new metadata files:

  • $Verify
  • $Corrupt

The new “Verification” component confirms the validity of a detected corruption to eliminate unnecessary CHKDSK runs due to a transient hiccup. There’s a service involved here called “Spot Verifier”.


The online repair capability, which was introduced with the “Self-healing” feature in Vista and was limited to Master File Table (MFT) related corruptions, has been greatly enhanced and extended. It can now handle a broader range of corruptions across multiple metadata files, which means nearly all of the most common corruptions can be fixed by an online repair.

The New CHKDSK Process & Phases

The phases are:

The analysis phase is performed online on a volume snapshot, so there is no down time for the services and users.

IMPORTANT NOTE: You read that right! The analysis phase is performed online on a volume snapshot. Now, when you know that the maximum supported size of a Windows volume snapshot is 64TB, you also know that, besides stress and performance testing of 256TB LUNs, there is another limitation in play: the size of the snapshot needed to make the new chkdsk process work! If you have volumes bigger than 64TB, this process can and will use a hardware snapshot if there is a hardware VSS provider that supports snapshots bigger than 64TB. So this new chkdsk process in Windows Server 2012 will also work for volumes bigger than 64TB. But within the Microsoft Windows Server 2012 stack, 64TB is the top limit, or you lose this new chkdsk functionality. Interesting stuff!
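The rule above boils down to a small decision: up to 64TB a Windows volume snapshot can be used; beyond that, only a hardware VSS provider that supports bigger snapshots keeps the online analysis available. The sketch below encodes just that rule; the hardware provider limit is a hypothetical input you would take from your storage vendor.

```python
from typing import Optional

TB = 2**40
# From the text: the maximum supported size of a Windows volume snapshot is 64TB.
WINDOWS_SNAPSHOT_LIMIT = 64 * TB


def online_scan_snapshot(volume_bytes: int,
                         hw_provider_limit: Optional[int] = None) -> str:
    """Which snapshot the online CHKDSK analysis phase could use.

    hw_provider_limit is the largest snapshot a (hypothetical) hardware VSS
    provider on the array supports, or None when there is no such provider.
    """
    if volume_bytes <= WINDOWS_SNAPSHOT_LIMIT:
        return "windows"
    if hw_provider_limit is not None and volume_bytes <= hw_provider_limit:
        return "hardware"
    return "none"  # no usable snapshot: the new online analysis is lost
```

The 50TB and 37TB Disk2Disk backup LUNs mentioned earlier both fall into the first branch.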

If a corruption is detected, there will be a first attempt at online self-healing via the self-healing API. If self-healing cannot repair the error, Online Verification (Spot Verification) kicks in to verify that the error is not a glitch. Any verified corruption that cannot be fixed online is identified and logged to a new NTFS metadata file: $Corrupt. After this, the administrators are notified so that, at a time of their choosing, the volume can be taken offline to do the repairs.


The offline repair phase (spot fixing) only runs when all else has failed and it can’t be avoided. The volume can be taken offline, either manually or scheduled, at a time the administrator chooses. Spot fix only repairs logged corruptions, to minimize volume unavailability.
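The flow described in the last few paragraphs can be summarized as a toy model: scan, then self-heal, then spot verification, then logging to $Corrupt, and finally an offline spot fix that only touches logged items. This is purely illustrative; the function and outcome names are hypothetical, the inputs stand in for what NTFS decides internally, and a Python list plays the role of the $Corrupt metadata file.

```python
from enum import Enum, auto
from typing import List, Optional


class Outcome(Enum):
    NO_CORRUPTION = auto()  # online scan of the snapshot found nothing
    SELF_HEALED = auto()    # fixed online by self-healing, no downtime
    TRANSIENT = auto()      # spot verification: just a glitch, nothing to do
    LOGGED = auto()         # verified and not fixable online: added to $Corrupt


def triage(corruption: Optional[str], self_heal_ok: bool, verified: bool,
           corrupt_log: List[str]) -> Outcome:
    """Walk one detected issue through the online part of the new process."""
    if corruption is None:
        return Outcome.NO_CORRUPTION
    if self_heal_ok:
        return Outcome.SELF_HEALED
    if not verified:
        return Outcome.TRANSIENT
    corrupt_log.append(corruption)  # stands in for the $Corrupt metadata file
    return Outcome.LOGGED


def spot_fix(corrupt_log: List[str]) -> List[str]:
    """Offline phase: repair only what was logged, then clear the log."""
    repaired = list(corrupt_log)
    corrupt_log.clear()
    return repaired
```

The point of the design shows in `spot_fix`: the offline window only has to cover what the online phases could not handle, not a full scan of the volume.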

Cluster Shared Volumes bring us continuous availability in Windows Server 2012, as the process leverages clustering and CSVFS functionality to make sure you don’t have to bring the volume down; IO is just temporarily stalled:

  • Scan runs & adds detected corruptions to $Corrupt
  • Every minute the cluster IsAlive check runs, which also …
  • enumerates the $Corrupt system file via fsutil to identify corrupt files; if any are found, action is taken
  • The CSV namespace is paused to CSVFS and the underlying NTFS is dismounted
  • Spot fix runs for a maximum of 15 seconds; if needed, this repeats every 3 minutes
  • If corruption repair would take too long, it is marked to run at an offline moment instead of the above.
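The timing rules in that list (passes of at most 15 seconds, repeated every 3 minutes, or deferral to an offline moment when the repair takes too long) can be sketched as a small planner. The post does not say what “too long” means, so `max_passes` below is a hypothetical cut-off, and the estimated repair time is an assumed input; the real cluster logic is not exposed like this.

```python
from typing import List, Optional, Tuple

PASS_LIMIT_S = 15       # from the text: a spot fix pass runs at most 15 seconds
PASS_INTERVAL_S = 180   # from the text: if needed, it repeats every 3 minutes


def plan_csv_spotfix(repair_s: float,
                     max_passes: int = 4) -> Optional[List[Tuple[int, float]]]:
    """Split an estimated repair time into (start offset s, duration s) passes.

    max_passes is a HYPOTHETICAL stand-in for the "takes too long" decision;
    returning None means: mark the repair for an offline moment instead.
    """
    passes: List[Tuple[int, float]] = []
    remaining = repair_s
    start = 0
    while remaining > 0:
        if len(passes) == max_passes:
            return None
        run = min(PASS_LIMIT_S, remaining)
        passes.append((start, run))  # CSV IO is only stalled for this window
        remaining -= run
        start += PASS_INTERVAL_S
    return passes
```

With these numbers, an estimated 40 seconds of repair work becomes three short stalls at 0, 180 and 360 seconds instead of one long outage.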

Repairing corruptions offline now normally takes no longer than a few seconds, often a lot less, which is negligible compared to the time a modern physical server spends running through its memory configuration, BIOS/UEFI and controller initialization. Even on laptops and virtual machines this is very short and doesn’t really add much to the boot time, as you can see in the picture below; it’s often not even noticeable.


Using this new functionality

The user is notified via the Windows User Interface. The phases of repair are also displayed in the Action Center & Server Manager and the user can take appropriate action.

New options have been added to the chkdsk command line to leverage this model:


The fsutil repair command also has some new options:


You can also control the action via PowerShell with the storage cmdlet Repair-Volume. Actions can be run as a job, and the parameters -scan, -spotfix and -offlinescanandfix are pretty obvious by now. See
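To see how the three entry points line up, the sketch below builds the equivalent `chkdsk` and `Repair-Volume` command lines for each phase, plus an `fsutil repair enumerate <volume> $Corrupt` call to list what is currently logged. The mode names are hypothetical labels, and the fsutil form is an assumption to verify against the fsutil repair help on your build; `Repair-Volume` also accepts `-AsJob` if you want to run it in the background.

```python
from typing import Dict, List

# Hypothetical mode label -> (chkdsk switch, Repair-Volume switch)
MODES = {
    "scan": ("/scan", "-Scan"),                               # online analysis on a snapshot
    "spotfix": ("/spotfix", "-SpotFix"),                      # brief offline fix of logged items
    "offline": ("/offlinescanandfix", "-OfflineScanAndFix"),  # full offline scan and fix
}


def repair_commands(drive_letter: str, mode: str) -> Dict[str, List[str]]:
    """Equivalent chkdsk, Repair-Volume and fsutil command lines for one volume."""
    chkdsk_switch, ps_switch = MODES[mode]
    letter = drive_letter.rstrip(":").upper()
    return {
        "chkdsk": ["chkdsk", f"{letter}:", chkdsk_switch],
        "powershell": ["powershell.exe", "-NoProfile", "-Command",
                       f"Repair-Volume -DriveLetter {letter} {ps_switch}"],
        # Lists what is currently logged in $Corrupt before you schedule a fix.
        "fsutil": ["fsutil", "repair", "enumerate", f"{letter}:", "$Corrupt"],
    }
```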