Windows Server 2016 Data Deduplication Scales and Performs Better

I’ve been leveraging Windows Server Data Deduplication with great results ever since it became available.


One of the enhanced features in Windows Server 2016 is Data Deduplication, and it’s one I welcome very much. The improvements we’re getting mostly have to do with scale and performance. I’m quite pleased that Microsoft listened to our previous feedback on this.


You cannot imagine how much money we have saved on backup target storage by using this. So we’re very happy that Windows Server 2016 Data Deduplication scales and performs better. The fact that we can now get even better scale and performance is music to our ears. The backup target servers are the first in line for an upgrade, that’s for sure! That’s the reason I mentioned it as a subject to look into in the Hyper-V Amigos interview at Ignite!

Scale Improvement of the supported LUN sizes, up to 64TB

Actually, I was already pushing this to 50TB in some cases for testing, but overall I used 6 to 10TB volumes. Still, the support for bigger volumes is very welcome. Now, please note that you should NOT go any higher than 64TB (I actually stay below that) or deduplication doesn’t work due to its dependency on VSS. Please read my blog post

Windows 2012 R2 Data Deduplication Leverages Shadow Copies: “LastOptimizationResultMessage : A volume shadow copy could not be created or was unexpectedly deleted” on this subject.

In Windows 2012 R2 we were limited because data deduplication used a single-threaded job and a single I/O queue for each volume. That made it wiser to have 10 target LUNs of 6TB than one huge 60TB LUN. The big issue otherwise is that on large volumes the dedup processing might not keep up with the rate of data changes (“churn”). Now, your mileage will vary depending on the type of data and the delta. More info on this is in the blog post Sizing Volumes for Data Deduplication in Windows Server. It will help you size the volumes, but note that in Windows Server 2016 the rules have changed.

The dedup optimization processing now runs multiple threads in parallel, using multiple I/O queues on a single volume. That gives you better performance without the overhead of having to use more, smaller LUNs.
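A quick, purely illustrative sketch of what that looks like in practice, assuming a dedup-enabled volume E::

   # Start an optimization job on a single dedup-enabled volume (E: is just an example)
   Start-DedupJob -Volume "E:" -Type Optimization

   # Watch the job; in Windows Server 2016 it runs multiple threads and I/O queues on that one volume
   Get-DedupJob -Volume "E:"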

File sizes up to 1TB are good for dedup

Windows Server 2012 R2 Data Deduplication supports the use of file sizes up to 1TB, but they are considered “not good candidates” for dedup. So that DPM workaround of backing up to a truckload of virtual machines with 1TB virtual disks that are deduplicated is borderline. You can see one improvement in CPS v2 coming already (also see the next header). 1TB is now fully supported and a good candidate. I’ll be pushing it higher … in my opinion this is where the most work will need to be done for future improvements. It would allow for more scenarios (I have VMs that hold VHDX virtual disks of 2TB or more). Scale is something that helps keep things simple. Simplicity avoids the costs & issues that come with complexity. That’s always a good thing if possible.

In Windows Server 2012 R2 the algorithms don’t scale as well and performance suffers, because operations like scanning for and inserting changes slow down as the total data set grows. These processes have been redesigned in Windows Server 2016, which now uses new stream map structures and improved partial file optimization. As a result, 1TB file sizes have become good candidates.
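If you want a quick look at how many of those big files you actually have on a backup target, a rough sketch like this will do (E:\Backups is a purely hypothetical path):

   # List files of 1TB or more on a backup volume (E:\Backups is a hypothetical path)
   Get-ChildItem -Path "E:\Backups" -Recurse -File |
       Where-Object { $_.Length -ge 1TB } |
       Select-Object FullName, @{Name = "SizeTB"; Expression = { [math]::Round($_.Length / 1TB, 2) } }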

Virtualized backup is a new usage type

DPM is already leveraging deduplication of virtual machines (CPS drove that I think, see Deduplicating DPM Storage).


In Windows Server 2016 all the dedup configuration settings have been combined into a new usage type called “Backup”. This simplifies deployment and helps “future proof” your setup, as future changes can automatically be applied through this usage type.
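Enabling it is a one-liner; the sketch below assumes a backup target volume T: and is just an illustration:

   # Enable deduplication with the new Backup usage type on a backup target volume (T: is an example)
   Enable-DedupVolume -Volume "T:" -UsageType Backup

   # Check what was applied
   Get-DedupVolume -Volume "T:" | Select-Object Volume, Enabled, UsageType, SavingsRate, SavedSpace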

Nano Server support

Data deduplication is (or will be) fully supported in Nano Server (new in TPv3). It’s not completely done yet so deduplication support in Nano Server still has a few restrictions:

  • Support has only been validated in non-clustered configurations
  • Deduplication job cancellation must be done manually (using the Stop-DedupJob PowerShell cmdlet; see the sketch right after this list)
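A minimal sketch of that manual cancellation, assuming a job is running on an example volume E::

   # Find any running deduplication jobs on the volume and cancel them manually (E: is an example)
   Get-DedupJob -Volume "E:" | Stop-DedupJob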

Microsoft welcomes any feedback on the deduplication feature via e-mail. For me the standing order is to break through that 1TB barrier!

My take & Magic Ball

In combination with the right backup product it saves a ton of money. I have leveraged VEEAM and, in the past, Windows Backup (inbox) with great results. The benefit of these two is that you can back up to physical storage and leverage deduplication. Virtualized backup as a new usage type makes life easier for the supported “workaround” around the limitations of DPM, where normally deduplication is only supported for VDI. What I’m really curious about is another possible future usage type: “Virtual Servers” … I guess for that one deduplication support for the OS disk would be very beneficial for “cloud” providers. We’ll see.

Windows Deduplication And Mysterious Folder & File Sizes

There was a brief moment of “this can’t be good” when the sysadmin looked at the size of the backup folders and compared it to the size reported for the files. Sure, I had told him that Windows inbox deduplication rocked, but this had to be too good to be true, or deduplication had just eaten all the backup files and he was “toast”. It was neither, but that requires some explanation. The good news is that Windows Data Deduplication, combined with a backup product that supports it like VEEAM, will save you a ton of money on the deduplication licenses some vendors charge and on storage costs.

This is what he saw, and what caused the raised eyebrow: 12.4TB reduced to 285GB.


Deduplication can’t be that great, right? Did something go wrong? Checking the properties of ALL the selected files did not report anything different, but compared to the volume info for used space something seemed very wrong. That’s supposed to be 5.34TB.


The volume properties report the effective space consumed on the volume, so that reflects the true deduplication results. You can confirm this with PowerShell.
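A minimal way to do that, assuming the backup target is the E: volume:

   # Report the real deduplication results for the volume (E: is just an example drive letter)
   Get-DedupStatus -Volume "E:" |
       Format-List Volume, OptimizedFilesCount, InPolicyFilesCount, SavedSpace, FreeSpace

   # The savings rate Server Manager shows comes from the same data
   Get-DedupVolume -Volume "E:" | Select-Object Volume, SavingsRate, SavedSpace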

A savings rate of 57%, 5.34TB of actually consumed space (5880575557632 bytes) and an unoptimized size of 12.4TB. Just as Server Manager reports.


So what is Explorer up to at the folder and file level? Nothing, it just can’t show you the complete picture. Windows Data Deduplication stores the deduplicated chunks in the System Volume Information folder. Windows Explorer runs under your account, has no access to that folder and doesn’t report the size of all the chunks in there. The only thing it does report is the non-deduplicated data that is left in the source folder, in our case where the backups reside. The result is, as said, raised eyebrows.

The same is true for any other tool actually, like WinDirStat in the screenshot below.


When we run these tools as SYSTEM we get a different picture, and you can navigate to the actual ChunkStore and learn more about the internals.
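One way to get there, assuming Sysinternals PsExec is available, is to launch a shell as SYSTEM and then poke at the chunk store (T: is just an example volume):

   # Launch PowerShell as SYSTEM with Sysinternals PsExec (assumes psexec.exe is on the path)
   psexec -s -i powershell.exe

   # From that SYSTEM session, sum up the chunk store on the dedup volume (T: is an example)
   Get-ChildItem "T:\System Volume Information\Dedup\ChunkStore" -Recurse -File |
       Measure-Object -Property Length -Sum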


Windows 2012 R2 Data Deduplication Leverages Shadow Copies: “LastOptimizationResultMessage : A volume shadow copy could not be created or was unexpectedly deleted”.

When you’re investigating and planning large repositories for data (backups, archives, file servers, ISO/VHD stores, …) and you’d like to leverage Windows Data Deduplication, you have to keep in mind that the maximum supported size for an NTFS volume is 64TB. They can be a lot bigger, but that’s the maximum supported. Why? Well, up to that size Microsoft guarantees everything will perform & scale and that all NTFS functionality will be available, functionality like volume shadow copies or snapshots. If an NTFS volume is larger than 64TB you cannot create a snapshot of it. And guess what data deduplication depends on?
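You can test that dependency by hand with the same VSSADMIN check the event log message further down suggests; a quick, purely illustrative sketch for an example volume T::

   # Try to create a shadow copy on the large volume by hand (T: is an example drive letter)
   vssadmin create shadow /for=T: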

Here’s the output of Get-DedupStatus for a > 150TB volume:


Note “LastOptimizationResultMessage      : A volume shadow copy could not be created or was unexpectedly deleted”.
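You can pull that message straight out of PowerShell as well; T: is the example volume here:

   # Check the result of the last optimization job on the volume (T: is an example)
   Get-DedupStatus -Volume "T:" |
       Select-Object Volume, LastOptimizationTime, LastOptimizationResult, LastOptimizationResultMessage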

Looking in the Deduplication event log we find more evidence of this.


Data Deduplication was unable to create or access the shadow copy for volumes mounted at "T:" ("0x80042306"). Possible causes include an improper Shadow Copy configuration, insufficient disk space, or extreme memory, I/O or CPU load of the system. To find out more information about the root cause for this error please consult the Application/System event log for other Deduplication service, VSS or VOLSNAP errors related with these volumes. Also, you might want to make sure that you can create shadow copies on these volumes by using the VSSADMIN command like this: VSSADMIN CREATE SHADOW /For=C:


   Creating shadow copy set.

   Running the deduplication job.


   Volume name: T: (\\?\Volume{4930c926-a1bf-4253-b5c7-4beac6f689e3})

Now there are multiple possible issues that might cause this, but if you’ve got a serious amount of data to back up, please check the size of your LUN, especially if it’s larger than 64TB or flirting with that size. It’s tempting, I know, especially when you only focus on dedup efficiencies. But you’ll never get any dedupe results on a > 64TB volume, and you don’t get any warning for this when you configure deduplication. So if you don’t know this you can easily run into this issue. Next to making sure you have enough free space, CPU cycles and memory, keep the partitions you want to dedupe a reasonable size. I’m sticking to +/- 50TB max.
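A small, purely illustrative sketch to spot volumes that are flirting with, or over, that 64TB line:

   # List fixed volumes and flag anything at or above 64TB, where VSS (and thus dedup) will fail
   Get-Volume | Where-Object { $_.DriveType -eq 'Fixed' } |
       Select-Object DriveLetter, FileSystemLabel,
           @{Name = "SizeTB"; Expression = { [math]::Round($_.Size / 1TB, 2) } },
           @{Name = "Over64TB"; Expression = { $_.Size -gt 64TB } }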

I have blogged before on the maximum supported LUN size and the fact that VSS can’t handle anything bigger than 64TB, in Windows Server 2012 64TB Volumes And The New Check Disk Approach. So while you can create volumes of many hundreds of TB, you’ll need a hardware provider that supports bigger LUNs if you need snapshots, and the software needing these snapshots must be able to leverage that hardware VSS provider. For backups and data protection this is a common scenario. In case you ask: I’ve done a quick, crazy test where I tried to leverage a hardware VSS provider in combination with Windows Server data deduplication. A LUN of 50TB worked just fine, but I saw no usage of any hardware VSS provider. Even if you have a hardware VSS provider, it’s not being used for data deduplication (not that I could establish with a quick test anyway) and to the best of my knowledge I don’t think it’s possible, as these have not exactly been written with this use case in mind. Comments on this are welcome, as I had no more time to dig in deeper.

Money Saving Hero of 2012: Windows 2012 In Box Deduplication Delivers Big Value

To wave goodbye to 2012 I’m posting the latest screenshot of the easiest and a very effective money-saving feature you got in Windows Server 2012, which RTM’d in August. Below you’ll find the status report of a backup LUN in a small environment. Yes, those are real numbers in a production environment.

If you are not using it, you’re really throwing away vast amounts of money on storage right this moment. If you’re in the market for a practical, economical and effective backup solution, my advice to you is the following: scrap any backup vendor or product that prevents its files or LUNs from being deduplicated by Windows Server 2012. They might as well be robbing you at gunpoint.

You can pay for a very nice company new year’s party with these savings!

I wish you all a great end of 2012 and a magnificent 2013 ahead. In 2013 we’ll push Windows Server 2012 into service where we couldn’t before (we’re waiting for 3rd party vendor support, and if they keep straggling they are out the door) and work at making our infrastructure ever more resilient and protected. With System Center SP1 some products of that suite will make a comeback in our environment. 10Gbps is bound to become the standard all over our little data center network, and not just for our most important workloads.