Windows 2012 R2 Data Deduplication Leverages Shadow Copies: “LastOptimizationResultMessage : A volume shadow copy could not be created or was unexpectedly deleted”.

When you’re investigation and planning large repositories for data (backups, archive, file servers, ISO/VHD stores, …) and you’d like to leverage Windows Data Deduplication you have too keep in mind that the maximum supported size for an NTFS volume is 64TB. They can be a lot bigger but that’s the maximum supported. Why, well they guarantee everything will perform & scale up to that size and all NTFS functionality will be available. Functionality on like volume shadow copies or snapshots. NFTS volumes can not be lager than 64TB or you cannot create a snapshot. And guess what data deduplication seems to depend on?

Here’s the output of Get-DedupeStatus for a > 150TB volume:

image

Note “LastOptimizationResultMessage      : A volume shadow copy could not be created or was unexpectedly deleted”.

Looking in the Deduplication even log we find more evidence of this.

image

Data Deduplication was unable to create or access the shadow copy for volumes mounted at "T:" ("0x80042306"). Possible causes include an improper Shadow Copy configuration, insufficient disk space, or extreme memory, I/O or CPU load of the system. To find out more information about the root cause for this error please consult the Application/System event log for other Deduplication service, VSS or VOLSNAP errors related with these volumes. Also, you might want to make sure that you can create shadow copies on these volumes by using the VSSADMIN command like this: VSSADMIN CREATE SHADOW /For=C:

Operation:

   Creating shadow copy set.

   Running the deduplication job.

Context:

   Volume name: T: (\?Volume{4930c926-a1bf-4253-b5c7-4beac6f689e3})

Now there are multiple possible issues that might cause this but if you’ve got a serious amount of data to backup, please check the size of your LUN, especially if it’s larger then 64TB or flirting with that size. It’s temping I know, especially when you only focus on dedup efficiencies. But, you’ll never get any dedupe results on a > 64TB volume. Now you don’t get any warning for this when you configure deduplication. So if you don’t know this you can easily run into this issue. So next to making sure you have enough free space, CPU cycles and memory, keep the partitions you want to dedupe a reasonable size. I’m sticking to +/- 50TB max.

I have blogged before on the maximum supported LUN size and the fact that VSS can’t handle anything bigger that 64TB here Windows Server 2012 64TB Volumes And The New Check Disk Approach. So while you can create volumes of many hundreds of TB you’ll need a hardware provider that supports bigger LUNs if you need snapshots and the software needing these snapshots must be able to leverage that hardware VSS provider. For backups and data protection this is a common scenario. In case you ask, I’ve done a quick crazy test where I tried to leverage a hardware VSS provider in combination with Windows Server data deduplication. A LUN of 50TB worked just fine but I saw no usage of any hardware VSS provider here. Even if you have a hardware VSS provider, it’s not being used for data deduplication (not that I could establish with a quick test anyway) and to the best of my knowledge I don’t think it’s possible, as these have not exactly been written with this use case in mind. Comments on this are welcome, as I had no more time do dig in deeper.

Great Hardware Support Equals Fast Windows 2012 R2 Implementation

I love it when a plan comes together

We adopted Windows 2012 when right after it went RTM in august 2012. Today we’re are already running Windows 2012 R2 and ready to step up the pace. If you are a VAR/ISV that does not have fast & good support for Windows Server 2012 R2 consider this your notice. You can’t lead from behind. Get your act together and take an example from Altaro. Small, sure, but good & fast. How do we get our act together so fast? Fast? Yes, but it does take time and effort.

As it turns out, we’re pretty well of with the DELL hardware stack. The generation 11 and 12 servers are supported by DELL and on the Windows Server Catalog for Windows Server 2012 R2.

image

For more information on Dell Server inbox driver support see: Windows Server 2012 R2 RTM Inbox Driver Support on Dell PowerEdge Servers. By the way I can testify that we’ve run Windows Sever 2012 R2 successfully on 9th Generation hardware (PowerEdge 1950/2950).

We’ve been running tests since Windows 2012 R2 Preview on R710/R720 and it has been a blast. We’ve kept them up to date with the latest firmware & drives via SUU. And for our Intel X520 and Mellanox ConnectX-3 we’ve had rapid support as well.

So what more could you want? Well support from your storage array vendor I would think. I’m happy to report that Storage Center 6.4 has been out since October 8th and it supports Windows Server 2012 R2. Dell Compellent Adds MLC SSD Tier – Bests 15K HDDs on Price and Performance. Mind you on a lazy Sunday afternoon 2 quick e-mails to CoPilot got me the answer that Storage Center 6.3.10 also supports Windows Server 2012 R2.  Sweet!

And that’s not just DELL, the Dell Compellent Storage Center 6.4 is fully Windows Server 2012 R2 logo certified! That’s what you want to see from you vendor. Fast & excellent support.

image

Here’s the entire DELL hardware line up with Windows Server 2012 R2 support. Happy upgrading & implementation! If you have Software Assurance you’re set to reap the benefits of that investment today!

To my all employers / clients, you see now, told you so. Now, I have a thing in common with Col. Hannibal Smith, I love it when a plan comes together Open-mouthed smile.

image

I know some of you think that all the testing, breaking, wrecking of Preview bits, RTM & GA versions we do looks like chaos. Especially when you visually add the test server & switch configurations. But that’s what it looks like to YOU. To the initiated this is well executed plan, dropping all assumptions, to establish what works & will hold up. The result is that we’re ready today and by extension, so are you.

Preventing Live Migration Over SMB Starving CSV Traffic in Windows Server 2012 R2 with Set-SmbBandwidthLimit

One of the big changes in Windows Server 2012 R2 is that all types of Live Migration can now leverage SMB 3.0 if the right conditions are met. That means that Multichannel & SMB Direct (RDMA) come in to play more often and simultaneously. Shared Nothing Live Migration & certain forms of Storage Live Migration are often a lot more planned due to their nature. So one can mitigate the risk by planning.  Good old standard Live Migration of virtual machines however is often less planned. It can be done via Cluster Aware Updating, to evacuate a host for hardware maintenance, via Dynamic optimization. This means it’s often automated as well. As we have demonstrated many times Live Migration can (easily) fill 20Gbps of bandwidth. If you are sharing 2*10Gbps NICs for multiple purposes like CSV, LM, etc. Quality of Service (QoS) comes in to play. There are many ways to achieve this but in our example here I’ll be using DCB  for SMB Direct with RoCE.

New-NetQosPolicy “CSV” –NetDirectPortMatchCondition 445 -PriorityValue8021Action 4
Enable-NetQosFlowControl –Priority 4
New-NetQoSTrafficClass "CSV" -Priority 4 -Algorithm ETS -Bandwidth 40
Enable-NetAdapterQos –InterfaceAlias SLOT41-CSV1+LM2
Enable-NetAdapterQos –InterfaceAlias SLOT42-LM1+CSV2
Set-NetQosDcbxSetting –willing $False

Now as you can see I leverage 2*10Gbps NIC, non teamed as I want RDMA. I have Failover/redundancy/bandwidth aggregation thanks to SMB 3.0. This works like a charm. But when leveraging Live Migration over SMB in Windows Server 2012 R2 we note that the LM traffic also goes over port 445 and as such is dealt with by the same QoS policy on the server & in the switches (DCB/PFC/ETS). So when both CSV & LM are going one how does one prevent LM form starving CSV traffic for example? Especially in Scale Out File Server Scenario’s this could be a real issue.

The Solution

To prevent LM traffic & CSV traffic from hogging all the SMB bandwidth ruining the SOFS party in R2 Microsoft introduced some new capabilities in Windows Server 2012 R2. In the SMBShare module you’ll find:

  • Set-SmbBandwidthLimit
  • Get-SmbBandwidthLimit
  • Remove-SmbBandwidthLimit

image

To use this you’ll need to install the Feature called SMB Bandwidth Limit via Server Manager or using PowerShell:  Add-WindowsFeature FS-SMBBW

You can limit SMB bandwidth for Virtual machine (Storage IO to a SOFS), Live Migration & Default (all the rest).  In the below example we set it to 8Gbps maximum.

Set-SmbBandwidthLimit -Category LiveMigration -BytesPerSecond 1000MB

So there you go, we can prevent Live Migration from hogging all the bandwidth. Jose Baretto mentions this capability on his recent blog post on Windows Server 2012 R2 Storage: Step-by-step with Storage Spaces, SMB Scale-Out and Shared VHDX (Virtual). But what about Fibre Channel or iSCSI environments?  It might not be the total killer there as in SOFS scenario but still. As it turns out the Set-SmbBandwidthLimit also works in those scenarios. I was put on the wrong track by thinking it was only for SOFS scenarios but my fellow MVP Carsten Rachfahl kindly reminded me of my own mantra “Trust but verify” and as a result, I can confirm it even works to cap off Live Migration traffic over SMB that leverages RDMA (RoCE). So don’t let the PowerShell module name (SMBShare) fool you, it’s about all SMB traffic within the categories.

So without limit LM can use all bandwidth (2*10Gbps)

image

With Set-SmbBandwidthLimit -Category LiveMigration -BytesPerSecond 1250MB you can see we max out at 10Gbps (2*5Gbps).
image

Some Remarks

I’d love to see a minimum bandwidth implementation of this (that could include safety buffer for spikes in CSV traffic with SOFS). The hard cap limit might lead to some wasted bandwidth. In other scenarios you could still get into trouble. What if you have 2*10Gbps available but one of those dies on you and you capped Live Migration Traffic at 16Gbps. With one NIC gone you’re potentially in trouble until the NIC has been replaced. OK, this is not a daily occurrence & depending on you environment & setup this is less or more of a potential issue.