Windows Server 2012 Deduplication Results In A Small Environment

This is a small environment that provides web presence and services. In total there are about 20 production virtual machines. These are all backed up to a Transparent Failover File Share on a Windows Server 2012 cluster that is used to host all the infrastructure and management services.

The LUN/volume for the backups provides about 5.5 TB of storage. The folder layout is shown in the screenshot below. The backups are run “in guest” using native Windows Backup, which has the WindowsImageBackup subfolder as target. Those backups are archived to an “Archives” folder. That archive folder is the one that gets deduplicated, as the WindowsImageBackup folder is excluded.

image
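For reference, a rough sketch of that workflow as it could be run from inside a guest; the cluster share, VM folder and date stamp are assumptions for illustration, not the actual environment:

  # In-guest backup with native Windows Backup; it creates the WindowsImageBackup subfolder on the share.
  wbadmin start backup -backupTarget:\\FILECLUSTER\Backups\VM01 -allCritical -quiet

  # Before the next backup run, move the current image into the Archives folder (the part that gets deduplicated).
  robocopy "\\FILECLUSTER\Backups\VM01\WindowsImageBackup" "\\FILECLUSTER\Backups\Archives\VM01\$(Get-Date -Format yyyyMMdd)" /E /MOVE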

This means that the most recent backup version is not deduplicated, guaranteeing the fastest possible restore times at the cost of some disk space. All older (> 1 day) backup files are deduplicated. We achieve the following with this approach:

  • It provides us with enough disk space savings to keep archived backups around for longer in case we need them.
  • It also provides for enough storage to backup more virtual machines while still being able to maintain a satisfactory number of archived backups.
  • Any combination of the above two benefits can be balanced against the business needs.
  • It’s a free, zero-cost solution.
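For completeness, a minimal sketch of how such a policy can be set up with the deduplication cmdlets in Windows Server 2012; the drive letter and path are assumptions for illustration:

  # Install the feature and enable deduplication on the backup volume.
  Add-WindowsFeature FS-Data-Deduplication
  Enable-DedupVolume -Volume "B:"

  # Exclude the folder holding the most recent backups and only optimize files older than one day.
  Set-DedupVolume -Volume "B:" -ExcludeFolder "B:\WindowsImageBackup" -MinimumFileAgeDays 1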

The Results

About 20 virtual machines are backed up every week (small delta and lots of stateless applications). As the optimization runs we see the savings grow. That’s perfectly logical: the more backups we make of virtual machines with a small delta, the more deduplication can shine. So let’s look at the results using Get-DedupStatus | fl.
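For those following along at home, a quick sketch of the commands involved (the drive letter is an assumption):

  # Kick off an optimization job manually instead of waiting for the schedule, then watch it.
  Start-DedupJob -Volume "B:" -Type Optimization
  Get-DedupJob

  # Check the savings once the job has done its work.
  Get-DedupStatus | fl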

image

A couple of weeks later it looks like this.

image

Give it some more months, with more retained backups, and I think we’ll keep this around 88%-90%. From tests we have done (ddpeval.exe) we think we’ll max out at around 80% savings rate, but it’s a bit less here overall because we excluded the most recent backups. Guess what, that’s good enough for us. It beats buying extra storage or paying a wad of money for disk deduplication licenses from some backup vendor or appliance. Just using the built-in deduplication mechanisms in Windows Server 2012 saved us a bunch of money.

The next step is to also convert the production Hyper-V cluster to Windows Server 2012 so we can do host-based backups with the native Windows Backup, which now supports Cluster Shared Volumes (another place where that 64TB VHDX size can come in handy, as Windows Backup now writes to VHDX).

Some interesting screenshots

image

The volume properties report that we’re using 3TB of data, so 2.4TB is free.

image

Looking at the backup folder, you see 10.9TB of data stored on 1.99TB of disk.
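As a quick sanity check on those numbers: 10.9TB of logical data stored on 1.99TB of physical disk works out to a savings rate of roughly 1 - 1.99/10.9 ≈ 82%, which lines up with the ddpeval.exe estimate mentioned earlier.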

So the properties of the volume report more disk space used than the actual folder containing the data. Let’s use WinDirStat to have a look.

image

So the above agrees with the volume properties. In the details of this volume we again see about 2TB of consumed space.

image

Could it be that the volume is reserving some space to ensure proper functioning?

When you dive deeper you get some cool views of the storage space used. Where Windows Explorer is aware of deduplication and shows the non-deduplicated size for the VHD file, WinDirStat does not; it always shows the size on disk, which is a whole lot less.

image

This is the same as when you ask for the properties of a file in Windows Explorer.

image

Discussion

Is it the best solution for everyone? Not always, no. The deduplication is done on the target after the data is copied there. So in environments where bandwidth is seriously constrained, and there is absolutely no technical and/or economical way to provide the needed throughput, this might not be a viable solution. But don’t dismiss this option too fast: in a lot of scenarios it is a very good and cost-effective feature. Technically & functionally it might be wiser to do it on the target, as you don’t consume too much memory (deduplication is a memory hog) and CPU cycles on the source hosts. Also nice is that these deduplicated files are portable across systems. VEEAM has demonstrated some nice examples of combining their deduplication with Windows dedupe, by the way, so this might also be an interesting scenario.

Financially, the cost of deduplication functionality with hardware appliances or backup software hurts like the kick of a horse straight to the head. So even if you have to invest a little in bandwidth and cabling you might be a lot better off. Perhaps, as you’re replacing older switches by new 1Gbps or 10Gbps gear, you might be able to recuperate the old ones as dedicated backup switches. We’re using mostly recuperated switch ports and native Windows NIC teaming, and it works brilliantly. I’ve said this before: saving money whilst improving operations rarely gets you fired. The sweet thing is that this is achieved by building good & reliable solutions, which means they are efficient even if it costs some money to achieve. Some managers focus way too much on efficiency from the start, as to them it means nothing more than a euphemism for saving every € possible. Penny wise and pound foolish. Bad move. Efficiency, unless it is the goal itself, is a side effect of a well-designed and optimized solution. A very nice and welcome one for that matter, but it’s not the be-all and end-all of a solution or you’ll have the wrong outcome.

Hyper-V Guest Storage Performance: Above & Beyond 1 Million IOPS

Making a million IOPS possible in a Windows Server 2012 Hyper-V VM

A lot of you will have seen the demos of a Hyper-V guest with VHDX disks running on Windows Server 2012 doing a million IOPS; if you haven’t yet, take a look here. While some quickly dismissed this as “irrelevant boasting” without real life relevance, I respectfully disagree. This is smart future proofing by Microsoft and provides a hypervisor ready for future hardware capabilities and capable of handling the most demanding workloads today & in the years to come. Sure, such a demo is under lab/ideal conditions and does not reflect the majority of real life environments, but it’s nice to see what a hypervisor is capable of if and when you might need it. Remember there was a day that 4GB was a lot of RAM and 2TB sounded gigantic. Also remember that some people have larger needs than others. Until Windows Server 2008 R2 you had some limitations in storage IO performance that would not allow for a million IOPS. These had to be addressed, or all the efforts with regards to capabilities and performance of storage, CPU, networking and memory would just hit those particular bottlenecks. So it is addressing real needs and indeed also smart future proofing.

Capabilities of virtual machine storage IO throughput in Windows 2008 R2

The limitations listed below dictated the IO capabilities of virtual machines running on Windows Server 2008 R2 Hyper-V:

  1. Limited to one IO channel per virtual SCSI Controller
  2. A queue depth of 256 per virtual SCSI controller, shared by all devices attached to that SCSI adapter.
  3. There was one fixed vCPU (0) dedicated to handling IO.

image

The picture above illustrates these limits. You see two virtual SCSI controllers, each having two VHD virtual disks attached. Each disk shares the single channel of the controller it is attached to.

These limits could become a bottleneck, but that was never too big of a problem with a maximum of 4 vCPUs in Windows 2008 R2 Hyper-V. If needed for performance, we attached VHDs to different virtual SCSI controllers to get the best possible performance out of Windows Server 2008 R2 Hyper-V.
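For completeness, a rough sketch of what spreading disks over multiple virtual SCSI controllers looks like with the Windows Server 2012 Hyper-V PowerShell module (on 2008 R2 you would do the same through Hyper-V Manager); the VM name and VHDX paths are made up for illustration:

  # Add an extra virtual SCSI controller to the VM.
  Add-VMScsiController -VMName "DemoVM"
  # Attach one virtual disk to each controller so they don't share a single IO channel.
  Add-VMHardDiskDrive -VMName "DemoVM" -ControllerType SCSI -ControllerNumber 0 -Path "D:\VMs\DemoVM\Data1.vhdx"
  Add-VMHardDiskDrive -VMName "DemoVM" -ControllerType SCSI -ControllerNumber 1 -Path "D:\VMs\DemoVM\Data2.vhdx"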

With 64 vCPUs and ever more demanding workloads these limitations would become a (serious) issue, so this needed to be addressed. If not, despite all other efforts with regard to the 4 big resources (compute, memory, storage and network) in Windows 2012, this would remain the limiting factor for IOPS inside a virtual machine on Windows 2012.

Windows Server 2012 improvements to virtual machine storage IO scaling

image

The picture above illustrates the improvements in Windows Server 2012 Hyper-V IO Scaling:

  1. There is now 1 channel per 16 vCPUs, per virtual SCSI device, per controller. That means you have 4 channels per VHDX attached to a virtual SCSI controller when you have 64 vCPUs in the virtual machine. Compared to before this is a significant improvement, and a much needed one given the 64 vCPU capability there is now.
  2. IO interrupt handling is now distributed amongst all vCPUs and this process is NUMA aware. This is a huge improvement!
  3. There is now a 256 queue depth/device attached to a specific SCSI adapter. That’s another big improvement.

That, people, is how you get a virtual machine to handle a million IOPS. Nice! The questions or doubts whether Hyper-V can deliver the capacity, throughput & performance have been wiped off the table, yes, also for virtual storage IOPS. You can now go straight to how it will address your business needs. From my experience it does so brilliantly and very cost effectively. Life might not be perfect, but it is very good.

Windows Server 2012 64TB Volumes And The New Check Disk Approach

Introduction

In a previous post I mentioned the use of 64TB volumes in Windows Server 2012 in a supported scenario. That’s a lot of storage and data. There’s a cost side to this all, and it also incurs some risk to have all that data on one volume. Windows 2012 tries to address the cost issue with commodity storage in combination with the excellent resilience of Storage Spaces to reduce both cost and risk. Apart from introducing ReFS they also did some work on NTFS to help with reliability. We already discussed the use of the flush command in Windows Server 2012 64TB NTFS Volumes and the Flush Command. Now we’ll look at the new approach for detecting and repairing corruptions in NTFS, which optimizes uptime through online repair and keeps offline repairs minimized and very short thanks to spot fixing.

On top of these improvements studying this process taught me two very interesting things:

  1. The snapshot size limit is also a reason why NTFS volumes are not bigger than 64TB. See the explanation below!
  2. Cluster Shared Volumes and CSVFS enable continuous availability even when spot fix is run! See below for more details.

So read on and find out why I’m not worried about the 50TB & 37TB LUNs we use for Disk2Disk backups.

Hitting the practical boundaries of check disk (CHKDSK)

While NTFS has been able to handle volumes up to 256TB in size, this was never used in real life due to the fact that most people don’t have that amount of storage available (or the need for it) and that the supported limit was 16TB. With Windows 2012 this has become 64TB. That’s just about future proof enough for the time being, I’d say. In real life the practical volume size has been smaller than this due to a number of reasons.

There is the limitation of basic disks, which is solved with GPT, but that has its own requirements. Then there are the storage arrays, on which the biggest LUN you can create varies from 2TB to 16TB, 50TB or more depending on the type, brand and model. Another big concern was the potentially long CHKDSK execution time. Note that the volume size is not the factor here; it’s the number of files on the volume that dictates how long CHKDSK will run. But volume size and number of files very often go hand in hand.

While Microsoft has been reducing the execution time of CHKDSK with every Windows release since Windows 2000, the additional improvements that could be made with the existing approach have reached a practical limit. Faced with ever bigger volumes, a huge number of files and ever more “always on” services requiring very high to continuous availability, things needed to change drastically.

A vastly improved check disk (CHKDSK) for a new era

This change came through a new approach for detecting and repairing corruptions in NTFS that consists of:

  1. Enhanced detection and handling of corruptions in NTFS via on-line repair
  2. A changed CHKDSK execution model that separates the analysis and repair phases
  3. File system health monitored via Action Center and Server Manager

Enhanced NTFS Corruption Handling

NTFS now logs information on the nature of a detected corruption that cannot be repaired online. This is maintained in new metadata files:

  • $Verify
  • $Corrupt

The new “Verification” component confirms the validity of a detected corruption to eliminate unnecessary CHKDSK runs caused by a transient hiccup. There’s a service involved here called “Spot Verifier”:

image
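If you want to check whether it’s present on your own systems, a simple way (just a sketch) is to look it up by its display name:

  Get-Service | Where-Object { $_.DisplayName -like "*Spot Verifier*" }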

The online repair capability that was introduced with the “self-healing” feature in Vista, and was limited to Master File Table (MFT) related corruptions, has been greatly enhanced and extended. It can now handle a broader range of corruptions across multiple metadata files, which means nearly all of the most common corruptions can be fixed by an online repair.

The New CHKDSK Process & Phases

The phases are:

The analysis phase is performed online on a volume snapshot, so there is no down time for the services and users.

IMPORTANT NOTE: You read that right! The analysis phase is performed online on a volume snapshot. Now, when you know that the maximum supported size of a Windows volume snapshot is 64TB, you also know that except for stress & performance testing of 256TB LUNs there is another limitation in play: the size of the snapshot needed to make the new chkdsk process work! If you have volumes bigger than 64TB, this process can and will use a hardware snapshot if there is a hardware VSS provider that supports snapshots bigger than 64TB. So this new chkdsk process in Windows Server 2012 will also work for volumes bigger than 64TB. But within the Microsoft Windows Server 2012 stack, 64TB is the top limit or you lose this new chkdsk functionality. Interesting stuff!

If a corruption is detected, there will be a first attempt at online self-healing via the self-healing API. If self-healing cannot repair the error, the online verification (“Spot Verification”) kicks in to verify that the error is not a glitch. Any detected corruption that cannot be fixed online and is verified as real is identified and logged to a new NTFS metadata file: $Corrupt. After this, the administrators are notified so that at a time of their choosing the volume can be taken offline to do the repairs.

clip_image002

The offline repair phase (spot fixing) only runs when all else has failed and can’t be avoided. The volume can be taken offline, either manually or scheduled, at a time the administrator chooses. Spot fixing only repairs logged corruptions to minimize volume unavailability.

Cluster Shared Volumes bring us continuous availability in Windows Server 2012, as the process leverages clustering and CSVFS functionality in this case to make sure you don’t have to bring the volume down; IO is just temporarily stalled:

  • The scan runs & adds detected corruptions to $Corrupt
  • Every minute the cluster IsAlive check runs on the cluster, which also …
  • Enumerates the $Corrupt system file via fsutil to identify corrupt files; if any are found, action is taken
  • The CSV namespace is paused to CSVFS & the underlying NTFS is dismounted
  • Spot fixing runs for a maximum of 15 seconds; if needed, this repeats every 3 minutes
  • If the corruption repair would take too long, it is marked to run at an offline moment and the above is not done.

It normally takes no longer than a few seconds, often a lot less, to repair corruptions offline now, which is negligible on a modern physical server that runs through its memory configuration, BIOS/UEFI and controller checks at boot. Even on laptops and virtual machines this is very short and doesn’t really add much to the boot time; as you can see in the picture below, it’s often not even noticeable.

clip_image004

Using this new functionality

The user is notified via the Windows User Interface. The phases of repair are also displayed in the Action Center & Server Manager and the user can take appropriate action.

The chkdsk command line has had options added that leverage this model:

clip_image006
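A few examples of the new switches in use, purely as a sketch (the drive letter is just an example):

  chkdsk C: /scan                  # online analysis phase; the volume stays available
  chkdsk C: /spotfix               # short offline pass that only fixes corruptions logged in $Corrupt
  chkdsk C: /offlinescanandfix     # traditional full offline scan and repair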

The fsutil repair command also has some new options added:

clip_image007

You can also control the action via PowerShell with the storage cmdlet Repair-Volume. The action can be run as a job, and the parameters -Scan, -SpotFix and -OfflineScanAndFix are pretty obvious by now. See http://technet.microsoft.com/en-us/library/hh848662.aspx
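A minimal sketch of those parameters in use (the drive letter is an assumption):

  Repair-Volume -DriveLetter D -Scan               # online analysis on a snapshot
  Repair-Volume -DriveLetter D -SpotFix            # brief dismount that repairs only what was logged in $Corrupt
  Repair-Volume -DriveLetter D -OfflineScanAndFix  # full offline scan and repair
  Repair-Volume -DriveLetter D -Scan -AsJob        # run the scan as a background job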

Windows Server 2012 64TB NTFS Volumes and the Flush Command

As you might very well have read or even tried, you can use 64TB volumes in Windows Server 2012 in a supported scenario. You can do more, as NTFS is quite capable of it: I created a 300TB LUN once that I could format up to 256TB. But as no one can realistically stress test this for real, it’s not supported.

That’s a lot of storage and data. It’s also expensive and incurs some risk … all that data on one volume. Windows 2012 tries to address the cost issue with commodity storage in combination with the excellent resilience of Storage Spaces to reduce both cost and risk.

Apart from introducing ReFS they also did some work on NTFS to help with reliability:

  • A new approach for detecting and repairing corruptions in NTFS, which optimizes uptime through online repair and keeps offline repairs minimized and very short thanks to spot fixing.
  • Using the flush command instead of FUA.

In this post we’ll focus on the flush command.

Flushing Your Data

No, not that kind of flushing. You have always been able to “throw” data away with some very bad practices and unreliable technology; no need for much innovation there.

I’m talking about the fact that NTFS in Windows Server 2012 has switched to the flush command instead of relying on Forced Unit Access (FUA), to increase reliability with SATA disks and performance with SCSI disks. The good news is you don’t lose anything and gain on both fronts. Especially making cheaper SATA disks more reliable is a big one. It allows SATA disks to be used in business/enterprise scenarios and as such helps reduce costs.

What is Forced Unit Access (FUA)?

Well, it’s a flag that indicates a given write should go directly to media, writing through a device’s write cache. The NTFS journaling file system uses FUA to guarantee write ordering, which is important to maintain its metadata integrity. It was implemented in the SCSI (T10) specification but not in the original ATA (T13) specification. It was added in the 2002 version of the ATA specs, but FUA has never been guaranteed to be implemented on all ATA devices and as such Windows could not rely on it being there with ATA/SATA disks. As a result it was never used by Windows with SATA disks.

That meant that with SATA disks there is a bigger chance of corruption due to a power failure or the like, as NTFS was designed to rely on the FUA implementation for robust metadata writes. With ever increasing capacity needs, and larger SATA disks being needed and used for business purposes, something had to be done. So with Windows Server 2012 (and Windows 8) NTFS switched to using a flush command to the drive’s write cache instead of using FUA.
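To make the difference a bit more tangible, here is a purely illustrative, application-level analogy (this is not how NTFS implements it internally, and the file names are made up): write-through pushes every single write straight to media, while the flush approach lets the write cache do its job and then explicitly forces it to media at the points where ordering matters.

  $data = New-Object byte[] 4096

  # FUA-style: open the file for write-through, so every write bypasses the device write cache.
  $opts = [System.IO.FileOptions]::WriteThrough
  $wt = New-Object System.IO.FileStream -ArgumentList "D:\temp\fua-demo.bin", "Create", "Write", "None", 4096, $opts
  $wt.Write($data, 0, $data.Length)
  $wt.Dispose()

  # Flush-style: write normally (cached), then explicitly flush the buffers and the device cache.
  $fl = [System.IO.File]::OpenWrite("D:\temp\flush-demo.bin")
  $fl.Write($data, 0, $data.Length)
  $fl.Flush($true)   # .NET 4+: flushes intermediate buffers and asks the device to flush its cache
  $fl.Dispose()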

The Benefits

  1. The switch to using the flush command for all operations that require write ordering to ensure file system metadata integrity realizes better reliability and robustness when using commodity SATA storage, as it reduces the possibility of corruption due to power loss.
  2. It improves performance on SCSI devices because it allows the disk to cache data for as long as safely possible, instead of having to do write-through using FUA.