Quest Technical Experts Conference 2012 Europe Podcast

As the readers of my blog know I was in Barcelona this week to attend the Technical Experts Conference Europe 2012 organized by Quest (now a part of DELL).  Together with my fellow MVPs, friends and colleagues  Aidan Finn (@joe_elway), Carsten Rachfahl (@hypervserver) and Hans Vredevoort (@hvredevoort) we presented 8 sessions at the Virtualization Track and did a Hyper-V Experts panel to share what we have learned and help answer questions attendees on the new capabilities of Hyper-V in Windows Sever 2012. It was both fun and interesting to do. We learned some more from each other and also from the questions of an alert audience whom we enjoyed presenting to.

Mattias Sundling, a Technical evangelist at Quest and the owner of the Virtualization Track at TEC 2012 Europe did an audio podcast all 4 of us MVPs in the “Virtual Machine” expertise presenting in that track . You can find that podcast here on the vKernel/DELL web site http://www.vkernel.com/podreader/items/tec-europe-2012-mvps-with-mattias-sundling or on YouTube by clicking on the screenshot below. Enjoy.

image

Windows Server 2012 64TB Volumes And The New Check Disk Approach

Introduction

I a previous post I mentioned the use 64TB volumes in Windows Server 2012 in a supported scenario. That’s a lot of storage and data. There’s a cost side to this all and it also incurs some risk to have all that data on one volume. Windows 2012 tries to address the cost issue with commodity storage in combination with the excellent resilience of storage space to reduce both cost and risk. Apart from introducing ReFS they also did some work on NFTS to help with reliability. We already discussed the use of the flush command in Windows Server 2012 64TB NTFS Volumes and the Flush Command.  Now we’ll look at the new approach for detecting and repairing corruptions in NTFS which optimizes uptime through on line repair and keeps off line repairs minimized and very short thanks to spot fixing.

On top of these improvements studying this process taught me two very interesting things:

  1. The snapshot size limit is also a reason why NFTS volumes are not bigger than 64TB. See the explanation below!
  2. Cluster Shared Volumes an CSVFS enable continuous availability even when spot fix is run! See below for more details.

So read on and find out why I’m not worried about the 50TB & 37TB LUNs we use for Disk2Disk backups.

Hitting the practical boundaries of check disk (CHKDSK)

While NTFS has been able to handle volumes up to 256TB in size, this was never used in real life due t the fact that most people don’t have that amount of storage available (or need to have) and that the supported limited was 16TB. With Windows 2012 this has become 64TB. That’s just about future proof enough for the time being I’d say Winking smile. In real life the practical volume size has been smaller than this die to a number of reasons.

There is the limitation of basic disks which are solved with GPT, but that has it’s own requirements. The there are the storage arrays on which the biggest LUN you can create varies from 2TB tot 16, 50TB or more depending on the type, brand and model. Another big concern was based on potentially long CHKDSK execution time. No that the volumes size is the factor here, it’s the number of files on the volume that dictates how long CHKDSK will run. But volume size and number of files very often go hand I hand.

While Microsoft has been reducing the execution time of with every windows release since Windows 2000 the limit of additional improvements that could be made with the current approach have reach a practical limit. Faced with ever bigger volumes, a huge number of files and ever more “Always On” services, requiring very high to continuous availability, things needed to change drastically.

A vastly improved check disk (CHKDSK) for a new era

This change came through a new approach for detecting and repairing corruptions in NTFS that consists of:

  1. Enhanced detection and handling of corruptions in NTFS via on-line repair
  2. Change the CHKDSK execution model  to separate analysis and repair phases
  3. File system health monitored via Action Center and Server Manager

Enhanced NTFS Corruption Handling

NTFS now logs information on the nature of a detected corruption that cannot be repaired on line. This is maintained in new metadata files:

  • $Verify
  • $Corrupt

The new “Verification” component confirms the validity of a detected corruption to eliminated unnecessary CHKDSK runs due to a transient hiccup. There’s a service involved here call “Spot Verifier”:

image

The on-line repair capability that was introduced with the “Self-healing” feature in Vista and was limited to Master File Table (MFT) related corruptions has been greatly enhanced and extended. That means it can now handle a broader range of corruptions across multiple metadata files which means nearly all of the most common corruptions can be fixed by an on-line repair

The New CHKDSK Process & Phases

The phases are:

The analysis phase is performed online on a volume snapshot, so there is no down time for the services and users.

IMPORTANT NOTE: You read that right! The analysis phase is performed online on a volume snapshot. Now when you know that the maximum supported size of a Windows volume snapshot is 64TB you also know that except for stress & performance testing of 256TB LUNS there is another limitation in play. The size of the snapshot to make the new chkdsk process work! If you have volumes bigger than 64Tb, this process can and will use a hardware snapshot if there is a hardware VSS Provider that supports snapshots bigger than 64 TB. So the this new chkdsk process in Windows Server 2012 will also work for volumes bigger than 64TB. But within the Microsoft Windows Server 2012 stack, 64TB is the top limit or you lose this new chkdsk functionality. Interesting stuff!

If a corruption is detected, there will be a first attempt at Online Self-Healing via the self-healing API. Now if self-healing cannot repair the error the Online Verification “‘(Spot Verification” kicks in to verify that the error is not a glitch. When it is verified that any detected corruption that cannot be fixed on line is identified and logged to a new NTFS metadata file: $Corrupt. After this the, the administrators are notified so at a time of their choosing the volume can be taken offline to do the repairs

clip_image002

The Offline repair phase (spot fixing) only runs when all else has failed and can’t be avoided. The volume can be taken offline, either manually or scheduled, at a time the administrator’s chooses. Spot fix only repairs logged corruptions to minimize volume unavailability.

Cluster Shared Volumes bring us Continuous Availability in Windows Server 2012 as the process leverages clustering and CSVFS functionality in this case to make sure you don’t have to bring the volume down, IO is just temporarily stalled:

  • Scan runs & adds detected corruptions to $Corrupt
  • Every minute the cluser IsAlive  check runs on a cluster which also ….
  • Enumerates $corrupt system file to identify corrupt files via fsutil, if so action is taken
  • CSV namespace is paused to CSVFS & Underlying NTFS is dismounted
  • Spotfix runs for maximal 15 seconds, if needed this repeats every 3 minutes
  • It corruption repair will take too long it will be marked to run at off line moment and not do the above.

It normally takes no longer than a few seconds, often a lot less, to repair corruptions off line now, which is benign on a modern physical server that runs through its memory configuration, BIOS/EUFI, Controller. Even on laptops and virtual machines this is very short and doesn’t really add much to the boot time as you can see in the picture below, it’s often not even noticeable.

clip_image004

Using this new functionality

The user is notified via the Windows User Interface. The phases of repair are also displayed in the Action Center & Server Manager and the user can take appropriate action.

The chkdsk command line has had options added that leverage this model

clip_image006

The fsutil repair command has also some new options added:

clip_image007

You can also control the action via PowerShell with the storage cmdlet Repair-Volume. Acton can be run as a job and the parameters -scan, -spotfix, -offlinescanandfix are pretty obvious by now. See http://technet.microsoft.com/en-us/library/hh848662.aspx

Windows Server 2012 64TB NTFS Volumes and the Flush Command

As you might very well have read or even tried you can use 64TB volumes in Windows Server 2012 in a supported scenario. You can do more, NTFS is quite capable of this. I created a 300TB LUN once that I could format up to 256TB Smile But as no one can realistically stress test this for real, it’s not supported.

That’s a lot of storage and data. It’s also expensive and incurs some risk … all that data on one volume. Windows 2012 tries to address the cost issue with commodity storage in combination with the excellent resilience of storage space to reduce both cost and risk.

Apart from introducing ReFS they also did some work on NFTS to help with reliability:

  • A new approach for detecting and repairing corruptions in NTFS which optimizes uptime through on line repair and with spot fixing that keeps off line repairs minimized and very short.
  • Using the flush command instead of FUA.

In this post this we’ll focus on the flush command.

Flushing Your Data

No, not that kind of flushing Smile You have always been able to “throw” data away with some very bad practices and unreliable technology, no need for much innovation there.

I’m talking about the fact that NTFS in Windows Server 2012 has switched to the flush command instead of relying on Forced Unit Access (FUA) to increase reliability for SATA disk and performance with SCSI disks. The good news is you don’t lose anything and gain on both fronts. Especially making cheaper SATA disks more reliable is a big one. It allows SATA disks to be used in business/enterprise scenarios and as such helps reduce costs.

What is Forced Unit Access (FUA)?

Well it’s a flag that indicates a given write should go directly to media, writing through a devices write cache. The NTFS Journaling File System uses FUA to guarantee write ordering which is important to maintain its metadata integrity. It was  implemented in the SCSI (T10) specification but not in the original  ATA (T13) specification. This was added in the 2002 version of the ATA specs but FUA has never been guaranteed to implemented on all ATA devices and as such Windows could not rely on it being there with ATA/SATA disks. As a result it was never used by Windows with SATA disks.

That meant that with SATA disks there is a bigger change of corruption due to a power failure or the likes as NTFS was designed to rely on FUA implementation for robust metadata writes.With ever increasing capacity needs an larger SATA disk being needed and used for business purposes something had to be done. So with Windows Server 2012 (and Windows 8) NTFS switched to using a  flush command to the drives write cache instead of using FUA.

The Benefits

  1. The switch to using the flush command for all operations that require write ordering to ensure file system metadata integrity realizes better reliability and robustness when using commodity SATA storage as it reduces possibility of corruption due to power loss
  2. It Improves performance on SCSI devices because it allows the disk to cache data for as long as safely possible instead of having to do write-through using FUA

Prepare For Windows Server 2012 This Fall

With the academic year in full swing and no lack of material we need to learn and study Microsoft is giving you a helping hand by providing a huge pool of resources. There’s TechNet, Evaluation software but also the Microsoft Virtual Academy.

To reward you for your hard work you get a cinema ticket to go relax at the movies. Click on the banner for more information on this.

clip_image001

The nice thing about the Microsoft Virtual Academy that you can do study the modules at your own pace and at your own moment of choosing. Combine this with TechNet Virtual labs and you have a lot of resources at your finger tips to learn even more even if you don’t have the room or resources to build a (home) lab.

If all that technology left you wanting to get some hands on with Windows Server 2012 you can also register for an IT Camp. There should be some more of them coming I’ve read on the site. With Mike Reseler  (@MikeResseler) & Arlindo Alves@aralves) delivering those IT Camps you’ll be in good hands. Happy studying and have fun at the movies Winking smile