Accelerated Checkpoint merging with ReFS v2 in Windows Server 2016

Introduction

This blog post is a teaser where we show you some of the results we have seen with ReFS v2 in Windows Server 2016 (TPv4). In a previous blog post (Lightning Fast Fixed VHDX File Creation Speed With ReFS on Windows Server 2016) we demonstrated the very fast VHDX file creation capabilities we got with ReFS v2. Now we look at another benefit of ReFS v2 in a Hyper-V environment, thanks to a feature of ReFS v2 called block cloning: we get accelerated checkpoint merging with ReFS v2 in Windows Server 2016.

The Demo

For this short demo we have a virtual machine running Windows Server 2016. It resides on a CSV formatted with ReFS (64K allocation unit size). Inside the virtual machine there is a second data disk. Our VM, called CheckPointReFS, has this data volume formatted with ReFS (64K allocation unit size) and it runs on the ReFS formatted CSV. The disks in this test are fixed size VHDX files.

On the data volume we have about 30GB worth of ISO files. We checkpoint the VM and then create a copy of those files on the data volume.

We then delete this checkpoint.

Via the events 19070 (start of a background disk merge) and 19080 (completion of a background disk merge) in the Microsoft-Windows-Hyper-V-VMMS/Admin logs we calculate the time this took: 5 seconds.
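
The calculation itself is just the timestamp of event 19080 minus the timestamp of event 19070. Here is a minimal sketch in Python, assuming the two events have already been exported from the log as (event ID, timestamp) pairs. That export format and the function name are hypothetical, and the timestamps are made up for illustration; they are not the actual log entries from this demo.

```python
from datetime import datetime

MERGE_START_ID = 19070      # start of a background disk merge
MERGE_COMPLETE_ID = 19080   # completion of a background disk merge


def merge_duration_seconds(events):
    """Seconds between the first 19070 event and the first 19080 event after it.

    `events` is an iterable of (event_id, iso_timestamp) tuples, a
    hypothetical export format chosen for this example.
    """
    start = None
    ordered = sorted(events, key=lambda e: datetime.fromisoformat(e[1]))
    for event_id, stamp in ordered:
        when = datetime.fromisoformat(stamp)
        if event_id == MERGE_START_ID and start is None:
            start = when
        elif event_id == MERGE_COMPLETE_ID and start is not None:
            return (when - start).total_seconds()
    raise ValueError("no matching 19070/19080 pair found")


# Made-up sample data, not the real events from the demo.
sample = [
    (19070, "2015-12-01T10:15:02"),
    (19080, "2015-12-01T10:15:07"),
]
print(merge_duration_seconds(sample))
```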

There are moments you just have to say “WAUW”. Really, this rocks and it’s amazing. So amazing I figured I had made a mistake and I ran it again … 4 seconds. WOEHOE! What were the times you saw when you last deleted a large checkpoint?

I am really looking forward to doing more testing with ReFS v2 capabilities with Hyper-V on Windows Server 2016.

Windows Server 2016 TPv4 Hyper-V brings virtual machine configuration version 7

When building a Windows Server 2016 TPv4 Hyper-V cluster this weekend I noticed that we now have a new version of the virtual machine configuration.

When we migrate (rolling cluster upgrade, move to new cluster or host, import on new cluster or host) virtual machines to Windows Server 2016 Hyper-V from Windows Server 2012 R2, the virtual machine’s configuration file isn’t automatically upgraded. In the past it was, which blocked moving back to a previous version of Hyper-V. Now we can move back until we manually update the virtual machine configuration version. Updating blocks going back, but it enables the new virtual machine features. Version 5.0 is the one that’s compatible with both Windows Server 2012 R2 and Windows Server 2016. Version 6.2 was what we had in TPv3 and could only run on Windows Server 2016. Windows Server 2016 TPv4 Hyper-V brings virtual machine configuration version 7.
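
The compatibility rules above fit in a small lookup table. The Python sketch below only restates what is described in the text; the host labels are plain strings chosen for this example and the helper name is hypothetical.

```python
# Which hosts can run which configuration version, as described in the text.
# The host labels are just strings picked for this example.
SUPPORTED_HOSTS = {
    "5.0": {"Windows Server 2012 R2", "Windows Server 2016"},
    "6.2": {"Windows Server 2016"},   # TPv3 configuration version
    "7.0": {"Windows Server 2016"},   # TPv4 configuration version
}


def can_run_on(config_version, host):
    """True if a VM with this configuration version can run on the given host."""
    return host in SUPPORTED_HOSTS.get(config_version, set())


# A version 5.0 VM can still go back; after the update it cannot.
print(can_run_on("5.0", "Windows Server 2012 R2"))
print(can_run_on("7.0", "Windows Server 2012 R2"))
```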

When you have virtual machines that come from Technical Preview v3 and you had updated their configuration version or created brand new ones, these will be at version 6.2. Since I do not consider it wise to keep testing these on a version from a previous preview, I updated them all to version 7.

The code below grabs all VMs on all cluster nodes (even the non-clustered VMs), shuts them down, updates the configuration version and starts them again. It’s just a quick example.

image

Now do NOT do this to virtual machines with configuration version 5 that you might want to move back / import to a Windows Server 2012 R2 Hyper-V host. But if you know you’ll be testing with the new features, have a blast, like me here on the TPv4 lab cluster.

I’m still looking for the features version 7.0 enables. My guess is that nested virtualization is one of them. Happy testing!

Musings On Switch Embedded Teaming, SMB Direct and QoS in Windows Server 2016 Hyper-V

When you have been reading up on what’s new in Windows Server 2016 Hyper-V networking you probably read about Switch Embedded Teaming (SET). Basically this takes the concept of teaming and has it done by the vSwitch, which means you don’t have to team at the host level. The big benefit this opens up is that RDMA can be leveraged on vNICs. With host based teaming the RDMA capabilities of your NICs are no longer exposed, i.e. you can’t leverage RDMA. Now this has become possible and that’s pretty big.

With the rise of 10, 25, 40, 50 and 100 Gbps NICs and switches the call to go fully converged becomes even louder. Given the fact that we now don’t lose RDMA capabilities on the vNICs exposed to the host, that call only sounds louder to many. But wait, there’s even more to lure us to a fully converged solution: the fact that we no longer lose RSS on those vNICs! All good news.

I have written an entire whitepaper on convergence and its benefits, drawbacks, risks & rewards. I will not repeat all that here. One point I need to make is that lossless traffic and QoS are paramount to the success of fully converged networking. After all we don’t want lossy storage traffic and we need to assure adequate bandwidth for all our types of traffic. For now, in Technical Preview 3 we have support for Software Defined Networking (SDN) QoS.

What does that mean in regards to what we already use today? There is no support for native QoS and vSwitch QoS in Windows Server 2016 TPv3. There is however the mention of DCB (PFC/ETS), which is hardware QoS, in the TechNet docs on Remote Direct Memory Access (RDMA) and Switch Embedded Teaming (SET). Cool!

But wait a minute. When we look at all kinds of traffic in a converged Hyper-V environment we see CSV (storage traffic), live migration (all variations), backups over SMB3 all potentially leveraging SMB Direct. Due to the features and capabilities in SMB3 I like that. Don’t get me wrong about that. But it also worries me a bit when it comes to handling QoS on the hardware side of things.

In DCB, Priority Flow Control (PFC) is the lossless part and Enhanced Transmission Selection (ETS) is the minimum bandwidth QoS part. But how do we leverage ETS when all types of traffic use SMB Direct? On the host it all gets tagged with the same priority. ETS works by assigning different priorities to different workloads and assuring each a minimum bandwidth out of a total of 100%, without reserving it for a workload that doesn’t need it. Here’s a blog post on ETS with a demo video: DCB ETS Demo with SMB Direct over RoCE (RDMA .
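
To make the ETS behaviour a bit more tangible: each priority gets a guaranteed minimum share, and whatever a class doesn’t use is available to the others. The Python sketch below is a simplified model of that idea, not how any switch or NIC actually implements it. The function name, class names, percentages and demands are all hypothetical.

```python
def ets_allocate(link_gbps, min_share_pct, demand_gbps):
    """Simplified, work-conserving model of ETS minimum bandwidth.

    min_share_pct: {traffic_class: guaranteed percentage}, must sum to 100.
    demand_gbps:   {traffic_class: bandwidth the class wants to send}.
    """
    if sum(min_share_pct.values()) != 100:
        raise ValueError("ETS shares must add up to 100%")

    # Step 1: every class gets what it asks for, up to its guaranteed minimum.
    alloc = {}
    for tc, pct in min_share_pct.items():
        guaranteed = link_gbps * pct / 100
        alloc[tc] = min(demand_gbps.get(tc, 0.0), guaranteed)

    # Step 2: hand unused bandwidth to classes that still want more,
    # weighted by their configured share, until nothing is left to give.
    spare = link_gbps - sum(alloc.values())
    while spare > 1e-9:
        hungry = [tc for tc in alloc
                  if demand_gbps.get(tc, 0.0) - alloc[tc] > 1e-9]
        weight = sum(min_share_pct[tc] for tc in hungry)
        if not hungry or weight == 0:
            break
        for tc in hungry:
            extra = spare * min_share_pct[tc] / weight
            alloc[tc] += min(extra, demand_gbps[tc] - alloc[tc])
        spare = link_gbps - sum(alloc.values())
    return alloc


# Hypothetical 10 Gbps link. CSV and live migration both ride on SMB Direct,
# so ETS only sees their combined demand as one class.
shares = {"smb_direct": 50, "cluster": 10, "other": 40}
demand = {"smb_direct": 9.0, "cluster": 0.5, "other": 2.0}
print(ets_allocate(10, shares, demand))
```

In this model the SMB Direct class can borrow what the other classes leave unused, but ETS has no say over how CSV and live migration split that class between them, which is exactly the concern raised above.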

Does this mean an SDN QoS only approach to deal with the various types of SMB Direct traffic, or do they have some aces up their sleeves?

This isn’t a new “concern” I have, but with SET and the sustained push for convergence it does have the potential to become an issue. We already have the SMB bandwidth limitation feature for live migration. That is what is used to prevent LM from starving CSV traffic when needed. See Preventing Live Migration Over SMB Starving CSV Traffic in Windows Server 2012 R2 with Set-SmbBandwidthLimit.

Now in real life I have rarely, if ever, seen a hard need for this. But it’s there to make sure you have something when needed. It hasn’t caused me issues yet, but I’m in a performance & scale first, “non-economies of scale” world compared to hosters. As such convergence is a tool I use with moderation. My experience from testing traffic that competes without ETS is that they all get a part of the cake, but not in a super predictable/consistent way. SMB bandwidth limitation is a bit of a “bolted on” solution: you can see the perf counters push down the bandwidth in an epic struggle to contain it, but as said it’s a struggle, not a nice flat line.

Also, Set-SmbBandwidthLimit is not a percentage but a hard maximum bandwidth limit, so when you lose a SET member the math is off and you could be in trouble fast. Perhaps it’s the categories of that SMB bandwidth limit that could or will be used, but it doesn’t seem like the most elegant solution/approach. That, with ever more traffic leveraging SMB Direct, makes me ever more curious. Some switches offer up to 4 lossless queues now, so perhaps that’s the way to go, leveraging more priorities … Interesting stuff! My preferred and easiest QoS tool, getting even bigger pipes, is an approach that convergence and the evolution of network needs keep pushing over. Anyway, I’ll be very interested to see how this is dealt with. For now I’ll conclude my musings on Switch Embedded Teaming, SMB Direct and QoS in Windows Server 2016 Hyper-V.
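
The “math is off” remark is easy to show with a few numbers. Below is a minimal Python sketch, assuming a hypothetical SET team of two 10 Gbps members and a hypothetical 12 Gbps hard limit on live migration; none of these figures come from a real configuration and the function names are made up for this example.

```python
def limit_as_share(limit_gbps, member_gbps, members_up):
    """Fraction of the available team bandwidth a fixed hard limit represents."""
    if members_up < 1:
        raise ValueError("at least one team member must be up")
    return limit_gbps / (member_gbps * members_up)


def left_for_others(limit_gbps, member_gbps, members_up):
    """Bandwidth guaranteed to remain for other traffic if the limit is fully used."""
    return max(0.0, member_gbps * members_up - limit_gbps)


LIMIT_GBPS = 12.0    # hypothetical hard cap on live migration
MEMBER_GBPS = 10.0   # hypothetical speed of each SET member

for members in (2, 1):
    share = limit_as_share(LIMIT_GBPS, MEMBER_GBPS, members)
    rest = left_for_others(LIMIT_GBPS, MEMBER_GBPS, members)
    print(f"{members} member(s) up: limit is {share:.0%} of the pipe, "
          f"{rest:.0f} Gbps left for other traffic")
```

With both members up the cap leaves room for everything else. With one member down the same cap is larger than the pipe itself, so it no longer protects any other traffic. A percentage based minimum, as with ETS, would scale along with the available bandwidth.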

Windows Server 2016 Data Deduplication Scales and Performs Better

I’ve been leveraging Windows Server Data Deduplication since it became available with great results.

One of the enhanced features in Windows Server 2016 is Data Deduplication and it’s one I welcome very much. The improvements we’re getting mostly have to do with scale and performance. I’m quite pleased that Microsoft listened to our previous feedback on this.

You cannot imagine how much money on backup target storage we have saved by using this, so the fact that we can now get even better scale and performance is music to our ears. The backup target servers are the first in line for an upgrade, that’s for sure! That’s the reason I mentioned it as a subject to look into in the Hyper-V amigos interview at Ignite!

Scale Improvement of the supported LUN sizes, up to 64TB

Actually I was already pushing this to 50TB in some cases for testing, but overall I used 6 to 10 TB volumes. The support for bigger volumes is very welcome. Now, please note that you should NOT go any higher than 64TB (I actually stay below that), otherwise deduplication doesn’t work due to its dependency on VSS. Please read my blog post Windows 2012 R2 Data Deduplication Leverages Shadow Copies: “LastOptimizationResultMessage : A volume shadow copy could not be created or was unexpectedly deleted” on this subject.

In Windows 2012 R2 we were limited because data deduplication used a single-threaded job and I/O queue for each volume. That makes it wiser to have 10 target LUNs of 6TB than one huge 60TB LUN. The big issue otherwise is that large volumes could lead to the dedup processing not keeping up with the rate of data changes (“churn”). Now your mileage will vary depending on the type of data and the delta. More info on this in the blog post: Sizing Volumes for Data Deduplication in Windows Server. It will help you size the volumes, but note that in Windows Server 2016 the rules have changed.

The dedup optimization processing now runs multiple threads in parallel using multiple I/O queues on a single volume, which gives you better performance and doesn’t incur the overhead of having to use multiple smaller LUNs.
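
The Windows Server 2012 R2 sizing argument comes down to simple arithmetic: with one job per volume, the daily churn on a volume has to fit within what that single job can process in a day. Here is a rough Python sketch of that reasoning. The throughput and churn figures are hypothetical placeholders, not measured or documented values, so plug in your own numbers.

```python
def daily_churn_tb(volume_tb, daily_churn_pct):
    """Amount of changed data per day on a volume, in TB."""
    return volume_tb * daily_churn_pct / 100


def dedup_keeps_up(volume_tb, daily_churn_pct, job_tb_per_day):
    """True if one per-volume dedup job can process a day's churn within a day."""
    return daily_churn_tb(volume_tb, daily_churn_pct) <= job_tb_per_day


# Hypothetical numbers for illustration only.
CHURN_PCT = 5          # percent of the volume that changes per day
JOB_TB_PER_DAY = 2.0   # what a single-threaded job might process per day

for size_tb in (6, 60):
    ok = dedup_keeps_up(size_tb, CHURN_PCT, JOB_TB_PER_DAY)
    print(f"{size_tb} TB volume: churn {daily_churn_tb(size_tb, CHURN_PCT):.1f} TB/day, "
          f"keeps up: {ok}")
```

Under these assumptions ten 6TB volumes each stay comfortably within budget, while a single 60TB volume falls behind. Parallel processing on a single volume in Windows Server 2016 is what changes that trade-off.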

File sizes up to 1TB are good for dedup

Windows Server 2012 R2 Data Deduplication supports file sizes up to 1TB, but such files are considered “not good candidates” for dedup. So that DPM workaround of backing up to a truckload of virtual machines with 1TB virtual disks that are deduplicated is borderline. You can already see one improvement coming in CPS v2 (also see the next header). 1TB is now fully supported and a good candidate. I’ll be pushing it higher … in my opinion this is where the most work will need to be done for future improvements. It would allow for more scenarios (I have VMs that hold VHDX virtual disks of 2TB or more). Scale is something that helps keep things simple. Simplicity avoids the costs & issues that come with complexity. That’s always a good thing if possible.

In Windows Server 2012 the algorithms don’t scale as well and performance suffers because things like scanning for and inserting changes slow down as the total data set increases. These processes have been redesigned in Windows Server 2016, which now uses new stream map structures and improved partial file optimization. As a result 1TB files have become good candidates.

Virtualized backup is a new usage type

DPM is already leveraging deduplication of virtual machines (CPS drove that I think, see Deduplicating DPM Storage).

In Windows Server 2016 all the dedup configuration settings needed for the DPM backup scenario have been combined into a new usage type called “Backup”. This simplifies the deployment and helps “future proof” your setup, as future changes can automatically be applied through this usage type.

Nano Server support

Data deduplication is (or will be) fully supported in Nano Server (new in TPv3). It’s not completely done yet so deduplication support in Nano Server still has a few restrictions:

  • Support has only been validated in non-clustered configurations
  • Deduplication job cancellation must be done manually (using the Stop-DedupJob PowerShell command)

Microsoft welcomes any feedback on the deduplication feature via an email sent to [email protected]. For me the standing order is to break through that 1TB barrier!

My take & Magic Ball

In combination with the right backup product it saves a ton of money. I have leveraged VEEAM and in the past Windows Backup (inbox) with great results. The benefit of these two is that you can back up to physical storage and leverage deduplication. Virtualized backup as a new usage type makes life easier for the supported “workaround” around the limitations of DPM, where normally they only support VDI with deduplication. What I’m really curious about is another possible future usage type: “Virtual Servers” … I guess for that one deduplication support for the OS disk would be very beneficial for “cloud” providers. We’ll see.