Shared Nothing Live Migration Leverages SMB 3.0 Under the Hood

Shared Nothing Live Migration

By now most of you must have heard about the Shared Nothing Live Migration capabilities introduced with Windows Server 2012 Hyper-V. If not, I suggest you read up on it first and then come back here for some extra insight into how it works.

Shared Nothing Live Migration is not magic, however. It is made possible by some of the new capabilities that SMB 3.0 brought us in Windows Server 2012. Once you know this, you also realize that it can be quite fast. The reason is that you can design the network for Shared Nothing Live Migration with 10Gbps or higher, Multichannel and RDMA for unprecedented throughput. Yup, if you invest in setting up the networking right, the remaining bottleneck might be the amount of storage IO you can handle whilst reading from the source and writing to the target, or the CPU load you put on your host. Windows will protect you from draining your host beyond reason, by the way.
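To get a feel for where the bottleneck sits, here is a minimal back-of-the-envelope sketch in Python. It assumes the copy runs at the speed of the slower of the two, network or storage, and ignores protocol overhead, CPU load and memory transfer. The function name and the numbers in the example are made up for illustration, they are not measurements.

```python
def estimate_migration_seconds(vm_size_gb, network_gbps, storage_mb_per_s):
    """Rough lower bound on the copy time for a VM's storage.

    Assumption: the slower of network and storage throughput decides
    the effective speed. Protocol overhead is ignored.
    """
    network_mb_per_s = network_gbps * 1000 / 8  # Gbps -> MB/s
    effective_mb_per_s = min(network_mb_per_s, storage_mb_per_s)
    return vm_size_gb * 1000 / effective_mb_per_s


if __name__ == "__main__":
    # Hypothetical 100GB VM, storage that sustains 400 MB/s
    for gbps in (1, 10):
        seconds = estimate_migration_seconds(100, gbps, 400)
        print(f"{gbps} Gbps network: about {seconds:.0f} seconds")
```

With these hypothetical numbers the 1Gbps network is the limit (about 800 seconds), while at 10Gbps the storage becomes the limit (about 250 seconds), which is exactly the shift in bottleneck described above.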

Making Shared Nothing Live Migration Work

You need to set it up of course, and do it right. Here’s a list of steps you need to do / check on every Hyper-V host involved.

  1. Enable incoming and outgoing live migrations on all involved Hyper-V hosts, otherwise it will not work. If your hosts are part of a cluster this is taken care of for you.
  2. Select an authentication protocol (CredSSP or Kerberos).
    Kerberos authentication allows you to Live Migrate VMs without having to log in to the source host itself. Kerberos authentication does require you to configure constrained delegation in Active Directory (don’t go for "Trust this computer for delegation to any service"). Follow the principle of least privilege.
  3. Set the number of Simultaneous Live Migrations. Experiment and test a little to find the best value for your environment.
  4. Set the network(s) for incoming Live Migrations. It’s best to design this and not just use any network.

See Keith Mayer’s excellent blog for more details.

Constrained Delegation

Shared Nothing Live Migration needs some prep work security wise before it will work. In Active Directory you need to set up some constrained delegation permissions. To some people the concept of constrained delegation is brand new, but if you’ve been deploying multi tiered web applications in your environment this is a cookie you’ve dealt with many times before. It’s the same approach you need to let a web client using Windows Authentication talk, via an IIS web app or service, to a SQL Server database and/or read file data from somewhere. If you’ve done that, you’ve configured this plenty of times.

Use an account to perform the Shared Nothing Live Migration that has administrator privileges on all computers that are involved. While you can use groups in AD to make your life and permission management easier when it comes to granting share permissions & NTFS rights on folders, it doesn’t work that way with constrained delegation. Groups cannot be used here, so you’ll need to use individual computer accounts. PowerShell scripting can help lessen the work if you have many Hyper-V hosts involved. In large environments (up to 64 nodes!) this inundates the delegation tab with computer names, so PowerShell really is your friend here.

On each computer object you need to set the delegation permissions for the CIFS service and the Microsoft Virtual System Migration Service to all other computers you want to involve in Shared Nothing Live Migration as a source or a target.
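To show how quickly this adds up, here is a small Python sketch that only generates the list of delegation entries per host. It does not touch Active Directory. The "service/host" string format, the function name and the host names ALPHA and BRAVO are assumptions for illustration; in a real environment you would typically see both short names and FQDNs, and you would apply the entries with your AD tooling of choice.

```python
from itertools import permutations

# The two services named in the text above
SERVICES = ("cifs", "Microsoft Virtual System Migration Service")


def delegation_entries(hosts):
    """Map each host to the entries it must be allowed to delegate to.

    Every host needs both services for every OTHER host, since any
    host can be a source or a target.
    """
    entries = {host: [] for host in hosts}
    for source, target in permutations(hosts, 2):
        for service in SERVICES:
            entries[source].append(f"{service}/{target}")
    return entries


if __name__ == "__main__":
    # ZULU is the source server used later in this post; the others are made up
    for host, allowed in delegation_entries(["ZULU", "ALPHA", "BRAVO"]).items():
        print(host)
        for entry in allowed:
            print("   ", entry)
```

For n hosts this gives n x (n - 1) x 2 entries in total. Three hosts need 12, and 64 hosts need 8064, which is why doing this by hand in the GUI gets old fast.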

IMPORTANT! Hey, why do we need CIFS constrained delegation here? Well, because Shared Nothing Live Migration leverages SMB 3.0 under the hood. It creates a temporary file share on the target to get the job done! So once you realize that Shared Nothing Live Migration uses SMB 3.0 shares to do its magic, it becomes obvious why these constrained delegation permissions for CIFS in Active Directory are needed.

Visualizing the SMB 3.0 share in action

At the source server (ZULU) we check the SMB connections after starting the Shared Nothing Live Migration and see that we have a connection to a share on the target server. That share is named after the source server with an ID, like ZULU.3341302342$. So it’s a hidden share.

image

 

On the target server we run Get-SmbSession | fl and see that the source computer indeed has two sessions open on the target server.

image

 

Let’s see if a share is created by running Get-SmbShare on the target. Yes, there is:

image

 

In Computer Management it shows up like this on the target server:

image

In Explorer you can see this as a $VSM$ folder in the root of C, which has a subfolder with the name of the source server and an ID, like ZULU.2541288334$. This subfolder is shared (hidden) and contains a shortcut to the volume where the selected target folder resides. This could be C or D on local storage (DAS), shared storage (CSV) or an SMB 3.0 share as well. In the screenshot below the folder doesn’t match up to the share name, as they were taken from different Shared Nothing Live Migrations.

image

Security wise we’re supposed to keep our hands off, and the security settings reflect this. But if you take ownership you can go peek at what’s in there, when writing a blog post for example. We indeed saw the copied disk size of the VM being live migrated increase in the selected target folder.

image

image

Conclusion

I find it pretty cool to see how this all works under the hood. I hope you found this educational and interesting as well. It’s a testament to how SMB 3.0 can be leveraged for all kinds of interesting scenarios.

Windows Server Backup Benefits from Improvements in Windows Server 2012

Introduction

In certain environments we back up VMs and any remaining physical hosts using Windows Server Backup. Before you all think this is ridiculous, I advise you to think again. With some automation you can build a very reliable agentless backup solution with the built in functionality. Windows Server 2012 brings good news for the smaller & perhaps low budget environments. Windows Server Backup is now capable of doing host level backups of the Hyper-V guests stored on Cluster Shared Volumes. This was not the case in Windows Server 2008 R2 and it is a vast improvement. This change is due to the fact that CSV no longer requires a specialized API capable of dealing with its intricacies. All backup products can now back up CSVs without specialized APIs.

This is linked to huge improvements in how a CSV behaves during a backup. In the past, when you started a backup, the CSV ownership would be moved to the node that runs the backup and all access by other nodes was in redirected mode for the duration of the backup. That is, unless you used a hardware VSS provider, and those were not trouble free either. If your backup software did not understand CSVs and use the CSV APIs you were out of luck. From Windows Server 2012 on you are only in redirected I/O mode for the time it takes to create the VSS snapshot. For the rest of the backup duration your nodes access the CSV disk in direct mode. So, back to Windows Server Backup. You cannot back up the CSV as a disk volume, but you can select Hyper-V from the items to include in the backup.

image

That will show all VMs running on that host, meaning you cannot back up VMs running on another host. Compared to Windows Server 2008 R2, where using the native Windows backup with VMs on a CSV LUN meant using in guest backups, this is a major improvement.

Some Approaches to Using Windows Server Backup

Sometimes we run the backups to local disk and regularly copy those off to a file share. This has the benefit of providing the backup versioning you get from using a local disk. The drawback is that the backups can be rather big.

In VMs, that is with backups in the guest, we run those backups to a file share over a 1Gbps management network. Performance is good, but it leaves us with the issue that there is no versioning.

For that reason our backup script copies the entire backup folder for a server to an archive folder on that same JBOD. Depending on how much space you have and need, you can configure the retention time of these older backups. This way you can keep a large number of backups over time. A script runs every day that deletes the older backups based on the chosen retention time, so you don’t run out of space.
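The daily cleanup described above can be sketched roughly as follows. This is not our production script, just a minimal Python illustration of the idea. It assumes each archived backup sits in its own subfolder of the archive folder and that the folder's last modified time is a good enough proxy for the backup's age; the function name and parameters are hypothetical.

```python
import shutil
import time
from pathlib import Path


def prune_old_backups(archive_root, retention_days, now=None):
    """Delete archived backup folders older than retention_days.

    Assumption: one subfolder per archived backup, aged by its
    last modified time. Returns the names of the removed folders.
    """
    now = time.time() if now is None else now
    cutoff = now - retention_days * 86400  # seconds per day
    removed = []
    for folder in sorted(Path(archive_root).iterdir()):
        if folder.is_dir() and folder.stat().st_mtime < cutoff:
            shutil.rmtree(folder)
            removed.append(folder.name)
    return removed
```

Calling prune_old_backups on your archive folder with a retention of, say, 30 days from a scheduled task gives you the "delete everything older than the retention time" behavior. Test it against a copy first, as it deletes folders without asking.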

There is one way around the lack of versioning when writing backups to a share, and that is to mount a VHD on a file share locally on the host where you are running the backup, or to use a pass through disk inside the VM. While you can get away with this, it becomes rather messy due to the management & flexibility drawbacks of pass through disks. Mounting a VHD on a file share inside a VM is also a performance issue. So while possible and viable in certain scenarios, I don’t use this for more than a few hosts, and those are physical ones.

We had hoped that Windows Server 2012 with its support for VSS snapshots on SMB 3.0 shares would have enabled backup versioning in Windows Backup, just like it can do for backups to local disk. Unfortunately this is not the case. You’ll still get the same warning when backing up from a Windows Server 2012 host to an SMB 3.0 share as you used to get with previous versions:

image

How fast are backups and restores?

The largest environment is a couple of physical servers, a two node Hyper-V cluster and about 22 virtual machines. That includes some with a larger amount of data, the biggest being 400 GB. Full backups are run weekly at night on all servers over 1Gbps and this works just fine.

We can back up a VM with a 50GB VHD (about 50% to 75% in use) and copy that backup to the archive folder in 20 minutes. We back up AND copy to archive a VM’s C: drive (20GB of data) and D: drive (190GB of data), 2 separate VHDs, in 2.5 hours.
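To put those timings in perspective, here is a quick Python calculation of the average rate they imply. It is only arithmetic on the figures above, using 1 GB = 1000 MB. Keep in mind the time covers both the backup and the copy to the archive, so the data actually moves twice and the real throughput per step is higher than this average.

```python
def average_mb_per_s(data_gb, minutes):
    """Average rate in MB/s for data_gb of data handled in the given minutes."""
    return data_gb * 1000 / (minutes * 60)


if __name__ == "__main__":
    # 50GB VHD, 50% to 75% in use, backed up and archived in 20 minutes
    print(f"{average_mb_per_s(25, 20):.1f} to {average_mb_per_s(37.5, 20):.1f} MB/s")
    # 20GB + 190GB of data, backed up and archived in 2.5 hours
    print(f"{average_mb_per_s(210, 150):.1f} MB/s")
```

That works out to roughly 21 to 31 MB/s for the small VM and about 23 MB/s for the large one, comfortably within what a 1Gbps link (about 125 MB/s in theory) can carry.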

Some restore statistics: a bare metal restore of a VM or physical host over 1Gbps, with a single VHD or volume holding the OS, applications and some data, takes us 30 to 35 minutes in real life due to the overhead of setting a new VM up. If you just want to restore individual files or folders you can do that as well. You can even mount the backup VHD and recover them via Windows Explorer.

This is what the logging we do and e-mail to the sys admins looks like:

image

Where else do the Windows Server 2012 improvements help?

Data Deduplication

Now there is some good news. We ran ddpeval.exe against the JBOD LUNs where we store these archived backups and got some great results. We also copied such an archive folder to a Windows Server 2012 host and ran data deduplication against it. In that test we achieved an 84%-85% deduplication rate, depending on how many versions of the backups we archive and what the delta is during that time frame. The latter is important. If we run dedupe only against domain controller backup archives we get up to 94%. Deduplication should not impact restore performance too much, because in 99% of the cases you revert to the last backup, which sits in the WindowsBackup folder. Only if you need older backups will you work against the deduped files, unless the archive folder is on the same LUN as the original backups. I don’t have real life info about restores yet, just a small lab test.
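What such rates mean for disk space is simple arithmetic, sketched here in Python. The 1000GB archive size is a made up example, not a figure from our environment; only the percentages come from the tests above.

```python
def space_after_dedup(original_gb, dedup_rate_percent):
    """Disk space still needed after deduplication at the given rate."""
    return original_gb * (100 - dedup_rate_percent) / 100


if __name__ == "__main__":
    # Hypothetical 1000GB of archived backups
    for rate in (84, 85, 94):
        print(f"{rate}% dedupe: {space_after_dedup(1000, rate):.0f} GB on disk")
```

So a hypothetical 1000GB of archives would shrink to about 150 to 160GB at the rates we saw, and to about 60GB for the domain controller only case. Put differently, you could keep roughly six times as many archived versions in the same space.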

SMB 3.0 & Multichannel

In Windows Server 2012 you also get all benefits of SMB 3.0 and MultiChannel for your backup traffic.

Not your grandfather’s ChkDsk

The vastly improved ChkDsk is a comfort for worried minds when it comes to fixing potential corruption on a large LUN. Last but not least, the FLUSH command in NTFS makes using cheaper data disks safer.

VHDX

The VHDX format allows for up to 64TB. That means Windows Server Backup can now handle LUNs larger than 2TB. This should be adequate.

Conclusion

With some creativity and automation via scripting you can leverage Windows Server Backup to be a nice and flexible solution. Although I feel that backup versioning to a file share is an improvement that is still missing, the new features help out a lot and, all in all, it’s not bad at all! So you see, even smaller organizations can benefit seriously from Windows Server 2012 and get more bang for their buck.