Windows Data Deduplication and Cluster Operating System Rolling Upgrades

Introduction

Have you noticed that when Windows data deduplication and cluster operating system rolling upgrades from Windows Server 2012 R2 to Windows Server 2016 clusters are discussed, we often hear people talk about Hyper-V or Scale-Out File Server clusters, sometimes SQL Server, but not very often about a General-Purpose File Share server with continuous availability? Which is the kind I’ve actually done quite a number of.

Being active in an industry that produces and consumes file data in large quantities and sizes, we have been early implementers of Windows clusters for General-Purpose File Shares with continuous availability. This provides both the clients and IT Operations with the benefits of SMB 3 and ODX for workloads that are not suited for Scale-Out File Server deployments.

As such we have dealt with a number of Windows Server 2012 R2 General-Purpose File Server (GPFS) clusters with continuous availability that we wanted to move to Windows Server 2016. Partially to keep the environment up to date and partially because we want to leverage the new Windows data deduplication capabilities that this OS version offers. The SOFS and Hyper-V clusters that I upgraded didn’t have data deduplication enabled.

The process to perform this upgrade is straightforward and has been documented well by others as well as by me, in regards to issues we saw in the field. We even dove behind the scenes a bit in Cluster Operating System Rolling Upgrade Leaves Traces. I have also presented on this topic in public at conferences around Europe (Ireland, Germany and Belgium) as part of our community contributions. No surprises there.

Test your assumptions

This is a scenario you can perform without any downtime for your clients when all things go well. And normally it should. I have upgraded a couple of Scale-Out File Server (SOFS) and General-Purpose File Server (GPFS) clusters with continuous availability now and those went very well. Just make sure your cluster is perfectly healthy at the start.

Naturally there are some checks you need to make that are outside of Microsoft’s scope:

I’m pretty sure you have good backups for your file data, but you should check that your backup solution works with Windows Server 2016 and how it behaves during the upgrade while the cluster is in mixed mode. Perhaps you will or won’t be able to run backups or restore data. Check and know this.

Verify your storage solution supports and works with Windows Server 2016. It sounds obvious, but I have seen people forget such details.

Another point of attention is any Anti-Virus you might have running on the file server cluster nodes. Verify that this is fully supported on Windows Server 2016. On top of that validate that the Anti-Virus still works well with ODX so you don’t run into surprises there. Don’t assume anything.

Check that the server and its components (HBA, NICs, BIOS, …) as well as their firmware and drivers support Windows Server 2016. Sure, the rolling upgrade allows for some testing before committing, but that doesn’t mean you should go ahead blindly into the unknown.

Make sure your nodes are fully patched before and after the upgrade of a cluster node.

As the file server cluster is already leveraging SMB 3 with continuous availability, all the prerequisites to make that work are already taken care of. If you are upgrading a file server cluster without continuous availability and are planning to start using it, that’s another matter and you’ll need to address any issues. You can do this before or after moving to Windows Server 2016.
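
If you want a quick way to see where you stand, you can query the SMB shares on a cluster node with PowerShell. A minimal check, nothing more, showing which shares already have continuous availability (transparent failover) enabled:

# List the shares and whether continuous availability is turned on for them.
Get-SmbShare | Format-Table Name, ScopeName, Path, ContinuouslyAvailable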

You can take a look at my blogs on this subject from the Windows Server 2012 R2 time frame, such as More Tips On Dealing With Removing Short File Names When Migrating To a SMB3 Transparent Failover File Server Cluster, Migrate an old file server to a transparent failover file server with continuous availability, and SMB 3, ODX, Windows Server 2012 R2 & Windows 8.1 perform magic in file sharing for both corporate & branch offices.

Data deduplication takes some extra consideration

I have blogged before on how Windows Server 2016 data deduplication performs and scales better than it did in Windows Server 2012 R2. This also means that it works, at least partially, differently than it did in Windows Server 2012 R2. You can see this in some of the updates that came out in regards to a data corruption bug with data deduplication, which only affected Windows Server 2016.

Given this difference, what would happen if you fail over a LUN with data deduplication enabled from Windows Server 2012 R2 to Windows Server 2016 and vice versa? That’s the question I had to consider when combining Windows data deduplication and cluster operating system rolling upgrades for the first time.

Windows Server 2016 is backward compatible and will work just fine with a LUN coming from a Windows Server 2012 R2 server that has Windows data deduplication enabled. The reverse is not the case: Windows Server 2012 R2 is not forward compatible. When dealing with data deduplication in an operating system rolling upgrade scenario I’m extra careful, as I cannot guarantee that any LUN movement scenario will go well.
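
Before planning any LUN movements it helps to know exactly which volumes actually have deduplication enabled. A quick inventory sketch, run on the node that currently owns the volumes:

# Which volumes have dedup enabled and how much is already deduplicated?
Get-DedupVolume | Format-Table Volume, Enabled, SavingsRate, SavedSpace
# Per-volume status, including when the last optimization job ran.
Get-DedupStatus | Format-Table Volume, OptimizedFilesCount, InPolicyFilesCount, LastOptimizationTime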

Once I have failed over a LUN to a Windows Server 2016 node in a mixed cluster, I avoid moving it back to a Windows Server 2012 R2 node in that cluster. I only move it between Windows Server 2016 nodes when needed.
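
One way to help enforce that discipline during the mixed-mode phase is to steer the file server role with PowerShell. A sketch, using hypothetical names (FS-GPFS01 for the file server role, NODE16-01 and NODE16-02 for the already upgraded nodes):

# See where the roles currently live.
Get-ClusterGroup | Format-Table Name, OwnerNode, State
# Set the preferred owners to the upgraded nodes so failovers favor them.
Set-ClusterOwnerNode -Group "FS-GPFS01" -Owners "NODE16-01", "NODE16-02"
# For a hard restriction you would set the possible owners on the individual resources instead, e.g.:
# Set-ClusterOwnerNode -Resource "Cluster Disk 2" -Owners "NODE16-01", "NODE16-02"
# Move the role deliberately to an upgraded node.
Move-ClusterGroup -Name "FS-GPFS01" -Node "NODE16-01"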

I move through the rolling upgrade as fast as I can to minimize the time frame in which a LUN with data deduplication could end up moving from a Windows Server 2016 to a Windows Server 2012 R2 cluster node.

Should I need to reverse the operating system rolling upgrade and end up with a Windows Server 2012 R2 cluster again, I’ll make absolutely sure I can restore the data on LUNs with data deduplication from backup and/or a SAN snapshot or such. You cannot guarantee that failing back such a LUN will work out fine. So be prepared.

For “standard” non-deduplicated NTFS LUNs you can fail back if needed. When data deduplication is enabled you should try to avoid that and be prepared to restore data if needed.

Final advice is always the same

Even when you have tested your upgrade scenario and made sure your assumptions are correct you must have a way out. And as always, “One is none, two is one”.

As always during such endeavors, you need to make sure that you have a rollback scenario in case things do not work out. You must also have a fail back plan for when things turn really bad. Most scenarios have the ability to return to the original situation built in. But things can go badly wrong and Murphy’s Law does apply. So also have your backups and restores verified, just in case.

The last thing you need after a failed upgrade is telling your customer or employer “it almost worked”, but unfortunately, they’ve lost that 200 TB of continuously available data. “Better luck next time” doesn’t really cut it.

In place upgrade of RD Gateway farm nodes to Windows Server 2016 removes the Loopback adapter for UDP load balancing

Here’s a quick heads up to anyone who’s involved in upgrading existing Windows Server 2012 (R2) RD Gateway farms to Windows Server 2016.

In my recent experiences the in-place upgrade (of VMs) works rather well. Just make sure the netlogon service is set to automatic (a known issue, and a fix is coming) after you upgrade and install all updates. Also make sure that you don’t run into this issue:

Windows Time Service settings are not preserved during an in-place upgrade to Windows Server 2016 or Windows 10 Version 1607
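
For the netlogon issue the fix is quick; a minimal sketch of what I check and set on each upgraded node:

# The netlogon service should be Automatic and running after the in-place upgrade.
Get-Service -Name Netlogon | Select-Object Name, Status, StartType
Set-Service -Name Netlogon -StartupType Automatic
Start-Service -Name Netlogon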

There is, however, one network-specific issue you’ll need to deal with when leveraging UDP with a load balancer via Direct Server Return.

When you have an RD Gateway farm you load balance it with a (preferably highly available) load balancer like a Kemp Loadmaster. I have described this in these blogs/videos: Load balancing Hyper-V Workloads With High To Continuous Availability With a KEMP Loadmaster and Quick Demo Video Of Site Failover With KEMP Loadmaster Global Balancing.

What you also do is load balance both HTTPS (TCP, port 443) and UDP (port 3391). For UDP we use Direct Server Return (DSR) as described in my blog post Load balancing UDP for a RD Gateway farm with a KEMP Loadmaster. This requires a properly configured loopback adapter.

During the in-place upgrade to Windows Server 2016 this loopback adapter is removed from the nodes. So you need to add it back just as described in my original blog post. Normally it will find the settings for it in the registry, but it’s best you check it all out, as I’ve found that the loopback adapter did have “Register this connection’s address in DNS” enabled, as well as NETBIOS over TCP/IP. So, per my blog post, check it all to make sure. Other than that, after installing all the Windows Server 2016 updates, all works smoothly after an in-place upgrade.
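
You can double-check those two settings with PowerShell instead of the GUI. A sketch, assuming the re-added loopback adapter is called “LoopbackDSR” (a hypothetical name):

# Make sure the loopback adapter does not register its address in DNS.
Set-DnsClient -InterfaceAlias "LoopbackDSR" -RegisterThisConnectionsAddress $false
# Disable NetBIOS over TCP/IP on it (TcpipNetbiosOptions 2 = disable).
Get-CimInstance -ClassName Win32_NetworkAdapterConfiguration -Filter "IPEnabled = TRUE" |
    Where-Object { $_.Description -like "*Loopback*" } |
    Invoke-CimMethod -MethodName SetTcpipNetbios -Arguments @{ TcpipNetbiosOptions = 2 }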

Hope this helps someone out there!

In Place upgrades of cluster nodes to Windows Server 2016

You will all have heard about rolling cluster upgrades from Windows Server 2012 R2 to Windows Server 2016 by now. The best and recommended practice is to do a clean install of any node you want to move to Windows Server 2016. However, an in-place upgrade does work. Actually, it works better than ever before. I’m not recommending this for production, but I did do a bunch just to see how the experience was and if that experience was consistent. I was actually pleasantly surprised and it saved me some time in the lab.

Today, if you want to, you can upgrade your Windows Server 2012 R2 hosts in the cluster to Windows Server 2016.

The main things to watch out for are that all the VMs on that host have to be migrated to another node or be shut down.

You cannot have teamed NICs on the host. Most often these will be used for a vSwitch, so it’s smart and prudent to note down the vSwitch (or vSwitches) name and remove them before removing the NIC team. After you’ve upgraded the node you can recreate the NIC team and the vSwitch(es).
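
A sketch of how that could be handled in PowerShell before and after the upgrade; the team, member and vSwitch names are just examples:

# Before the upgrade: note down the current configuration ...
Get-VMSwitch | Format-Table Name, NetAdapterInterfaceDescription
Get-NetLbfoTeam | Format-Table Name, TeamNics, TeamingMode, LoadBalancingAlgorithm
# ... then remove the vSwitch and the team.
Remove-VMSwitch -Name "vSwitch-LAN" -Force
Remove-NetLbfoTeam -Name "Team-LAN" -Confirm:$false
# After the upgrade: recreate the team and the vSwitch with the noted settings.
New-NetLbfoTeam -Name "Team-LAN" -TeamMembers "NIC1", "NIC2" -TeamingMode SwitchIndependent -LoadBalancingAlgorithm Dynamic
New-VMSwitch -Name "vSwitch-LAN" -NetAdapterName "Team-LAN" -AllowManagementOS $true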

Note that you don’t even have to evict the node from the cluster anymore to perform the upgrade.

I have successfully upgraded 4 clusters this way. One was based on PC hardware, but the other ones were:

  • DELL R610 2-node cluster with shared SAS storage (MD3200)
  • Dell R720 2-node cluster with Compellent SAN (and ancient 4Gbps Emulex and QLogic FC HBAs)
  • Dell R730 3-node cluster with Compellent SAN (8Gbps Emulex HBAs)

Naturally all these servers were rocking the most current firmware and drivers possible. After the upgrades I updated the NIC drivers (Mellanox, Intel) and the FC drivers (Emulex) to their vendors’ supported versions. I also made sure they got all the available updates before moving on with these lab clusters.

Issues I noticed:

  • The most common issue I saw was that the Hyper-V Manager GUI was not functional and I could not connect to the host. The fix was easy: uninstall Hyper-V and re-install it (see the sketch after this list). This requires a few reboots. Other than that it went incredibly well.
  • Another issue I’ve seen with upgrades was that the netlogon service was set to manual, which caused various issues with authentication but which is easily fixed. This has also been reported here. Microsoft is aware of this bug and a fix is being worked on.
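
For the Hyper-V Manager issue, a minimal sketch of the uninstall/reinstall workaround (each step wants a reboot):

# Remove the Hyper-V role and reboot ...
Uninstall-WindowsFeature -Name Hyper-V -Restart
# ... then, after the reboot, add it back including the management tools.
Install-WindowsFeature -Name Hyper-V -IncludeManagementTools -Restart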

 


First experiences with a rolling cluster upgrade of a lab Hyper-V Cluster (Technical Preview)

Introduction

In vNext we have gotten a long-awaited & very welcome new capability: rolling cluster upgrades. Which, for the Hyper-V role, is a 100% zero-downtime experience. The only step that will require some downtime is the upgrade of the virtual machine configuration files to vNext (version 5 to 6), as the VM has to be shut down for this.

How to

The process for a rolling upgrade is so straightforward I’ll just give you a quick bullet list of the first part of the process (a PowerShell sketch follows the list):

  • Evacuate the workload from the cluster node you’re going to upgrade
  • Evict the node to upgrade to vNext from the cluster
  • Upgrade (no in-place upgrade is supported, but in your lab you can get away with it)
  • Add the upgraded node to the cluster
  • Rinse & repeat until all nodes have been upgraded (that can take a while with larger clusters)
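
Here is a sketch of that per-node loop, using hypothetical names (LabCluster, Node2); run it from a vNext node or a Windows 10 machine with the vNext RSAT:

Suspend-ClusterNode -Cluster "LabCluster" -Name "Node2" -Drain    # evacuate the workload
Remove-ClusterNode -Cluster "LabCluster" -Name "Node2"            # evict the node
# ... reinstall the node with vNext, patch it, reconfigure networking and storage ...
Add-ClusterNode -Cluster "LabCluster" -Name "Node2"               # add it back to the cluster
Get-ClusterNode -Cluster "LabCluster" | Format-Table Name, State  # verify before doing the next node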

Please note that all administration you do on a cluster in mixed mode should be done from a node running vNext or a system running Windows 10 with the vNext RSAT installed.

Once you’ve upgraded all nodes in the cluster, the situation you’re in is basically that you’re running a Windows Server vNext Hyper-V cluster at cluster functional level 8 (W2K12R2), and the next step is to upgrade to 9, which is vNext. No, there is no 10 yet on the server side ;-)

You do this by executing the Update-ClusterFunctionalLevel cmdlet. This is an online process. Again, do this from a node running vNext or a system running Windows 10 with the vNext RSAT installed. Note that this is where you commit to the vNext level for the cluster. That’s where you want to go, but you get to decide when. Once you’ve done this you can’t go back to W2K12R2. As a matter of fact, as long as you’re running cluster functional level 8, you can reverse the entire process. Talk about having options! I like having options, just ask Carsten Rachfahl (@hypervserver), he’ll tell you it’s one of my mantras.
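
The commit itself is a single cmdlet; a sketch, again assuming the hypothetical cluster name LabCluster:

# Online, but one way: after this there is no going back to W2K12R2.
Update-ClusterFunctionalLevel -Cluster "LabCluster"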

When this goes well you can easily check the cluster functional level as follows:
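
For example, with the hypothetical cluster name LabCluster again (8 = W2K12R2, 9 = vNext):

Get-Cluster -Name "LabCluster" | Select-Object Name, ClusterFunctionalLevel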

When this is done you can do the upgrade of the VM configuration by running the Update-VMConfigurationVersion cmdlet. This is an offline process, where the VMs you’re updating have to be shut down. You can do this for just one VM, all of them, or anything in between. This is when you decide you’re committing to all the goodness vNext brings you. But the fact that you have some time before you need to do it means you can easily get those machines to run smoothly on a W2K12R2 cluster in case you need to roll back. Remember, options are good!

Doing so updates the VM version from 5 to 6 and enables new Hyper-V features (hit F5 a lot or reopen Hyper-V Manager to see the value change).
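
A sketch of that per-VM flow, with a hypothetical VM called DC01 (the cmdlet name is the one from this Technical Preview; later builds expose it as Update-VMVersion):

# Offline step: shut the VM down, update its configuration version, start it again.
Stop-VM -Name "DC01"
Update-VMConfigurationVersion -Name "DC01"
Start-VM -Name "DC01"
# Check the result.
Get-VM | Format-Table Name, State, Version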

Note: If in the lab you’re running some VMs on a cluster node that are not highly available (i.e. they’re not clustered), they cannot be updated until the cluster functional level has been upgraded to version 9.