Cluster Operating System Rolling Upgrade Leaves Traces

Introduction

When you perform a cluster OS rolling upgrade of Windows Server 2012 R2 cluster to a Windows Server 2016 Cluster you’ll have two options.

1. You evict the nodes, one after the other, perform a clean OS install and join them to the existing cluster.

2. You do an in-place OS upgrade of the nodes (no need to evict the nodes, you can if you want to). I tested this and blogged about it in In Place upgrades of cluster nodes to Windows Server 2016  

Both of these give you the benefits that you can keep your workloads (Hyper-V, SOFS, SQL Server) running and you don’t have to create a new cluster to do so. The moment you have Windows Server 2016 Nodes added to an existing Windows Server 2012 R2 cluster you are running in Mixed mode. Until all your nodes have been upgraded to Windows Server 2016 will remain running in mixed mode.

Illustration showing the three stages of a cluster OS rolling upgrade: all nodes Windows Server 2012 R2, mixed-OS mode, and all nodes Windows Server 2016

When there are only Windows Server 2016 nodes you can decide to also upgrade the cluster functional level.  This enables all the new capabilities in Windows Server 2016 Failover Clustering and also means you cannot go back to a Windows 2012 R2 cluster anymore. So, only take this step after a final validation of all drivers and firmware to make sure you don’t need to go back and you’re ready to fully commit to a fully functional Windows Server 2016 Failover Cluster.

A cluster operating system rolling upgrade does leave some traces, but that’s OK. Let’s take a look. 

This is what a get-cluster against Windows Server 2016 that was upgraded from Windows Server 2012 R2 looks like.

image

As you can see the cluster functional level is 8 and not 9 yet. This means that we have not yet run the Update-ClusterFunctionalLevel command on this cluster yet. Which still allows us to roll back all the way to a cluster running only Windows 2012 R2 nodes. The ClusterUpgradeVersion has a value of 3.

We now execute the Update-ClusterFunctionalLevel command and take a look at Get-Cluster again.

image

As you can see we are now at cluster functional level 9 which enables all the capabilities offered by Windows Server 2016 Failover Clustering. The cluster Upgrade version is 8. That’s the previous cluster functional level we were at before we executed Update-ClusterFunctionalLevel.

Note that both properties ClusterFunctionalLevel and ClusterUpgradeVersion are only available with Windows Server 2016. You will not find it on a Windows Server 2012 R2 or lower cluster. If you run this command from Windows Server 2016 against a Windows Server 2012 R2 cluster both properties will be empty. If you run it on a Windows Server 2012 R2 host against Windows Server 2012 R2 or lower and even a Windows Server 2016 cluster these properties are not even there. The commandlet is older on those OS versions and didn’t know about these properties yet.

What about if you create a brand-new cluster, perhaps even on freshly installed windows Server 2016 Nodes? What does ClusterUpgradeVersion have as a value then? Well it’s also 8. In the end, there is no difference between an in-place upgrade Windows Server 2016 cluster and a cleanly created one. So where are those traces?

Cluster Operating System Rolling Upgrade Leaves Traces

What gives a rolling upgrade away is that in the registry, under the HKLM\Cluster the OS and OSVersion values are not updated (purple in the picture below). This is a benign artifact and I don’t know if this if on purpose or not.  I have changed them to Windows Server 2016 Datacenter as an experiment and I have not found any issues by doing so. Now, please don’t take this as recommendation to do so. The smartest and safest thing is to leave it alone. These are not used, so don’t worry about them.

image

But even if you would change those values a cluster resulting from of a cluster operating system rolling upgrade still has other ways of telling it was not born as a Window Server 2016 Cluster.

Under HKLM\Cluster (and Cluster.0) you’ll find the value CusterFunctionaLevel that does not exist on a cleanly installed Windows Server 2016 Cluster (green in the picture above). As you can see this is a Window Server 2016 cluster running at functional level 9.

There is even an extra key OperatingVersion under HKLM\Cluster that you will not find on a cleanly installed cluster either. It also has a Mixed Mode value under that key which indicates whether the cluster is still running in mixed mode or not.

image

Here is a screenshot of newly installed/created Windows Server 2016 cluster. No ClusterFunctionalLevel value, the OS and OSVersion Values are correct and there is no OperatingVersion key to be found.

image

What if you don’t like traces?

First of all, these traces are harmless. One thing you can do if you want to weed out all traces of a rolling upgrade (as far as the cluster is concerned) is to destroy the cluster and create one with the same CNO (and IP address if that was a fixed one). This might all be a bit more involved when it comes to CSV naming and other existing resources but then these remnants will be gone in a supported way. Now this does defeat one of the main purposes of this feature: no down time. The operating system itself might also contain traces if you did in-place OS upgrades but the cluster will not. Just adapting OS/OSVersion, ClusterFunctionalLevel and deleting the key OperatingVersion from HKLM\Cluster (and HKLM\Cluster.0) are not supported actions and messing around in the cluster registry keys can lead to problems, so don’t! The advice is to just leave it all alone. Microsoft developed cluster operating system rolling upgrade the way they did for a reason and by leaving things as Microsoft has set or left them will make sure you are always in a fully supported condition. So, use it if it fits the circumstances & you comply with all the prerequisites. Look at these traces a flag of honor, not a smudge on your shining armor. When I see these artifacts, I see people who have used this feature to their own benefits. Well done I say.

Learn more about the Cluster OS Rolling Upgrade process

Next to my blogs like First experiences with a rolling cluster upgrade of a lab Hyper-V Cluster (Technical Preview) and In Place upgrades of cluster nodes to Windows Server 2016 there are many resources out there by fellow blogger and Microsoft. A great video on the subject is Introducing Cluster OS Rolling Upgrades in Windows Server 2016 with Rob Hindman, who actually works on this feature and knows it inside out.

An important thing to keep in mind is that this can be automated using PowerShell or by leveraging SCVMM for orchestration for example. 3rd party tools could also support this and help you automate this process in order to scale it when needed.

Finally, the official documentation can be found here Cluster operating system rolling upgrade

Hyper-V Amigos Showcast Episode 14: RemoteFX & DDA

Carsten and I dove into our labs and played around with RemoteFX and Discrete Device Assignment in Windows Server 2016 Hyper-V and RDS. This resulted in the Hyper-V Amigos Showcast Episode 14: RemoteFX & DDA.

Some background on RemoteFX & DDA

I’ve discussed the new capabilities in previous blog posts such as https://blog.workinghardinit.work/?s=DDA&submit=Search  and RemoteFX and vGPU Improvements in Windows Server 2016 Hyper-V. But here the Hyper-V Amigos talk about it for your benefit and enjoyment. I for one know we had a ton of fun. Microsoft only VDI solutions are really taking off both on-premises and in Azure in cost conscious environments that still need good performance. I think we’ll see an uptake of such deployments as Microsoft has made some decisions and added some features to make this more feasible.

Hyper-V Amigos Showcast Episode 14: RemoteFX & DDA.

Click this link or the image below to watch Hyper-V Amigos Showcast Episode 14: RemoteFX & DDA

image

There’s a bit of a learning curve associated with using DDA in Windows Server 2016. You’ll have to get acquainted with how to do it and put it to the test in labs and POCs. Do this before you even start thinking about designing production ready solutions. Having a good understanding on how it works and behaves is paramount to success.

Enjoy!

Warning on Windows Server 2016 Deduplication Corruption

UPDATE 2 – 2017/02/06

DO NOT INSTALL KB3216755 if you don’t need it.  Huge memory leak reported to associated with this. If you need it I’d consider all my options.

UPDATE – GET KB3216755

As you can read it the comments, Microsoft reached out and confirms the issues are fixed as part of KB3216755 => https://support.microsoft.com/en-us/help/4011347/windows-10-update-kb3216755 . I commend them for responding so quickly and getting it sorted. Do not that at the time of writing this (late on January 30th CET) the Windows Sever 2016 update isn’t in the Windows Catalog yet, only the Windows 10 ones. But Microsoft confirms you should install the update  on their blog

Windows Server 2016 Data Deduplication users: please install KB3216755!

The issue

Good morning. A quick blog post to give a heads up to my readers who might not be subscribed to Anton Gostev (Veeam) his “The Word Form Gostev”. It concerns a warning on Windows Server 2016 Deduplication corruption.

Warning on Windows Server 2016 Deduplication Corruption

There are multiple reports of data corruption with Windows Server 2016 deduplication. One is related to file sizes over 2TB. The other with the loss of checksum values. Microsoft is aware these issues and a fix is coming for these issues.

I quote Gostev

I’ve already received the official confirmation from Microsoft that this is the know issue (ID 10165851) which is scheduled to be addressed in the next Windows Server 2016 servicing update. There are actually two separate issues, both leading to file corruption when using deduplication on very large files. One issue occurs when files grow to 2.2TB or larger, and another one causes loss of checksums for files with “smaller sizes” – this is the actual wording of the official note, so I have no idea how small

What to do?

If you use Windows Server 2016 deduplication for backups, create new full backups regularly. Also make sure you do backup integrity testing and restore tests. Follow up on the update when it arrives.

If you use the for production data make sure you have frequent and validated backups! Design & operate under the mantra of “Trust but verify”.

Also, we’ve heard reports and noticed that Windows Server 2016 Deduplication resource configuration isn’t always respected. I.e. it can take all resources away despite limitations being set. We hope a fix for this is also under way.

Import of RD Gateway configuration file with policies referencing local resources wipes all policies clean!

Introduction

When you have Windows Server 2016 RD Gateway server and you expect to be able to import a configuration XML file you’ll might find yourself in a pickle when you are also using local resources. Because the import of RD Gateway configuration file with policies referencing local resources wipes all policies clean! With local resources I mean local user accounts and groups. These are leveraged more than I imagined at first.

When does it happen?

In the past I have blogged about migrating RD Gateway servers that contain policies referencing local resources here: Fixing Event ID 2002 “The policy and configuration settings could not be imported to the RD Gateway server “%1” because they are associated with local computer groups on another RD Gateway server”.

We used to be able to use the trick of making sure the local resources exist on the new server (either by recreating them there via the server migration wizard or manually) and changing the server name in the exported configuration XML file  to successfully import the configuration. That no longer works. You get an error.Import of RD Gateway configuration file with policies referencing local resources wipes all policies clean!

As far as migrations go from older versions, they work fins as long as you don’t have policies with local resources. Otherwise you’d better do an in place upgrade or recreate the resources & policies on the new servers. The method described in my blog is not working any more. That’s to bad. But it gets worse.

Import of RD Gateway configuration file with policies referencing local resources wipes all policies clean!

As said,it doesn’t end there. The issue is there even when you try to import the configuration on to the same server you exported it from.That’s really bad as it a quick way to protect against any mistakes you might make, and allows to get back to the original configuration.

What’s even worse, when the import fails it wipes ALL the policies in the RD Gateway Server => dangerous! So yes, the import of RD Gateway configuration file with policies referencing local resources wipes all policies clean!

Precautions

Only a backup or a checkpoint can save your then (or recreate the all manually)! Again this is only when the exported configuration file references local resources! The fasted way to clean out an RD Gateway configuration on Windows Server 2016 is actually importing a configuration export which contains a policy referring to local resource. Ouch! I’m not aware of a fix up to this date.

For now you only protection is a checkpoint or a backup. Depending on where and how you source your virtual machines you might not have access to a checkpoint.

You have been warned, be careful.