Import of RD Gateway configuration file with policies referencing local resources wipes all policies clean!

Introduction

When you have a Windows Server 2016 RD Gateway server and you expect to be able to import a configuration XML file, you might find yourself in a pickle when you are also using local resources. That's because the import of an RD Gateway configuration file with policies referencing local resources wipes all policies clean! By local resources I mean local user accounts and groups. These are leveraged more than I imagined at first.

When does it happen?

In the past I have blogged about migrating RD Gateway servers that contain policies referencing local resources here: Fixing Event ID 2002 “The policy and configuration settings could not be imported to the RD Gateway server “%1” because they are associated with local computer groups on another RD Gateway server”.

We used to be able to use the trick of making sure the local resources exist on the new server (either by recreating them there via the server migration wizard or manually) and changing the server name in the exported configuration XML file to successfully import the configuration. That no longer works. You get an error.

As far as migrations from older versions go, they work fine as long as you don't have policies with local resources. Otherwise you'd better do an in-place upgrade or recreate the resources & policies on the new servers. The method described in my blog no longer works. That's too bad. But it gets worse.

Import of RD Gateway configuration file with policies referencing local resources wipes all policies clean!

As said, it doesn't end there. The issue is there even when you try to import the configuration onto the same server you exported it from. That's really bad, as an export/import is a quick way to protect against any mistakes you might make and to get back to the original configuration.

What's even worse, when the import fails it wipes ALL the policies on the RD Gateway server => dangerous! So yes, the import of an RD Gateway configuration file with policies referencing local resources wipes all policies clean!

Precautions

Only a backup or a checkpoint can save you then (or you'll have to recreate them all manually)! Again, this only happens when the exported configuration file references local resources! The fastest way to clean out an RD Gateway configuration on Windows Server 2016 is actually importing a configuration export which contains a policy referring to local resources. Ouch! I'm not aware of a fix to this date.

For now your only protection is a checkpoint or a backup. Depending on where and how you source your virtual machines you might not have access to a checkpoint.
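
If your RD Gateway runs as a Hyper-V virtual machine and you do have rights on the host, a checkpoint right before any import attempt is cheap insurance. A minimal sketch; the host and VM names are placeholders for your environment:

    # Take a checkpoint of the RD Gateway VM before attempting a configuration import.
    # "HYPERVHOST01" and "RDGW01" are placeholder names, replace them with your own.
    Checkpoint-VM -ComputerName "HYPERVHOST01" -Name "RDGW01" `
        -SnapshotName "Before RD Gateway config import $(Get-Date -Format 'yyyy-MM-dd HHmm')"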

You have been warned, be careful.

Migrate a Windows Server 2012 R2 AD FS farm to a Windows Server 2016 AD FS farm

Introduction

I recently went through the effort of migrating a Windows Server 2012 R2 AD FS farm to a Windows Server 2016 AD FS farm. For this exercise the people in charge wanted to maintain the server names and IP addresses. By doing so there was no need for changes to the Kemp Technologies load balancer.

Farm Behavior Level Feature

In Windows Server 2016 AD FS we now have the Farm Behavior Level (FBL) feature. It determines the features that the AD FS farm can use. Logically, a Windows Server 2012 R2 AD FS farm is at the Windows Server 2012 R2 FBL.

The FBL feature and mixed mode do away with a "trick" many used when upgrading a farm to AD FS on Windows Server 2012 R2: setting up a new farm and exporting/importing the configuration. Organizations looking to upgrade to Windows Server 2016 will not have to deploy an entirely new farm and export and import configuration data. Instead, they can add Windows Server 2016 nodes to an existing farm while it is online and only incur the relatively brief downtime involved in the FBL raise.

We can add a secondary Windows Server 2016 AD FS server to a Windows Server 2012 R2 farm. The farm will continue operating at the Windows Server 2012 R2 FBL; this is "mixed mode" so to speak. There is no need to move all the nodes to the same version immediately.

As long as you are in mixed mode, however, you don't get the benefits of the new capabilities and features in Windows Server 2016 AD FS. These are not available until all nodes run Windows Server 2016 and the FBL has been raised.

When all Windows Server 2012 R2 nodes have been removed from the farm and all nodes are Windows Server 2016, you can raise the FBL. This results in the new Windows Server 2016 AD FS features being enabled and ready for configuration and use.

Migration Path Notes

WARNING

You cannot in-place upgrade a Windows Server 2012 R2 ADFS farm node to Windows Server 2016. You will need to remove it from the farm and replace it with a new Windows Server 2016 ADFS node.

Note

In this migration we are preserving the node names and IP addresses. This means the load balancer needed no configuration changes. In that respect this process differs from what is normally recommended.

This is a WID-based deployment example. You can do the same for a SQL-based deployment.

The FBL approach is only valid for a migration from Windows Server 2012 R2 AD FS to Windows Server 2016 AD FS. A migration from AD FS 2.0 or 2.1 (Windows Server 2008 R2 or Windows Server 2012) requires the use of Export-FederationConfiguration and Import-FederationConfiguration as before.

Also see https://technet.microsoft.com/en-us/windows-server-docs/identity/ad-fs/overview/ad-fs-2016-requirements

Step by Step

  • Start with the secondary nodes. For each of them make sure you have the server name and the IP configuration (a sketch for recording these follows below).
  • Make sure you have the Service Communications SSL certificate for your AD FS farm and the domain or managed service account name and password.
  • Make sure you have an ADFS configuration backup and also that you have a backup or an export (cool thing about VMs) of the VMs for rapid recovery if needed.
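
A minimal sketch of recording that information per node before you wipe it; the UNC path is a placeholder:

    # Run on each AD FS node before rebuilding it: capture the computer name and
    # the NIC configuration so the new node can be configured identically.
    $nodeInfo = [pscustomobject]@{
        ComputerName = $env:COMPUTERNAME
        IPConfig     = Get-NetIPConfiguration |
            Select-Object InterfaceAlias, IPv4Address, IPv4DefaultGateway, DNSServer
    }
    $nodeInfo | Export-Clixml -Path "\\fileserver\migration\$($env:COMPUTERNAME).xml"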

Remove the ADFS role via Server Manager

  • Shut down the VM.
  • Edit the settings and remove the OS VHDX. Delete the file (you have a backup/export).
  • Copy your completely patched and sysprepped Windows Server 2016 OS VHDX to the location for this VM. Rename that VHDX to something sensible like adfs2disk01.vhdx.
  • Edit the settings and add the new sysprepped OS VHDX to the VM. Make sure that the disk is first in the boot order.
  • Start the VM.
  • Go through the mini wizard and log in to it.
  • Configure the NIC with the same settings as the old ADFS server.
  • Rename the VM to the original VM name and join the domain.
  • Restart the VM.
  • Log in to the VM and install ADFS using Add Roles and Features in Server Manager (or via PowerShell, as sketched below).
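
For those who prefer PowerShell over the GUI, a minimal sketch of the rename, domain join and role install; "ADFS2" and "yourdomain.local" are placeholders:

    # Rename the VM back to the original node name and reboot.
    Rename-Computer -NewName "ADFS2" -Restart

    # Join the domain and reboot again.
    Add-Computer -DomainName "yourdomain.local" -Credential (Get-Credential) -Restart

    # Install the AD FS role including the management tools.
    Install-WindowsFeature ADFS-Federation -IncludeManagementTools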

  • When done, configure ADFS

  • Select to add the node to an existing federation farm

  • Make sure you have an account with AD admin permissions

  • Tell the node which server is the primary federation server

  • Import your certificate

  • Specify the ADFS Service account and its password

  • You’re ready to go on

  • If any prerequisites aren't met you'll be notified. Here we're good to go!

  • Let the wizard complete all its steps

  • When the configuration is done you need to restart the VM to complete adding the node to the ADFS farm.

  • Restart your VM and log back in. When you open up ADFS you'll see that this new Windows Server 2016 node is a secondary node in your ADFS farm.
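
For reference, all the wizard steps above map to a single PowerShell cmdlet. A minimal sketch, assuming a WID farm; the primary server name and the certificate thumbprint are placeholders you must replace with your own values:

    # Join this server to the existing farm as a secondary node.
    Import-Module ADFS
    Add-AdfsFarmNode -PrimaryComputerName "adfs1.yourdomain.local" `
        -CertificateThumbprint "0123456789ABCDEF0123456789ABCDEF01234567" `
        -ServiceAccountCredential (Get-Credential -Message "AD FS service account") `
        -Credential (Get-Credential -Message "Account with AD admin permissions")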

  • Note that from a load balancer perspective nothing has changed. It just saw the node go up and down a few times; if it was paying attention at all, that is.
  • Now repeat the entire process for all your secondary ADFS farm nodes. When done, we'll swap the primary role to one of the secondary nodes. This is needed so you can repeat the process for the last remaining node in the farm, which at that point needs to be a secondary node. In the example of our 2-node farm we swap the roles between ADFS1 and ADFS2 (a sketch of the swap follows below).
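
On a WID-based farm this role swap is done with Set-AdfsSyncProperties. A minimal sketch, using our node names:

    # On ADFS2, the node being promoted to primary:
    Set-AdfsSyncProperties -Role PrimaryComputer

    # On ADFS1, the node being demoted; point it at the new primary:
    Set-AdfsSyncProperties -Role SecondaryComputer -PrimaryComputerName "adfs2.yourdomain.local"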

  • Verify that ADFS2 is the primary node and, if so, repeat the migration process for the last remaining node (ADFS1 in our case).
  • Once that's been completed we swap them back to have exactly the same situation as before the migration.

  • On the primary node run Get-AdfsFarmInformation (a new cmdlet in Windows Server 2016).

  • You'll see that the current farm behavior level is 1 and our 2 nodes (all of them Windows Server 2016) are listed. Note that any nodes still on Windows Server 2012 R2 would not be shown.

WARNING: to raise the FBL to Windows Server 2016 your AD schema needs to be upgraded to the Windows Server 2016 version (85 or higher). This is also the case for new AD FS farm installations, which will be at the latest FBL by default. My environment is already 100% on Windows Server 2016 AD, so I'm good to go. If yours is not, don't forget to upgrade your schema. You don't need to upgrade your DCs unless you want to leverage Microsoft Passport authentication; then you need at least 1 Windows Server 2016 domain controller. See https://technet.microsoft.com/en-us/windows-server-docs/identity/ad-fs/overview/ad-fs-2016-requirements

  • As we know all our nodes are on Windows Server 2016, we can raise the Farm Behavior Level by running Invoke-AdfsFarmBehaviorLevelRaise (sketched below).
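
A minimal sketch of the check-and-raise sequence, run on the primary node:

    # Check the current farm behavior level (1 = Windows Server 2012 R2).
    Get-AdfsFarmInformation

    # Optional dry run that reports anything that would block the raise.
    Test-AdfsFarmBehaviorLevelRaise

    # Raise the FBL; among other things this creates a new database.
    Invoke-AdfsFarmBehaviorLevelRaise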

  • Just let it run; it has some work to do, including creating a new database.

  • It will tell you when it’s done and point out changes in the configuration.

  • Now run Get-AdfsFarmInformation again

  • Note that the current farm behavior level is now 3 and our 2 nodes (all of them Windows Server 2016) are listed. Note that if any nodes had still been on Windows Server 2012 R2 they would have been kicked from the farm and should be removed from the load balancer.

PS: with some creativity, and by having a look at my blog at https://blog.workinghardinit.work/2016/11/28/easily-migrating-non-ad-integrated-dns-servers-while-preserving-server-names-and-ip-addresses/, you can easily figure out how to add some extra steps to move to generation 2 VMs while you're at it, if you don't use those yet.

In-place upgrades of cluster nodes to Windows Server 2016

You will all have heard about rolling cluster upgrades from Windows Server 2012 R2 to Windows Server 2016 by now. The best and recommended practice is to do a clean install of any node you want to move to Windows Server 2016. However, an in-place upgrade does work. Actually, it works better than ever before. I'm not recommending this for production, but I did do a bunch just to see how the experience was and whether that experience was consistent. I was actually pleasantly surprised and it saved me some time in the lab.

Today, if you want to, you can upgrade your Windows Server 2012 R2 hosts in the cluster to Windows Server 2016.

The main thing to watch out for is that all the VMs on that host have to be migrated to another node or be shut down.

You cannot have teamed NICs on the host. Most often these will be used for a vSwitch, so it's smart and prudent to note down the name of the vSwitch (or vSwitches) and remove them before removing the NIC team. After you've upgraded the node you can recreate the NIC team and the vSwitch(es). A sketch of these steps follows below.

Note that you don’t even have to evict the node from the cluster anymore to perform the upgrade.
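
A minimal sketch of the per-node preparation and cleanup; "Node1", "vSwitch1", "Team1" and the NIC names are placeholders for your environment:

    # Drain the node so all VMs live migrate off (or shut them down yourself).
    Suspend-ClusterNode -Name "Node1" -Drain -Wait

    # Note down the vSwitch and team configuration, then remove both.
    Get-VMSwitch | Select-Object Name, NetAdapterInterfaceDescription
    Remove-VMSwitch -Name "vSwitch1" -Force
    Remove-NetLbfoTeam -Name "Team1" -Confirm:$false

    # ... perform the in-place upgrade, then recreate the team and vSwitch ...
    New-NetLbfoTeam -Name "Team1" -TeamMembers "NIC1","NIC2" -Confirm:$false
    New-VMSwitch -Name "vSwitch1" -NetAdapterName "Team1" -AllowManagementOS $true
    Resume-ClusterNode -Name "Node1" -Failback Immediate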

I have successfully upgraded 4 clusters this way. One was based on PC hardware but the other ones were:

  • Dell R610 2-node cluster with shared SAS storage (MD3200).
  • Dell R720 2-node cluster with Compellent SAN (and ancient 4Gbps Emulex and QLogic FC HBAs).
  • Dell R730 3-node cluster with Compellent SAN (8Gbps Emulex HBAs).

Naturally all these servers were rocking the most current firmware and drivers possible. After the upgrades I updated the NIC drivers (Mellanox, Intel) and the FC drivers (Emulex) to their vendors' supported versions. I also made sure they got all the available updates before moving on with these lab clusters.

Issues I noticed:

  • The most common issue I saw was that the Hyper-V Manager GUI was not functional and I could not connect to the host. The fix was easy: uninstall Hyper-V and re-install it. This requires a few reboots. Other than that it went incredibly well.
  • Another issue I've seen with upgrades was that the Netlogon service was set to manual start, which caused various issues with authentication but is easily fixed (see the sketch below). This has also been reported here. Microsoft is aware of this bug and a fix is being worked on.
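
A minimal sketch of that Netlogon fix on an affected node:

    # Put the Netlogon service back to automatic start and start it.
    Set-Service -Name Netlogon -StartupType Automatic
    Start-Service -Name Netlogon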

 


A First look at Cloud Witness

Introduction

In Windows Server 2012 R2 Failover Clustering we have 2 types of witness:

  1. Disk witness: a shared disk that can be seen by all cluster nodes
  2. File Share Witness (FSW): An SMB 3 file share that is accessible by all cluster nodes

Since Windows Server 2012 R2 the recommendation is to always configure a witness. The reason for this is dynamic quorum and dynamic witness. These two capabilities offer the best possible resiliency without administrator intervention and are enabled by default. The cluster dynamically assigns a quorum vote to a node when it's up and removes it when it's down. Likewise, the witness is given a vote when it's better to have a witness; if you're better off without the witness, it won't get a vote. That's why Microsoft now advises to always set a witness: it will be managed automatically. The result is that you get the best possible uptime for a cluster under any given circumstance. You can see this mechanism at work, as sketched below.
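
You can inspect how dynamic quorum is assigning votes on any cluster. A minimal sketch, using the standard failover clustering cmdlets:

    # Show each node's configured vote (NodeWeight) next to the vote it
    # currently holds (DynamicWeight).
    Get-ClusterNode | Format-Table Name, State, NodeWeight, DynamicWeight

    # 1 means the witness currently has a vote, 0 means it does not.
    (Get-Cluster).WitnessDynamicWeight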

This is still the case in Windows Server 2016, but failover clustering does introduce a new witness option: the cloud witness.

Why do we need a cloud witness?

For certain scenarios, such as a cluster without shared storage and especially when a stretched cluster is involved, you'll have to use a FSW. It's a great solution that works as well as a disk witness in most cases. Why do I say most? Well, there is a scenario where a disk witness will provide better resiliency, but let's not go there now.

Now the caveat here is that you'll need to place the FSW in a 3rd independent site. That's a tall order for many to fulfill. You can put it on the desktop of the receptionist at a branch office or on a virtual machine on the cluster itself, but that's "suboptimal". Ideally the FSW is independent and highly available, not dependent on what it's supposed to support in achieving quorum.

One of the other workarounds was to extend AD to Azure, deploy a SOFS cluster with a non-CA file share on a cluster of VMs in Azure and give both other sites access to it over VPN or ExpressRoute. That works, but in a time of easy, fast, cheap and good solutions it's still a serious effort, just for a file share.

As Microsoft has more and more use cases that require a FSW (site-aware stretched clusters, Storage Spaces Direct, Exchange DAG, SQL Availability Groups, workgroup or multi-domain clusters) they had to find a solution for the growing number of customers that do not have a 3rd site but do need a FSW. The cloud idea above is great, but the implementation isn't the best as it's rather complex and expensive. Bar using virtual machines, you can't use Azure file services in the cloud, as those are primarily for consumption by applications and the security is handled not via ACLing but via access keys. That means the security for the Cluster Name Object (CNO) can't be set. So even though you can expose a cloud file share on premises to Windows Server 2016 (any OS that supports SMB 3, actually) by mapping it via NET USE, the cluster GUI can't set the required security for the cluster nodes, so it will fail. And no, you can't set it manually either. Just to prove this I tried it for you, to save you the trouble. Do NOT even go there!

So what is possible? Well, come Windows Server 2016, Failover Clustering has a 3rd type of witness: the cloud witness. Functionally it's like a FSW. The big difference is that it's a dedicated, cloud-based solution that mitigates the need for and the costs of a 3rd data center and avoids the cost of the workarounds people came up with.

Implementing the cloud witness

In your Azure subscription you create a storage account. For this purpose I've created one named democloudwitness in my resource group RG-Demo. I'm using a separate storage account to keep things tidy and separated from my other demo storage accounts.

A storage account gets two access keys and two connection strings. The reason for this is that when you need to regenerate one key, you can have your workloads use the other one; that way rotation can be done without downtime.

In Azure the work is actually already done. The rest happens on premises on the cluster: we'll configure the cluster with the witness. In PowerShell this is a one-liner.
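
A minimal sketch of that one-liner, using the democloudwitness account from above; the access key value is a placeholder for your primary access key:

    # Configure the cluster to use a cloud witness in the given storage account.
    Set-ClusterQuorum -CloudWitness -AccountName "democloudwitness" `
        -AccessKey "<your primary access key>"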

If you get an error, make sure the information is correct and that you can reach Azure over HTTPS via the internet, VPN or ExpressRoute. You normally do not need to use the endpoint parameter; that's just for the rare case where you need to specify a different Azure service endpoint.

The access key above is a placeholder by the way, just so you know. Once you're done, Get-ClusterQuorum returns Cloud Witness as the QuorumResource.

In the GUI you'll see the cloud witness listed as the quorum witness as well.

When you open up the Blob service in your storage account you'll see that a container named msft-cloud-witness has been created. When you select it you'll see a file with a GUID as its name.

That GUID is actually the same as your cluster instance ID, which you can find in the registry of your cluster nodes under the HKLM\Cluster key in the string value ClusterInstanceID.
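
A minimal sketch to read that value so you can match it to the blob name:

    # Run on any cluster node; the cluster registry hive is loaded under HKLM\Cluster.
    (Get-ItemProperty -Path "HKLM:\Cluster" -Name ClusterInstanceID).ClusterInstanceID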

Your storage account can be used for multiple clusters; you'll just see extra entries, each with their own GUID.

All this consumes so few resources that it's quite possibly the cheapest way ever to get a cluster witness. Time will tell.

Things to consider

• Cloud witness uses the HTTPS REST interface (NOT SMB 3) of the Azure storage account service. This means it requires HTTPS (port 443) to be open on all cluster nodes to allow access over the internet. Alternatively, an Azure site-to-site VPN or ExpressRoute can be used. You'll need one of those.

• REST means there is no ACLing for the CNO to be done like on an SMB 3 FSW. Security is handled automatically by the failover cluster, which doesn't store the actual access key but generates a shared access signature (SAS) token using the access key and stores that securely.

• The generated SAS token is valid as long as the access key remains valid. When rotating the primary access key, it is important to first update the cloud witness (on all your clusters that are using that storage account) with the secondary access key before regenerating the primary access key (see the sketch after this list).

• Plan your governance between cluster & Azure admins if these are not the same people. I see Azure resource governance being neglected, and as a cluster admin it's nice to have some degree of control or say in the Azure part of the equation.
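
A minimal sketch of that rotation order, again with a placeholder key value:

    # Step 1: on every cluster using this storage account, repoint the cloud
    # witness at the secondary access key.
    Set-ClusterQuorum -CloudWitness -AccountName "democloudwitness" `
        -AccessKey "<your secondary access key>"

    # Step 2: only then regenerate the primary access key in the Azure portal.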

For completeness I'll mention that the entire setup of a cloud witness is also very nicely integrated into the Failover Cluster GUI.

Right-click the desired cluster and select "Configure Cluster Quorum Settings" from the menu under "More Actions".

Click through the startup form (unless you’ve never ever done this, then you might want to read it).

Select either “Select the quorum witness” or “Advanced quorum configuration”

We keep the default selection of all nodes.

We select "Configure a cloud witness".

Type in your Azure storage account name, enter your primary access key as the "Azure storage account key" and leave the endpoint at its default. You normally won't need to change this unless you need to use a different Azure service endpoint.

Click "Next" to review what you're about to do.

Click Next again and let the wizard run.

You'll get a report when it's done. If you get an error, make sure the information is correct and that you can reach Azure over HTTPS via the internet, VPN or ExpressRoute.

Conclusion

I was pleasantly surprised by how easy it was to set up a cloud witness. The biggest hurdle for some might be access to Azure in secured environments. The file itself contains no sensitive information at all, and while a VPN or ExpressRoute are secured connectivity options, these might not be allowed or viable in certain environments. Other than this I have found it to be very reliable, effective, cheap and easy. I really encourage you to test it and see what it can do for you.