The exceptional value of a great technical community

There is a tremendous value in being an active community member. You learn form other people experiences. Both their successes and their mistakes. They learn from you. All this at the cost of the time and effort you put in. This, by itself, is of great value.

There are moments that this value reaches a peak. It becomes so huge it cannot be dismissed by even the biggest cynic of a penny pinching excuse for a manager.You see, one day bad things happen to even the nicest, most experienced and extremely competent people. That day, in the middle of the night you reach out to your community. The message is basically “HELP!”.

Guess what, the community, spread out across the globe over all time zones answers that call. You get access to support and skills form your peers when you most need it. Even if you have to pay an hourly fee that would still be a magnitude cheaper than many “premium” support schemes that, while very much needed for that vertical support, cannot match the depth and breath of the community.

For sure, you don’t have a piece of paper, and SLA, an escalation manager. That might upset some people. But what you do get are hard core skills, extra eyes and hands when you need it the most. That, ladies and gentlemen, is the exceptional value of a great technical community at work. Your backup when the system fails. Who ever has committed community experts as employees or partners or owners of a business indirectly has access to a global network of knowledge, talent, skills and experience. If you truly think people are the biggest capital you have, than these are the gems.

In Place upgrades of cluster nodes to Windows Server 2016

You will all have heard about rolling cluster upgrades from Windows Server 202 R2 to Windows Server 2016 by now. The best and recommend practice is to do a clean install of any node you want to move to Windows Server 2016. However an in place upgrade does work. Actually it works better then ever before. I’m not recommending this for production but I did do a bunch just to see how the experience was and if that experience was consistent. I was actually pleasantly surprised and it saved me some time in the lab.

Today, if you want to you can upgrade your Windows Server 2012 R2 hosts in the cluster to Windows Server 2016.

The main things to watch out for are that all the VMs on that host have to be migrated to another node or be shut down.

You can not have teamed NICs on the host. Most often these will be used for a vSwitch, so it’s smart and prudent to note down the vSwitch (or vSwitches) name and remove them before removing the NIC team. After you’ve upgraded the node you can recreate the NIC team and the vSwitch(es).

Note that you don’t even have to evict the node from the cluster anymore to perform the upgrade.

image

I have successfully upgrade 4 cluster this way. One was based on PC hardware but the other ones where:

  • DELL R610 2 node cluster with shared SAS storage (MD3200).
  • Dell R720 2 node cluster with Compellent SAN (and ancient 4Gbps Emulex and QLogic FC HBAs)
  • Dell R730 3 node cluster with Compellent SAN (8Gbps Emulex HBAs)

Naturally all these servers were rocking the most current firmware and drives as possible. After the upgrades I upgraded the NIC drivers (Mellanox, Intel) and the FC drivers ‘(Emulex) to be at their supported vendors drivers. I also made sure they got all the available updates before moving on with these lab clusters.

Issues I noticed:

  • The most common issue I saw was that the Hyper-V manager GUI was not functional and I could not connect to the host. The fix was easy: uninstall Hyper-V and re-install it. This requires a few reboots. Other than that it went incredibly well.
  • Another issue I’ve seen with upgrade was that the netlogon service was set to manual which caused various issues with authentication but which is easily fixed. This has also been reported here. Microsoft is aware of this bug and a fixed is being worked on.

 

.

Episode 11 of the Hyper-V Amigo Showcast

Hello there! Carsten Rachfahl and I have a new Hyper-V Amigo showcast out. In this Hyper-V Amigo Showcast Episode 11 we talk about Hyper-converged Storage Spaces Direct in Windows Server 2016 installed.

image

Carsten has a nice demo system to play with and we have some fun playing with it. We’ll see the outstanding performance S2D can deliver and how resilient is. On top of this we also talk about Distributed Storage QoS, Node Isolation and Node Fairness.

The show some crazy performance numbers and even turn 2 of the 4 nodes off. While playing with the Cluster we see also some new Features like Distributed Storage QoS, Cluster and VM Isolation and Node Fairness.

Enjoy!

A First look at Cloud Witness

Introduction

In Windows Server 2012 R2 Failover Clustering we have 2 types of witness:

  1. Disk witness: a shared disk that can be seen by all cluster nodes
  2. File Share Witness (FSW): An SMB 3 file share that is accessible by all cluster nodes

Since Windows Server 2012 R2 the recommendation is to always configure a witness. The reason for this is that thanks to dynamic quorum and dynamic witness. These two capabilities offer the best possible resiliency without administrator intervention and are enabled by default. The cluster dynamically assigns a quorum vote to node when it’s up and removes it when it’s down. Likewise, the witness is given a vote when it’s better to have a witness, if you’re better off without the witness it won’t get a vote. That’s why Microsoft now advises to always set a witness, it will be managed automatically. The result of this is that you’ll get the best possible uptime for a cluster under any given circumstance.

This is still the case in Windows Server 2016 but Failover clustering does introduce a new option witness option: cloud witness.

Why do we need a cloud witness?

For certain scenarios such a cluster without shared storage and especially when a stretched cluster is involved you’ll have to use a FSW. It’s a great solution that works as well as a disk witness in most cases. Why do I say most? Well there is a scenario where a disk witness will provide better resiliency, but let’s not go there now.

Now the caveat here is that you’ll need to place the FSW in a 3rd independent site. That’s a hard order for many to fulfill. You can put in on the desktop of the receptionist at a branch office or on a virtual machine on the cluster itself but it’s “suboptimal”. Ideally the FSW is independent and high available not dependent on what it’s supposed to support in achieving quorum.

One of the other workarounds was to extend AD to Azure, deploy a SOFS Cluster with an non CA file share on a cluster of VMs in Azure and have both other sites have access to it over VPN or express route. That works but in a time of easy, fast, cheap and good solutions it’s still serious effort, just for a file share.

As Microsoft has more and more use cases that require a FSW (site aware stretched clusters, Storage Spaces Direct, Exchange DAG, SQL Availability Groups, workgroup or multi domain clusters) they had to find a solution for the growing number of customers that do not have a 3rd site but do need a FSW. The cloud idea above is great but the implementation isn’t the best as it’s rather complex and expensive. Bar using virtual machines you can’t use Azure file services in the cloud as those are primarily for consumption by applications and the security is done via not via ACLing but access keys. That means the security for the Cluster Name Object (CNO) can’t’ be set. So even when you can expose a cloud file to on premises to Windows 2016 (any OS that supports SMB 3 actually) by mapping it via NET USE the cluster GUI can’t set the required security for the cluster nodes so it will fail. And no you can’t set it manually either. Just to prove this I tried it for you to save you the trouble. Do NOT even go there!

clip_image002

So what is possible? Well come Windows Server 2016 Failover Clustering now has a 3rd type of witness. The cloud witness. Functionally wise it’s like a FSW. The big difference it’s a dedicated, cloud based solution that mitigates the need and costs for a 3rd data center and avoids the cost of the workarounds people came up with.

Implementing the cloud witness

In your Azure subscription you create a storage account, for this purpose I’ve create one named democloudwitness in my resource group RG-Demo. I’m using a separate storage account to keep thing tidy and separated from my other demo storage accounts.

A storage account gets two Access keys and two connection strings. The reason for this is that we you need to regenerate the keys you can have your workloads use the other one this can be done without down time.

clip_image004

In Azure the work is actually already done. The rest will happen on premises on the cluster. We’ll configure the cluster with a witness. In PowerShell this is a one liner.

clip_image006

If you get an error, make sure the information is a correct and you can reach Azure of HTTPS over the internet, VPN or Express Route. You normally do not to use the endpoint parameter, just in the rare case you need to specify a different Azure service endpoint.

The above access key is a fake one by the way, just so you know. Once you’re done Get-ClusterQuorum returns Cloud Witness as QuorumResource.

clip_image008

In the GUI you’ll see

clip_image010

When you open up the Blobs services in your storage account you’ll see that a blob service has been created with a name of msft-cloud-witness. When you select it you’ll see a file with a GUID as the name.

clip_image012

That guid is actually the same as your cluster instance ID that you can find in the registry of your cluster nodes under the HKLM\Cluster key in the string value ClusterInstanceID.

Your storage account can be used for multiple clusters. You’ll just see extra entries each with their own guid.

clip_image014

All this consumes so few resources it’s quite possibly the cheapest ever way of getting a cluster witness. Time will tell.

Things to consider

• Cloud Witness uses the HTTPS REST (NOT SMB 3) interface of the Azure Storage Account service. This means it requires the HTTPS port to be open on all cluster nodes to allow access over the internet. Alternatively an Azure Site-2-Site VPN or Express Route can be used. You’ll need one of those.

• REST means no ACLing for the CNO like on a SMB 3 FSW to be done. Security is handled automatically by the Failover Cluster which doesn’t store the actual access key, but generates a shared access security (SAS) token using the access key and stores it securely.

• The generated SAS Token is valid as long as the access key remains valid. When rotating the primary access key, it is important to first update the cloud witness (on all your clusters that are using that storage account) with the secondary access key before regenerating the primary access key.

• Plan your governance between cluster & Azure admins if these are not the same. I see Azure resources governance being neglected and as a cluster admin it’s nice to have some degree of control or say in the Azure part of the equation.

For completeness I’ll mention that the entire setup of a cloud witness is also very nicely integrated in to the Failover Cluster GUI.

Right click on the desired cluster and select “Configure Cluster Quorum Settings” from menu under “More Actions”

clip_image015

Click through the startup form (unless you’ve never ever done this, then you might want to read it).

clip_image017

Select either “Select the quorum witness” or “Advanced quorum configuration”

clip_image019

We keep the default selection of all nodes.

clip_image021

We select to “Configure a cloud witness”

clip_image023

Type in your Azure storage account name, your primary access key for the “Azure storage account key” and leave the endpoint at its default. You’ll normally won’t need this unless you need to use a different Azure Service Endpoint.

clip_image025

Click “Next”to review what you’re about to do

clip_image027

Click Next again and let the wizard run.

clip_image029

You’ll get a report when it’s done. If you get an error, make sure the information is a correct and you can reach Azure of HTTPS over the internet, VPN or Express Route.

Conclusion

I was pleasantly surprised by how it easy it was to set up a cloud witness. The biggest hurdle for some might be access to Azure in secured environments. The file itself contains no sensitive information at all and while a VPN or Express Route are secured connectivity options this might not be allowed or viable in certain environments. Other than this I have found it to be very reliable, effective cheap and easy. I really encourage you to test it and see what it can do for you.