A First Look at Cloud Witness

Introduction

In Windows Server 2012 R2 Failover Clustering we have 2 types of witness:

  1. Disk witness: a shared disk that can be seen by all cluster nodes
  2. File Share Witness (FSW): An SMB 3 file share that is accessible by all cluster nodes

Since Windows Server 2012 R2 the recommendation is to always configure a witness. The reason for this is dynamic quorum and dynamic witness. These two capabilities offer the best possible resiliency without administrator intervention and are enabled by default. The cluster dynamically assigns a quorum vote to a node when it’s up and removes it when it’s down. Likewise, the witness is given a vote when it’s better to have a witness; if you’re better off without the witness, it won’t get a vote. That’s why Microsoft now advises to always set a witness: it will be managed automatically. The result is that you’ll get the best possible uptime for a cluster under any given circumstance.
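
If you want to see dynamic quorum and dynamic witness at work, the cluster exposes the relevant values through PowerShell. What follows is a minimal sketch, assuming it is run on a cluster node with the FailoverClusters module installed; the output will differ per cluster and per moment in time.

```powershell
# Run on a cluster node with the FailoverClusters module available.
Import-Module FailoverClusters

# 1 means dynamic quorum is enabled (the default).
(Get-Cluster).DynamicQuorum

# 1 means the witness currently holds a vote, 0 means it does not.
(Get-Cluster).WitnessDynamicWeight

# NodeWeight is the assigned vote, DynamicWeight is the vote the cluster
# is currently counting for that node.
Get-ClusterNode | Format-Table Name, State, NodeWeight, DynamicWeight -AutoSize
```

Shutting down a node and running this again should show the votes being adjusted without any administrator intervention.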

This is still the case in Windows Server 2016, but Failover Clustering introduces a new witness option: cloud witness.

Why do we need a cloud witness?

For certain scenarios, such as a cluster without shared storage and especially when a stretched cluster is involved, you’ll have to use a FSW. It’s a great solution that works as well as a disk witness in most cases. Why do I say most? Well, there is a scenario where a disk witness will provide better resiliency, but let’s not go there now.

Now the caveat here is that you’ll need to place the FSW in a 3rd independent site. That’s a tall order for many to fulfill. You can put it on the desktop of the receptionist at a branch office or on a virtual machine on the cluster itself, but it’s “suboptimal”. Ideally the FSW is independent and highly available, not dependent on what it’s supposed to support in achieving quorum.

One of the other workarounds was to extend AD to Azure, deploy a SOFS cluster with a non-CA file share on a cluster of VMs in Azure and have both other sites access it over VPN or Express Route. That works, but in a time of easy, fast, cheap and good solutions it’s still a serious effort, just for a file share.

As Microsoft has more and more use cases that require a FSW (site aware stretched clusters, Storage Spaces Direct, Exchange DAG, SQL Availability Groups, workgroup or multi domain clusters) they had to find a solution for the growing number of customers that do not have a 3rd site but do need a FSW. The cloud idea above is great but the implementation isn’t the best as it’s rather complex and expensive. Bar using virtual machines, you can’t use Azure file services in the cloud, as those are primarily for consumption by applications and security is handled not via ACLs but via access keys. That means the security for the Cluster Name Object (CNO) can’t be set. So even when you can expose a cloud file share on premises to Windows Server 2016 (any OS that supports SMB 3 actually) by mapping it via NET USE, the cluster GUI can’t set the required security for the cluster nodes, so it will fail. And no, you can’t set it manually either. Just to prove this I tried it for you to save you the trouble. Do NOT even go there!

clip_image002

So what is possible? Well, come Windows Server 2016, Failover Clustering has a 3rd type of witness: the cloud witness. Functionally it’s like a FSW. The big difference is that it’s a dedicated, cloud based solution that mitigates the need for and cost of a 3rd data center and avoids the cost of the workarounds people came up with.

Implementing the cloud witness

In your Azure subscription you create a storage account. For this purpose I’ve created one named democloudwitness in my resource group RG-Demo. I’m using a separate storage account to keep things tidy and separated from my other demo storage accounts.

A storage account gets two access keys and two connection strings. The reason for this is that when you need to regenerate one key, you can have your workloads use the other one, so this can be done without downtime.

clip_image004

In Azure the work is actually already done. The rest will happen on premises on the cluster. We’ll configure the cluster with a witness. In PowerShell this is a one liner.

clip_image006

If you get an error, make sure the information is correct and you can reach Azure over HTTPS via the internet, VPN or Express Route. You normally do not need to use the endpoint parameter, just in the rare case you need to specify a different Azure service endpoint.
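
For reference, here is a hedged sketch of what that one liner looks like in text form, together with a quick connectivity check. It assumes the storage account name from this post; `<access-key>` is a placeholder for one of your two access keys, not a real value.

```powershell
# Assumption: 'democloudwitness' is the storage account from this post and
# '<access-key>' stands in for one of its two access keys.
$accountName = 'democloudwitness'
$accessKey   = '<access-key>'

# Optional pre-check: can this node reach the blob endpoint over HTTPS (TCP 443)?
Test-NetConnection -ComputerName "$accountName.blob.core.windows.net" -Port 443

# Configure the cloud witness. -Endpoint is only needed for a non-default
# Azure service endpoint, so it is left out here.
Set-ClusterQuorum -CloudWitness -AccountName $accountName -AccessKey $accessKey

# Verify the result: the QuorumResource should now be the cloud witness.
Get-ClusterQuorum
```

Running the connectivity check from every node is a cheap way to rule out firewall issues before you blame the key or the account name.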

The above access key is a fake one by the way, just so you know. Once you’re done Get-ClusterQuorum returns Cloud Witness as QuorumResource.

clip_image008

In the GUI you’ll see

clip_image010

When you open up the Blob service in your storage account you’ll see that a container has been created with the name msft-cloud-witness. When you select it you’ll see a file with a GUID as the name.

clip_image012

That GUID is actually the same as your cluster instance ID, which you can find in the registry of your cluster nodes under the HKLM\Cluster key in the string value ClusterInstanceID.
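
If you’d rather not open the registry editor, something along these lines should read the same value. It is a small sketch, assuming you run it locally on one of the cluster nodes.

```powershell
# Read the cluster instance ID from the local node's registry
# (key HKLM\Cluster, string value ClusterInstanceID).
(Get-ItemProperty -Path 'HKLM:\Cluster' -Name 'ClusterInstanceID').ClusterInstanceID
```

You can then compare the returned GUID with the file name in the msft-cloud-witness container.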

Your storage account can be used for multiple clusters. You’ll just see extra entries, each with its own GUID.

clip_image014

All this consumes so few resources it’s quite possibly the cheapest ever way of getting a cluster witness. Time will tell.

Things to consider

• Cloud Witness uses the HTTPS REST (NOT SMB 3) interface of the Azure Storage Account service. This means it requires the HTTPS port to be open on all cluster nodes to allow access over the internet. Alternatively an Azure Site-2-Site VPN or Express Route can be used. You’ll need one of those.

• REST means there is no ACLing for the CNO to be done, as there is on an SMB 3 FSW. Security is handled automatically by the Failover Cluster, which doesn’t store the actual access key but generates a shared access signature (SAS) token using the access key and stores it securely.

• The generated SAS Token is valid as long as the access key remains valid. When rotating the primary access key, it is important to first update the cloud witness (on all your clusters that are using that storage account) with the secondary access key before regenerating the primary access key.

• Plan your governance between cluster & Azure admins if these are not the same. I see Azure resources governance being neglected and as a cluster admin it’s nice to have some degree of control or say in the Azure part of the equation.
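
The key rotation order mentioned above can be sketched as follows. This is an illustration under assumptions, not a definitive runbook: `<secondary-access-key>` is a placeholder, and regenerating the primary key itself happens on the Azure side, so it is only shown as a comment.

```powershell
# Step 1: on every cluster that uses this storage account, switch the
# cloud witness to the secondary access key.
Set-ClusterQuorum -CloudWitness -AccountName 'democloudwitness' -AccessKey '<secondary-access-key>'

# Step 2: confirm the witness is still configured and check its state.
(Get-ClusterQuorum).QuorumResource | Format-List Name, State, ResourceType

# Step 3: only now regenerate the primary access key in Azure
# (portal or your Azure tooling of choice). Not done by this script.
```

Repeat steps 1 and 2 on each cluster before you touch the primary key, otherwise a cluster still relying on it will lose access to its witness.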

For completeness I’ll mention that the entire setup of a cloud witness is also very nicely integrated into the Failover Cluster GUI.

Right click on the desired cluster and select “Configure Cluster Quorum Settings” from the menu under “More Actions”.

clip_image015

Click through the startup form (unless you’ve never ever done this, then you might want to read it).

clip_image017

Select either “Select the quorum witness” or “Advanced quorum configuration”

clip_image019

We keep the default selection of all nodes.

clip_image021

We select to “Configure a cloud witness”

clip_image023

Type in your Azure storage account name, your primary access key for the “Azure storage account key” and leave the endpoint at its default. You normally won’t need to change this unless you need to use a different Azure service endpoint.

clip_image025

Click “Next” to review what you’re about to do.

clip_image027

Click Next again and let the wizard run.

clip_image029

You’ll get a report when it’s done. If you get an error, make sure the information is correct and you can reach Azure over HTTPS via the internet, VPN or Express Route.

Conclusion

I was pleasantly surprised by how easy it was to set up a cloud witness. The biggest hurdle for some might be access to Azure in secured environments. The file itself contains no sensitive information at all, and while a VPN or Express Route are secured connectivity options, this might not be allowed or viable in certain environments. Other than this I have found it to be very reliable, effective, cheap and easy. I really encourage you to test it and see what it can do for you.

Azure SQL Database now supports powerful geo-replication features for all service tiers

Business continuity has just gotten a new and improved way of providing consistent access to Azure SQL Server databases. Azure SQL Database now supports powerful geo-replication features for all service tiers. Read that again: for ALL service tiers!

Now, anyone who’s ever worked on providing business continuity knows this is harder to do and get right than it sounds. Whether you use SQL Server databases or other mature and quality ways of persisting data, you know from experience that maintaining a consistent read/write copy of your data is a challenge.

The way to deal with that challenge is often to have one read/write master copy and one or more read only copies. The idea is that during a disaster you’ll be able to serve the majority of your needs with a read only copy and have at least a basic, good enough service until the issue is fixed or a new read/write copy comes online. It all sounds great, but achieving this and maintaining access to the data during a full-blown disaster isn’t easy. You need multiple geographical locations and you need to provide access to them, the latter preferably transparent to the clients. These are cases where public cloud computing shines, and now more than ever, as Azure SQL Database now supports powerful geo-replication features for all service tiers.

image

There’s no more need to upgrade to premium and this capability replaces standard geo replication. Business continuity is rapidly becoming a de facto way of designing and building apps as the cost & complexity blocking this are being torn down.

Cloud & Datacenter Conference Germany

It’s with great joy that I can share that the Cloud & Datacenter Conference Germany website is live and you can now register to attend. My fellow MVP and friend Carsten Rachfahl has realized one of his ambitions: to organize a large community driven conference by and for the IT community in the DACH region. But please feel free to attend if you’re from outside that region; it is open to all and welcomes anyone who wishes to attend. Just note that not all sessions will be in English; some will be in German.

Organizing such an event is not an easy undertaking and I want to applaud Carsten for making this happen. He’s one of Germany’s foremost experts and via his company, Rachfahl IT-Solutions, he’s always contributed heavily to the community. Thank you Carsten, you contribute a lot and we appreciate those efforts.

image

I invite you all to attend and join us on May 12 for the first edition of the Cloud & Datacenter Conference Germany in Dusseldorf. It offers more than 25 presentations by top community speakers in five parallel tracks. These tracks cover the entire spectrum of Microsoft technologies available to help you design, build and maintain a state of the art modern IT infrastructure. The conference covers Windows Server 2016, Hyper-V, Microsoft software defined storage, networking, Azure Stack, System Center, OMS, failover clustering, IaaS, Azure, Nano Server, PowerShell, containers, and much more.

I’m happy and honored to speak at this conference with so many true real life experts that are part of the global community around Microsoft technologies. My presentation will aim to get you briefed on the new and improved functionality in Windows Server 2016 Failover Clustering. In that respect it’s a nice addition to my session What’s new in Failover Clustering in Windows Server 2012 R2. I can only suggest you get up to speed on those, as they are still very much valid and I’ll be focusing on the delta between Windows Server 2012 R2 and 2016.

The breadth and depth of the technologies available to us cannot be overstated and is still growing. It takes a team effort with both complementary and overlapping expertise to stay on top of things. Education is a huge and important part of daily life for anyone working in IT.

Nowadays, when any meeting can be held online, an in-person conference is still very valuable. It enables you to focus on absorbing the content without being distracted by the realities and interruptions of daily work life. That’s why I still invest in attending conferences and I hope you do so as well. When you attend one, be there! That might sound silly, but it’s painful to see attendees working remotely and being on the phone all the time. Bar true emergencies, that’s a waste of money and effort. Allow yourself or your employees to optimize the ROI of that conference by doing what they came for: learn, get inspired and network with peers.

Register soon to secure your spot. The price is set at a level that makes sure it will not be an issue. Sponsoring by companies who have a real investment in cloud and datacenter management and benefit from a flourishing, well informed ecosystem makes this possible.

Azure AD Connect 1.1.105.0 is GA

On February 18th 2016 Microsoft released a significant update to Azure AD Connect, version 1.1.105.0. It adds some capabilities and improves on others. For me this is a core piece of the puzzle, today and in the future, for many of my plans to optimize the future IT infrastructure & DevOps. Even when politics seem hell-bent on slowing you down, causing serious delays and missed opportunities, this piece of technology is key to breaking through those barriers and keeping things moving ahead.

So what’s in the box with version 1.1.105.0?

  • Automatic upgrade feature for Express settings customers.
  • Support for the global admin using MFA and PIM in the installation wizard.
  • We can change the user’s sign-in method after initial install.
  • We can now set Domain and OU filtering in the installation wizard. As a secondary benefit this means we can now connect to forests where not all domains are available.
  • We get a scheduler that is built into the sync engine.

Some preview features are now GA:

We get one new preview feature, which is going to be a hit in a world where patience has disappeared from the equation:

  • The default sync interval is now 30 minutes instead of the previous 3 hours. This is now configurable in the scheduler.
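
As a rough illustration of what configuring the scheduler can look like, the ADSync PowerShell module on the Azure AD Connect server lets you inspect and adjust it. The one hour interval below is just an example value I picked, not a recommendation.

```powershell
# Run on the Azure AD Connect server.
Import-Module ADSync

# Show the current scheduler settings, including the effective sync interval.
Get-ADSyncScheduler

# Example only: set a custom sync interval of 1 hour (format d.HH:mm:ss).
Set-ADSyncScheduler -CustomizedSyncCycleInterval 01:00:00

# Trigger a delta sync cycle right away instead of waiting for the schedule.
Start-ADSyncSyncCycle -PolicyType Delta
```

Check the output of Get-ADSyncScheduler again afterwards to see which interval is actually in effect.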

It also fixes the following issues:

  • The verify DNS domains page didn’t always recognize the domains.
  • Prompts for domain admin credentials when configuring ADFS.
  • On-premises AD accounts are not recognized by the installation wizard when they are in a domain with a different DNS tree than the root domain.

Grab the latest version of Azure AD Connect here.