Azure Site Recovery (ASR) supports IAAS managed disks region to region

Introduction

When we see enough progress, not perfection, and get to the point that all our minimal needs are covered is when we decide to adopt a technology, feature or solution as the default. We might even move whole sale, either over time or on an expedited time line.

clip_image002

As more and more companies reach for the cloud we see the offerings mature. That’s when cloud becomes the new normal for a majority. I’m happy to say that with managed disks we are at the point we have not many reasons not to use them. Which means latecomers get a more complete offering “out of the box” and can focus on the next generation of solutions, beyond cloud so to speak, in another wonderfully inadequate term called serverless.

What are IAAS managed disks?

Managed disks provide simpler storage management (no more storage account limits leading to managing and monitoring those accounts) along with better availability, disk level data protection with encryption, RBAC and backups, the ability to create snapshots etc. Clearly, they are the way forward. Read up on them here. I did migrate many virtual machines to them but we could not do this for equally as many despite the clear benefits. Why? Read on!

Azure Site Recovery (ASR) Supports IAAS managed disks region to region

But they had a key piece missing. ASR until last week did not allow to setup Disaster Recovery (DR) for IaaS VMs with managed disks. Those already running everything on managed disks might have found out during a hurricane or flooding scare that they could not quickly set up ASR and move those workloads to another region. I know people who were in that situation.

But as Microsoft announced public availability for the capability to Protect machines using managed disks between Azure regions using Azure Site Recovery. I’m very happy with this because I really like manage disks but this was a real show stopper for the IAAS virtual machines where ASR between regions is a hard requirement. It’s often the case in the quickly evolving cloud environment that features are missing for a while. Those can slow down adoption until they are available.

Now we have a full IAAS solution on par with on-premises VM to Azure IAAS VMs where managed disks are also supported. Which reminds me I need to check if the failback option form Azure to on-premises works already with managed disks (it used to be a one-way street with managed disks). Today, with managed disks I can say we’ve reached the point where we’ll convert the remaining IAAS virtual machines as it covers many needs and we’re confident the remainder of needs will be following.

Progress, not perfection

It’s not perfect yet. We’re still looking forward to encrypted disk support, incremental snapshots etc. But as I said, we decide and work based on progress, not perfection.

Does the DELL VRTX Support Storage Spaces anno 2018?

Some one asked on my blog if the DELL VRTX supported Storage Spaces. It’s 2018 and when I wrote about the VRTX it was mainly as a Cluster in a Box (CiB) solution. This is based on a shared SAS raid controller. The addition of a second controller improved the redundancy (past the write-through requirement as we had in 2014) even though I would really like to see a native in bow redundant network solution here as well. Whether this is suitable for your need is something only you can determine.

Bus as far as a support for Microsoft Shared Storage Spaces or Storage Spaces goes that isn’t there and I would advise against it. A storage controller configuration (pass-through) for the DELL Technologies VRTX series that supported any form of Storage Spaces never came. While with 2 Nodes and the VRTX supporting two storage controller this would theoretically be possible. But with 3 or 4 nodes (The VRXT supports up to 4 nodes) that’s another challenge.

While I have liked the idea and suggested it even as a possible path it has never materialized. If S2D, especially in combination with ReFSv3 or beyond, becomes so immensely popular, they might consider it, but for now it’s not something I see happen and they might very choose other offerings to serve that demand anyway, one with a better design for the separate pass-through capable storage controllers.

As a cluster in a box solution the VRTX does hold merit. As said, I’d love to see a few improvements made to make it fully redundant all in box. With a ruggedized version for industrial or highly mobile environments could make an unbeatable offering.

DISCLAIMER: I don’t work for DELL, I don’t get paid by DELL, I don’t speak for DELL. This is my current independent opinion.

Latency kills

Introduction

I was investigating a very problematic Windows Server 2016 Hyper-V cluster. That cluster was just performing horribly. “Everything” was hanging, stalling, crashing and RHS.exe errors where flying around while WER dumps got created by the dozen. Things were extremely slow up to the points functionality was just failing. The “fun” thing was that the cluster validation wizard while slow gave that cluster a big thumb up and a supported status as all was well.

Prying around

Time to pry around a bit and see if we could find something wrong. We save live migrations stall, fail, last forever in pending or get stuck at a certain percentage, sometimes finally succeeding with ridiculous blackout times. We could not open up virtual machine properties or very slowly. The FCI GUI was highly unresponsive but so was the Hyper-V Manager GUI or even PowerShell. Those were hanging at even loading the virtual machines or enumerating them with Get-VM. Everything was slow to the point it timed out or crashed. Restarting the services (Cluster, Hyper-V) didn’t do anything and restarting VMMS was super slow or just got stuck. It was a depressing sight for which people tended to blame Hyper-V / Microsoft.

As the title gives away it was latency. Not just ordinary high latency. Real bad latency. That kind of latency kills. Extreme latency produces symptoms that are similar to bugs or corrupt components of roles and features. We have a tendency to look at those first in the event logs and then we look at the network and its usual suspects (VMQ, SET, DCB). But nothing pointed to an issue that I could find.

So, storage maybe?Well we did find one Hyper-V host in the cluster with one HBA port producing too many error so we disabled that FC port for testing. No joy the Hyper-V cluster after a clean reboot of all nodes remained problematic. So on to the storage array itself.

Well holy smoke! On the two volumes for CSV in those cluster we saw latencies that were so bad I could not even believe a single VM would boot. It actually made my appreciation for Hyper-V and clustering grow as it managed to do at least a couple of things. With such latencies I would expect the services to just crash & call it a day.

clip_image002

The horrific latency on one of the CSV LUNs.

Looking at the logs we saw that the latencies occurred on the FC HBAs of the controllers. Each one above 50ms, peaking to 150-250ms and one huge peak at almost 500ms. We saw this on all four HBA’s.

clip_image004

The latency on one of the 4 FC HBA’s on one of the controllers. Not a good day. All HBAs had high latencies like this.

The issues were not at the host level (host HBA’s) or not even at the IOPS/bandwidth level of the storage itself. The latency for some reason was spiking. Further investigation lead to the conclusion that the issue was related to synchronous replication going totally wrong. Moving the replication mode to asynchronous fixed that. We’re now investigating why this happened and how to prevent this from happening again. But that’s another story.

clip_image006

Latency on one of the 4 FC HBAs on one of the controllers after we fixed the issue.

Do not assume anything

So, there you go. Everything depends on everything in some direct or indirect way. It’s all connected and that my friends, is why I’m a proponent of “service resilience engineering” where the responsible team owns the entire stack. That’s is how you can act fast.

Cloud & Datacenter Conference Germany 2018

Cloud & Datacenter Conference Germany 2018

The Cloud & Datacenter Conference Germany 2018 is a shining beacon of light in a sea of marketing driven IT events. It is organized by Carsten Rachfahl via his company Rachfahl IT Solutions. Carsten is a Microsoft MVP and Regional Director whose commitment to excellence has show for many years in his community engagements. I think his integrity and style is a driving force behind this conferences ability to attract the quality of attendees, speakers and sponsors.

clip_image002[4]

Cloud & Datacenter welcomes top expert speakers from the community and the industry. They deliver high quality sessions and share their combined experience and knowledge with attendees that are truly interested in working with those technologies. That combination delivers high value interactions and knowledge sharing. The sessions in combination with the interaction between everyone there is works very well due to the size of the conference. Its big enough to have the breath of topics needed I todays IT landscape while it is small enough to allow people to dive in deeper and discuss architecture, design, implementation and visions.

Some extra information

The Cloud & Datacenter Conference Germany 2018 is being held on May 15-16 2018 in the Congress Park Hanau, Scholes Plats 1, Hanau, 63450 Germany. That’s close to Frankfurt and as such has good travel accommodations. Topics of interest will be Azure, Azure Stack, Hybrid Cloud, Private Cloud, Software Defined Datacenter, System Center & O365. The conference is a real-life technology event so no one is pretending that the esoteric future is already here. We are working on that future by building it in our daily job and helping organizations move forward in an efficient and effective manner.

This is a great conference by technologists, for technologists. The opportunities to learn, network and exchange information are great. The speakers are approachable and all of them together are there both share and learn themselves. From my past experiences the organization outstanding and the feedback from attendees was outstanding.

I’ll be speaking on RDMA to give a roadmap on this ever more important technology. On top of that I’ll be around to discuss high availability, clustering, data protection both on premises as in (hybrid) cloud scenarios.

Do your self a favor and register for the Cloud & Datacenter Conference Germany in 2018.

All I can say is that you should really consider attending. It’s most definitely worthwhile. The quality of the attendees, the speakers and the absolute top-notch organization of the conference have been proven in the previous years. The Cloud & Datacenter Conference is a testimony to the professionalism, integrity and quality my fellow MVP and friend Carsten Rachfahl delivers with his company Rachfahl IT solution on a daily basis to his customers. So, help yourself out in your career and register right here. I hope to see you there.

Note: The CDC is German spoken conference but as some speakers are from around the globe you’ll have to listen to some of them speaking in English. If you’ve ever heard my German, I’m sure you’ll prefer me speaking English anyway.