Hyper-V Shared Nothing Live Migration In Windows Server 2012 – VM Mobility Rules

I see and hear some people shrug at the idea of Shared Nothing Live Migration, dismissing it as marginally useful. Some state that they’ll have it as well, but that it’s not that valuable. Well, I totally disagree. A lot of the time these remarks stem from a lack of understanding of how several technologies in the Microsoft stack work together. Combine this with tunnel vision and the fear of some vendors and you get a lot of FUD.

I advise you to look beyond the virtualization stack, to the issues that people who are building infrastructure for dynamic, flexible and * cloud data centers are dealing with.

Look, as “architects” we have to design & build for failure. We all know that it’s just a matter of time before things go BOINK. So we build in redundancy, some of it within a silo, some of it between silos. The two approaches complement each other. What this gives you is options, and everybody who knows me, especially those who work with me, has heard my mantras: “Assumptions are the mother of all F* Ups” and “Options, options, options”. Make sure you design & build in options. This way you can maneuver yourself out of a bad situation. Don’t ever assume you’re out of options, especially not when you put some in the design on purpose ;-). It’s also very useful beyond that, because a lot of you might agree with me that silos and forklift, downtime-inducing upgrades, migrations, transitions or replacements are expensive and bad. This is where Shared Nothing Live Migration comes into play. You gain mobility over silos. That silo might be a server, a cluster, storage or a mixture of them all.

With Shared Nothing Live Migration we can migrate virtual machines between those silos with nothing more than a network cable. This is huge, people. You are no longer trapped in that silo. In this context it provides you with all the options & flexibility that mobility gives you, even if the technology itself is not about high availability.

Some very useful scenarios

  1. Migrate virtual machines from an old cluster to a new cluster without any downtime
  2. Migrate virtual machines from standalone Hyper-V hosts to a failover cluster without any downtime
  3. Migrate virtual machines from one standalone host to another one for maintenance, again, without any downtime
  4. Choose different types of storage & Hyper-V deployment depending on IOPS, redundancy, availability and manageability needs. With Shared Nothing Live Migration you can be confident that you can move your virtual machine from one environment to the other when needs change. This is breaking the storage silo boundaries open, people! This is huge … think about it.

How it works

The details are for another post, but basically it is made possible by the combination of Live Storage Migration and Live Migration.

First the Storage is Live Migrated

image

After the Live Storage Migration is done the state of the virtual machines is copied and synchronized.

image

This Is Mobility

I hear the competition shrug. It isn’t high availability. Well indeed, no one who understands the feature ever said it was. It’s virtual machine mobility. Look at the scenarios above and you’ll see that this ability could very well be a game changer in how we look at storage & design solutions.

Speed & Performance

What did we hear on this front: “it will be too slow to be really useful”. Really? Well let’s see:

  1. The world is converging on 10Gbps, and after that 40Gbps and up will come.
  2. In-box NIC Teaming in Windows Server 2012, which can provide more bandwidth.
  3. SMB 3.0 Multichannel. This provides multiple channels per connection, spreading the load over multiple CPUs.
  4. SMB Direct. Have you seen the speeds this achieves?

Before you state that this doesn’t work for Live Migration … as confirmed at TechEd 2012 Europe with Jose Barreto, this does work when both the source AND the target are SMB 3.0 shares. This means yet another reason to use SMB 3.0 shares for your Hyper-V storage needs! So unlike what Tad at vLimited keeps saying, unhindered by any knowledge, it is a very valuable feature and it can be extremely fast given the right connectivity and storage that can handle the IOPS. And no, the fact that it’s unbuffered doesn’t impact this too much. Test this by using xcopy/robocopy /J with a VHD over your infrastructure.

image

Even if you’re on a budget and cannot go for RDMA NICs & SMB 3.0, you have several options to get very decent virtual machine mobility and not be stuck in a silo. And for those who want to leverage this feature to create an agile & mobile virtual environment, you have some very nice technologies available to optimize for your needs & budgets.
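To put some rough numbers on the “too slow” argument, here is a minimal back-of-the-envelope sketch in Python. It only estimates how long copying a virtual disk takes at a given link speed. The 100 GiB disk size and the 80% usable bandwidth factor are assumptions for illustration, not measurements, and the sketch ignores whether the storage on either side can keep up with the IOPS.

```python
def transfer_time_seconds(size_gib: float, link_gbps: float,
                          efficiency: float = 0.8) -> float:
    """Estimate the seconds needed to copy size_gib GiB over a link.

    link_gbps is the nominal link speed in gigabits per second.
    efficiency is the assumed usable fraction of that speed
    (protocol overhead, other traffic); 0.8 is a guess, not a measurement.
    """
    if size_gib < 0 or link_gbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("size must be >= 0, speed > 0, efficiency in (0, 1]")
    size_bits = size_gib * 1024 ** 3 * 8           # GiB -> bits
    usable_bits_per_second = link_gbps * 1e9 * efficiency
    return size_bits / usable_bits_per_second


if __name__ == "__main__":
    vhd_size_gib = 100  # hypothetical virtual disk size
    for gbps in (1, 10, 40):
        minutes = transfer_time_seconds(vhd_size_gib, gbps) / 60
        print(f"{gbps:>2} Gbps: about {minutes:.1f} minutes for {vhd_size_gib} GiB")
```

The estimate scales linearly, so going from 1Gbps to 10Gbps cuts the copy time by a factor of ten as long as the storage is not the bottleneck. Real numbers come from testing on your own infrastructure.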

Conclusion

Virtual Machine mobility and storage mobility are very interesting features that provide for a previously unknown flexibility. Windows Server 2012 makes us rethink our storage approaches (I sure am) and I’m very interested in seeing how this will evolve.

Answer to Brad at TechEd Europe 2012 Keynote: Pessimists & Tad Don’t like Windows Server 2012

Brad is on stage for the opening keynote asking if the glass is half full or half empty. Well it depends on where you are in the ecosystem. For us the glass is half full and filling up fast.

Some people nag me about the fact that Windows Server 2012 is so different and that it’s wrong to turn the world upside down. Yes, it is different and new in many ways.  There are also many improvements to features that already exist. There is a lot to learn and understand. Why are some people so pessimistic?

Ever since I got my hands on the BUILD Developer Preview bits I have personally invested a lot of my time in Windows Server 2012. With the beta that only increased. Why? Well, that’s the way forward, because that’s where the improvements are. We can’t do tomorrow’s jobs and meet tomorrow’s demands with yesterday’s technology.

pessimistsbanner

The picture above is basically the pessimist’s view of the world. Enjoy your cuppa, but I’m not joining you. Windows Server 2012 rocks and it’s going to do a whole lot for our industry and businesses. But wait a minute, I do understand why Tad is so pessimistic. But that’s about the future of vLimited and being stuck in the past. Listen Tad, you’d better empty that cup, because this is where vLimited becomes history rather than writes it.

Does that mean I’ll be throwing away Windows Server 2008 R2? Nope. I expect to deal with it a lot in the next few years, but I’m not going to build future infrastructure on the previous version. I will introduce Windows Server 2012 where and when we benefit from it. For me that is from the day the bits RTM. The benefits are so overwhelming we’d hurt ourselves by not doing it. Your mileage may vary. But don’t get stuck in the past. Here’s a link to your escape pod: Microsoft Virtual Machine Converter Solution Accelerator. I’m happy it’s here. That’s what people are asking me more and more about: how to move to Hyper-V.

But what’s with the negativism of some? Sure, people are still running Windows Server 2000/2003. Sometimes for good reasons, often for (very) bad ones. Are some going to go through all this again with people clinging to Windows Server 2008 R2? No doubt. Been there, seen it. Very predictable. Is Windows Server 2012 going to fail? No way. What I’m seeing in Windows Server 2012 is great technology. Will it be perfect? No. I already have feature requests for vNext :-). But this is pushing the ball forward, this is ambitious in the best sense of that word. There will be bugs, there will be challenges and hiccups. That’s part of the business and the realities of life. But look at all that’s available in there. Don’t just read some industry press articles. Did you test it yourself already? Did you do any clustering? Tested all the new functionality in Hyper-V? The innovations in Live Migration options and networking? Looked at the amount of PowerShell support in there? Noticed the improvements in Active Directory, DHCP and other core infrastructure services? Have you used Windows Server 2012 at all yet? You didn’t look at SMB 3.0 and all the storage improvements in there, did you? Go talk to Jeff Woolsey, he’s passionate about it and for good reasons. Put in some effort, live a little, get out of your comfort zone and you’ll be going places. Don’t be a pessimist. Think positive or you’ll end up like Tad, who was the joke of the party at MMS 2012.

image

Windows Server 2012 Cluster Reset Recent Events Feature

There are various small improvements in Windows Server 2012 Failover Clustering that make life a little easier. When playing in the lab one of the things I like to do is break stuff. You know, like pull out the power plug of a host during a live migration, remove a network cable for one or more of the networks, flip the power of the switch off and on again, crash the vmms.exe process and other really bad things … :-) Just getting a feel for what happens and how Windows Server 2012 & Hyper-V respond.

As you can imagine this fills up the cluster event logs real fast. It also makes the GUI tell you that you’ve had issues in the past 24 hours. Up to Windows Server 2008 R2 those recent cluster events could not be cleared or set to “acknowledged” except by deleting the log files. That has to be done on all nodes, is something you should not do in production, and is probably even prohibited. There are environments where this is indeed a “resume generating” action. But it’s annoying that you can’t leave a client with a healthy looking environment after you have fixed an issue.

image

For the lab, or for environments where event log auditing is a non-issue, I used to run a little script that would clear the event logs of the cluster nodes, so I didn’t have to deal with too much noise between tests or so I could leave a GUI that represents the healthy state of the cluster for the customer.

This has become a lot easier and better in Windows Server 2012. We now have a feature for this built into the Failover Cluster Manager GUI. Just right click the cluster events and select “Reset Recent Events”.

image

The good thing is that this hides the recent events from before “now”, but it does not clear the event log. You can configure the query to show older events again. This is nice during testing in the lab. Even in a production environment, where clearing the logs is a big no-no, you can now get rid of the noise from previous issues, focus on the problem you’re working on, or leave the scene with a clean state after fixing an issue, without upsetting any auditors.
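The difference between clearing a log and resetting the view can be made concrete with a small Python sketch. This is only a conceptual model of the behavior described above, not how Failover Cluster Manager is implemented. The names ClusterEvent and RecentEventsView are hypothetical, and the 24 hour window is taken from the text.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass(frozen=True)
class ClusterEvent:
    timestamp: datetime
    message: str


class RecentEventsView:
    """A filtered view on an event log; it never modifies the log itself."""

    def __init__(self, log: List[ClusterEvent],
                 window: timedelta = timedelta(hours=24)) -> None:
        self._log = log
        self._window = window
        self._reset_at: Optional[datetime] = None

    def reset(self, now: datetime) -> None:
        """Hide everything logged up to 'now' without touching the log."""
        self._reset_at = now

    def show_older_events(self) -> None:
        """Drop the cutoff so the older events show up again."""
        self._reset_at = None

    def visible(self, now: datetime) -> List[ClusterEvent]:
        """Return events inside the window and after the last reset."""
        start = now - self._window
        if self._reset_at is not None and self._reset_at > start:
            start = self._reset_at
        return [e for e in self._log if start < e.timestamp <= now]
```

Because the view only moves a cutoff timestamp, the underlying log stays complete for auditors, which is the point of the feature.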

image

Windows Server 2012 Supports Data Center TCP (DCTCP)

In the grand effort to make Windows Server 2012 scale above and beyond the call of duty, Microsoft has been addressing (potential) bottlenecks all over the stack: CPU, NUMA, memory, storage and networking.

Data Center TCP (DCTCP) is one of the many improvements by which Microsoft aims to deliver a lot better network throughput with affordable switches. Switches that can manage large amounts of network traffic tend to have large buffers, and those push up the prices a lot. The idea here is that a large buffer creates the ability to deal with bursts and prevents congestion. Call it over-provisioning if you want. While this helps, it is far from ideal. Let’s say it’s a blunt instrument.

To mitigate this issue Windows Server 2012 is now capable of dealing with network congestion in a more intelligent way. It does so by using DCTCP to react to the degree, and not merely the presence, of congestion. The goals and requirements are:

  • Achieve low latency, high burst tolerance, and high throughput with small buffer (read: cheaper) switches.
  • It requires Explicit Congestion Notification (ECN, RFC 3168) capable switches. You’d think this should be no showstopper, as it’s pretty common on most data center / rack switches, but that doesn’t seem to be the case for the really cheap ones where this would shine … :-(
  • The algorithm is only enabled when it makes sense to do so (low round trip times, i.e. it will be used inside the data center, not over a worldwide WAN or the internet).
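What “reacting to the degree of congestion” means can be sketched in a few lines of Python. The sketch follows the DCTCP algorithm as published by Microsoft Research and later documented in RFC 8257: the sender keeps a running estimate (alpha) of the fraction of ECN-marked packets and shrinks its congestion window in proportion to it, instead of always halving it. This is an illustration of the published algorithm, not the Windows Server 2012 implementation. The function names are hypothetical and the gain g = 1/16 is the value commonly used in the literature.

```python
def update_alpha(alpha: float, marked: int, total: int, g: float = 1 / 16) -> float:
    """Update the running estimate of the fraction of ECN-marked packets.

    alpha  : previous estimate, between 0 and 1
    marked : packets acknowledged with an ECN mark in the last window
    total  : all packets acknowledged in the last window
    g      : weight given to the newest sample (assumed 1/16)
    """
    if total <= 0 or not 0 <= marked <= total:
        raise ValueError("need total > 0 and 0 <= marked <= total")
    fraction = marked / total
    return (1 - g) * alpha + g * fraction


def dctcp_window(cwnd: float, alpha: float) -> float:
    """DCTCP reaction: shrink the window in proportion to the congestion."""
    return max(1.0, cwnd * (1 - alpha / 2))


def classic_ecn_window(cwnd: float) -> float:
    """Classic ECN reaction: halve the window on any congestion signal."""
    return max(1.0, cwnd / 2)


if __name__ == "__main__":
    cwnd = 100.0
    for alpha in (0.05, 0.25, 1.0):
        print(f"alpha={alpha:.2f}: DCTCP window {dctcp_window(cwnd, alpha):.1f}, "
              f"classic ECN window {classic_ecn_window(cwnd):.1f}")
```

With light congestion the window barely shrinks, and only when every packet is marked does DCTCP fall back to the classic halving. That is why it can keep throughput high while keeping the queues in small buffer switches short.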

To see if it is applied run Get-NetTcpConnection:

image

As you can see this is applied here on a DELL PC8024F switch for the CSV and LM networks. The internet connected NIC (connection of the RDP session) shows:

image

Yup, it’s East-West traffic only, not North-South where it makes no sense.

When I was prepping a slide deck for a presentation on what this is, does and means, I compared it to green wave traffic light control. The space between consecutive traffic lights is the buffer, and the red lights are the stops the traffic has to deal with due to congestion. This leaves room for a lot of improvement, and the way to achieve it is traffic control that intelligently manages the incoming flow so that at every hop there is a green light and the buffer isn’t saturated.

image

Windows Server 2012 in combination with Explicit Congestion Notification (ECN) provides the intelligent traffic control to realize the green wave.

image

The result is very smooth, low latency traffic with high burst tolerance and high throughput on cheaper small buffer switches. To see the difference, look at the picture below (from Microsoft BUILD) of what this achieves. Pretty impressive. Here’s a paper by Microsoft Research on the subject.

image