Don’t Let Live Migration Settings Behavior Confuse You

Configuring live migration settings on a cluster

In the cluster under Networks, Live Migration Settings you can select what networks are available for live migration and their order of preference.image

image

Basically this setting determines what NICs can be used. It also determines and in what order of preference the available networks can be used by Live Migration. It does not determine bandwidth aggregation or failover. All it does is provide the order in which the redundant networks will be used. It’s up to the cluster service NETFT, Multichannel or SMB Direct to provide the bandwidth aggregation if possible As you can see we use LM/CSV over SMB and as our two NICS are RDMA capable 10Gbps NIC, multichannel will discover RDMA capabilities & leverage SMB Direct if it can be establish otherwise it will just stick with multichannel. If you would  team that NIC shows up as just one network. Also not that if you lose a NIC during live migration it might fail for some VMs under certain scenarios, but you cluster nodes will maintain the capability & recover. The  names of the network reflect this: LM1+CSV2 & CSV1+LM2 will be used both but if for some reason multichannel goes completely south the names reflect the metrics of these networks. The lowest is CSV1+LM2 and the second lowest is LM1+CSV2, reflect on how NETFT will select to use which automatically based on the metrics. I.e. It’s “self documentation” for human consumption.

image

Sometimes you might get surprising results. An example. If you’ve selected SMB for Live Migration and you have selected only one of the NICs here.image Still when you look at perform you might see both being used! What’s happening is that multichannel will kick in (and use two or more similar NICs when it finds them and if applicable move to RDMA.

image

So here we select SMB for the live migration type and the two equally capable 10GBps NICs available for live migration it will use them, even if you selected only one of them in the cluster network settings for live migration.

Why is that? Well, there is still another location where live migration settings are defined and that is in Hyper-V Manager. Take a look at the screenshot below.

image

The Incoming live migration settings is set to “Use any available network for live migration”. If you have this on it will still leverage both as when one is used multichannel drag the other one into action anyway, no matter what you set in network settings for live migration in the Cluster GUI (it set to use only one and dimmed out).

Do note that on Hyper-V Manager the settings for live migrations specify “Incoming live migrations”. That leads us to believe that it’s the target, the node where the VMs are migrated to that determines what NICs get used. But let’s put this to the test.

Testing A Hyper-V cluster with two nodes – Node A and Node B

On the cluster network settings you select only one network.

image

On  Hyper-V cluster Node A you have configured the following for live migration via Hyper-V Manager to “Use these IP addresses for live migration”. You cannot add or remove networks, the networks used are defined by the cluster.

image

On  Hyper-V cluster Node B you have configured the following for live migration via Hyper-V Manager to “Use any available network for live migration”.

As we now kick of a live migration from node A to node B we’ll see both NICs being used. Why well because Node B is the target and Node B has the setting “Use any available network for live migration”. So why then only these 2, why not pick up any other suitable NICs? Well we’ve configured the live migration on both nodes to use SMB. As this cluster is RDMA capable that means it will leverage multichannel/SMB direct. The auto configuration will select the best, equally capable NICs for this and that’s these two in our scenario. Remember the capabilities of the NICs have to match. So no mixtures of 1 * 1Gbps and 1 *10Gbps or 1 * multichannel and 1 SMB Direct.

The confusion really sets in that even if live migrate from Node B to A it will also use both NICs! Hum, that is “Incoming Live Migrations” is not always “correct” it seems, not when using SMB as a performance option at least. Apparently multichannel will kick in in both directions.

If you set both to Node A and Node B to “Use these IP addresses for live migration” and leave the cluster network setting with only one network it does only use one, even with SMB as a performance option for live migration. This is expected.

Note: I had one interesting hiccup while testing this configuration: when doing the latter one of the VMs failed in live migration of the entire host. I ran it again and that one VM still used both networks. All others went well during host migration with just one being used. That was a bit of a huh Confused smile moment and it sure tripped me up & kept me busy for a while. I blame RDMA and the position of the planets & constellations.

Things aren’t always what they seem at first and it’s good to keep that in mind. The moment you think you got if figured out, you’re wrong Winking smile. So look again & investigate.

2014 Kicks Offs With A Microsoft MVP Award Renewal!

I can start of the new year at the office tomorrow by whishing everyone a great 2014 and talking to the CEO about making sure we schedule/budget for the MVP Summit in 2014 as I just got this e-mail in:

image

This is great news. Somewhere this month, a box will arrive at my house with a new ring for the award trophy and a new MVP Certificate. The trophy itself stands proudly on a cupboard in my parents living room. I might have earned the award, but they made sure I was given the little backpack with the values & the education to make that possible. And while they don’t really understand the technicalities of what I do, they think the world of it when I fly of to the USA because Microsoft likes to talk to me Smile.

While you get the MVP award for contributions over the past year to the community you must realize that being active in the community is a very rewarding endeavor in itself. No, it’s not like you can put a hard dollar amount on it, nor want to. The value is that you help out a lot of people and in return effectively crowd source your IT issues to a global community. It help other and yourself learn and grow. We constantly need to figure out a lot of things in our line of business. Studying, thinking, analyzing, designing, implementing & supporting a wide ever changing IT stack is not a one person or one team job. Sharing insights & experiences (both good and bad) with your peers helps a lot. You grow a global network of highly motivated, experienced & skilled people. A network you can leverage & consult when you need it. A network that you contribute to when you can, where you can. That’s what it’s about. The cost of participating is dwarfed by the return as many hand help make hard work easier. Good bosses & smart organizations have figured this out. During a panel discussion at Dell World Jonathan Copeland used the metaphor that it take a village to raise a kid. I added to it that one day that kid needs to leave the village and go out into the world. Being active in communities helps with both. I’ve also talked about this here.

I’m very happy with my 2014 MVP Award. I cherish the interaction with my fellow MVPs and Microsoft. The 2014 MVP Global Summit is already on my agenda. I’m grateful for the opportunities it brings and the continuous privilege of learning & sharing. See you all on line, at conferences, think tanks & community events & in Bellevue/Redmond.