Enhanced (Failover) Placement of Virtual Machines on Windows 8 Hyper-V Cluster

Two of the nice features in Windows 8 Hyper-V clustering are the “drain node” capability and virtual machine priorities. You can see both in action in a video by Aidan Finn here.

For a detailed explanation of draining a node, see Draining Nodes for Planned Maintenance with Windows Server "8".

Another very important feature is that the cluster is intelligent enough to determine which node is best suited as a target during failover or live migration. Not only are the CPU and memory load of the hosts taken into consideration, but also the resource needs of the virtual machines and the priority you have given them. This entire process is NUMA aware, and with Windows 8 it can be evaluated on a per virtual machine basis. This means that the cluster will always try to get the best possible placement, and thus performance, for your virtual machines.
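To make the idea of "best suited node" a bit more concrete, here is a toy sketch in Python. It is not the algorithm the cluster actually uses, which I have not seen documented in detail. The Node class, the pick_target function and all the numbers are made up for illustration; the sketch only considers free memory and CPU load and ignores NUMA and priorities.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical view of a cluster host; not a real cluster API object."""
    name: str
    free_memory_mb: int
    cpu_load: float  # fraction between 0.0 and 1.0


def pick_target(nodes, vm_memory_mb):
    """Return the node that fits the VM and keeps the most headroom, or None."""
    candidates = [n for n in nodes if n.free_memory_mb >= vm_memory_mb]
    if not candidates:
        return None
    # Prefer the most memory left after placement, then the lowest CPU load.
    return max(
        candidates,
        key=lambda n: (n.free_memory_mb - vm_memory_mb, -n.cpu_load),
    )


nodes = [
    Node("node1", 4096, 0.70),
    Node("node2", 16384, 0.30),
    Node("node3", 16384, 0.10),
]
target = pick_target(nodes, 8192)
print(target.name)
```

With these made-up numbers node1 is too small, and node3 wins the tie with node2 because it has the lower CPU load.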

We also have affinity and anti-affinity rules. Anti-affinity ensures that the nodes of a virtualized NLB farm are placed on separate hosts to minimize risk. You don’t want one host to house all the nodes of your NLB farm!

On the other hand, sometimes you want virtual machines to stick together. Let’s say you have an NLB farm, but the virtual machines for the front end and the middle tier need to stay together. In that case you use affinity rules to achieve this. On top of this, the anti-affinity rules will ensure that the NLB farm virtual machines are on different nodes.
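The interplay between the two kinds of rules can be sketched with a small Python model. Again, this is only an illustration under my own assumptions: the tag names ("nlb", "stack1" and so on), the dictionary layout and the place function are hypothetical and do not correspond to any cluster API. A node gets a penalty for every resident virtual machine sharing an anti-affinity tag with the new one and a bonus for every resident sharing an affinity tag.

```python
def score(vm, residents):
    """Lower is better: anti-affinity conflicts first, then affinity matches."""
    penalty = sum(1 for r in residents if vm["anti"] & r["anti"])
    bonus = sum(1 for r in residents if vm["affinity"] & r["affinity"])
    return (penalty, -bonus)


def place(vm, hosts):
    """Return the name of the best node; ties go to the first name in order."""
    return min(sorted(hosts), key=lambda name: score(vm, hosts[name]))


# Two NLB front ends already spread over two nodes, node3 is empty.
hosts = {
    "node1": [{"name": "web1", "anti": {"nlb"}, "affinity": {"stack1"}}],
    "node2": [{"name": "web2", "anti": {"nlb"}, "affinity": {"stack2"}}],
    "node3": [],
}

# The middle tier of stack1 should land next to its front end.
app1 = {"name": "app1", "anti": set(), "affinity": {"stack1"}}
# A third NLB front end should avoid the nodes that already host one.
web3 = {"name": "web3", "anti": {"nlb"}, "affinity": {"stack3"}}

print(place(app1, hosts))
print(place(web3, hosts))
```

In this example app1 joins web1 on node1 because of the shared affinity tag, while web3 goes to the empty node3 to stay away from the other NLB members.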

Do note that when the cluster has to choose between breaking these rules and not being able to run the virtual machines, it will choose to keep them running. It knows its priorities! If in such a situation there are not enough resources, the priority will also come into play, and low priority machines may be shut down to ensure the higher priority ones can be up and running.
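The priority part of that behavior can be pictured with one more toy Python sketch. It assumes a single surviving host with a fixed amount of memory and a greedy "highest priority first" rule. The priority values 3, 2 and 1, the virtual machine names and the choose_running function are hypothetical, and the real cluster may well decide differently.

```python
def choose_running(vms, capacity_mb):
    """Split VMs into (running, stopped), serving higher priorities first.

    vms is a list of (name, priority, memory_mb) tuples; a higher
    priority number means a more important virtual machine.
    """
    running, stopped = [], []
    remaining = capacity_mb
    for name, _priority, memory_mb in sorted(vms, key=lambda v: -v[1]):
        if memory_mb <= remaining:
            running.append(name)
            remaining -= memory_mb
        else:
            stopped.append(name)
    return running, stopped


vms = [
    ("test", 1, 4096),
    ("sql", 3, 8192),
    ("web", 2, 4096),
]
running, stopped = choose_running(vms, 12288)
print("running:", running)
print("stopped:", stopped)
```

With 12 GB available, the sql and web machines stay up and the low priority test machine is the one that gets stopped.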

As you can imagine, there are potentially a lot of factors and permutations at play here. I’m looking into doing some more tests of these features and the intelligence in the process, to see whether the cluster makes the same decisions we would and how to best configure this for maximal performance and availability.