Configuring Performance Options for Live Migration In Windows Server 2012 R2 Preview

Posted on July 10, 2013 by workinghardinit

New Options For Optimizing Live Migrations

In Windows Server 2012 R2 we have a whole range of options to leverage Live Migration of our environment and needs. Next to the new default (Compression) we can now also leverage SMB 3.0 (Multichannel, RDMA) for all forms of Live Migration and not just for Shared Nothing Live Migration (see Shared Nothing Live Migration Leverages SMB 3.0 Under the Hood) or Storage Live Migration when both the source and the target are SMB 3.0 storage.

TCP/IP

Here you can use a one NIC or a NIC Team for bandwidth aggregation for live migration (see Teamed NIC Live Migrations Between Two Hosts In Windows Server 2012 Do Use All Members). This is the process you have known in Windows Server 2012. You can select multiple NICs or even Teams of NICs but only one of those (one NIC or one Team) will be used. The other(s)will only be used when the first one is not available.

Compression

This option leverages spare CPU cycles to compress the memory contents of virtual machines being migrated. Only then is it sent over the wire via TCP/IP connection. This speeds up the Live Migration Process. This process is CPU load aware so it will only use idle cycles to protect the workload on the hosts. This is the default setting in Hyper-V running on Windows Server 2012 R2 Preview.

SMB

This setting will leverage two SMB 3.0 features. Multichannel and, if supported by and for the NICs involved, RDMA.

SMB Direct (RDMA) will be used when the network adapters of both the source and destination servers have Remote Direct Memory Access (RDMA) capabilities enabled.
SMB Multichannel will automatically detect and use multiple connections when a proper SMB Multichannel configuration is identified.

Where to set these options?

In Hyper-V Manager go to “Hyper-V Settings” in the Actions pane.

Expand the Live Migrations node under Server in the left pane (click the “+”) and select to “Advanced Features”.

Select the option desired under" “Performance Options”.

Happy testing!

EDIT: Aidan Finn posted the PowerShell commands to configure the performance options in Configuring WS2012 R2 Hyper-V Live Migration Performance Options Using PowerShell The MVP community at work & it rocks Smile

Teamed NIC Live Migrations Between Two Hosts In Windows Server 2012 Do Use All Members

Posted on June 30, 2013 by workinghardinit

Introduction

Between this blog NIC Teaming in Windows Server 2012 Brings Simple, Affordable Traffic Reliability and Load Balancing to your Cloud Workloads which states “TCP/IP can recover from missing or out-of-order packets. However, out-of-order packets seriously impact the throughput of the connection. Therefore, teaming solutions make every effort to keep all the packets associated with a single TCP stream on a single NIC so as to minimize the possibility of out-of-order packet delivery. So, if your traffic load comprises of a single TCP stream (such as a Hyper-V live migration), then having four 1Gb/s NICs in an LACP team will still only deliver 1 Gb/s of bandwidth since all the traffic from that live migration will use one NIC in the team. However, if you do several simultaneous live migrations to multiple destinations, resulting in multiple TCP streams, then the streams will be distributed amongst the teamed NICs” and other information out their such as support forum replies it is dictated that when you live migrate between two nodes in a cluster only one stream is active and you will never exceed the bandwidth of a single team member. When running some simple tests with a 10Gbps NIC team this seems true. We also know that you can consume near to all of the aggregated bandwidth of the members in a NIC Team for live migration if you these conditions are met:

1. The Live Migrations must not all be destined for the same remote machine. Live migration will only use one TCP stream between any pair of hosts. Since both Windows NIC Teaming and the adjacent switch will not spread traffic from a single stream across multiple interfaces live migration between host A and host B, no matter how many VMs you’re migrating, will only use one NIC’s bandwidth.

2. You must use Address Hash (TCP ports) for the NIC Teaming. Hyper-V Port mode will put all the outbound traffic, in this case, on a single NIC.

When we look at these conditions and compare them to the behavior we expect from the various forms of NIC teaming in Windows 2012 this is a bit surprising as one might expect all member to be involved. So let’s take a look at some of the different NIC Teaming setups.

Any form of NIC teaming with Hyper-V Port Mode

This one is easy as condition 2 above is very much true. In all my testing with any NIC team configuration in the Hyper-V Port mode traffic distribution algorithms I have not been able to exceed 10Gbps. I have seen no difference between dependent static of LACP mode or switch independent (active-active) for this condition. As you can see in the screenshot below, the traffic maxes out at 10Gbps.

This is also demonstrated in the following screenshots taking with the resource manager where you can see only half of the bandwidth of the Team is being used.

Exceeding a single NIC team member’s bandwidth when migrating between 2 nodes

The first condition of the previous heading doesn’t seem true. In some easy testing with a low number of virtual machines and not too much memory assigned you never exceed the bandwidth of one 10Gbps NIC team member. So on the surface, with some quick testing it might seem that way.

But during testing on a 2 node cluster with dual port 10Gbps cards and I have found the following

Switch Dependent LACP and Static

Take a sufficient number of large memory virtual machines to exceed the capacity of a single 10Gbps pipe for a longer time (that way you’ll see it in the GUI).
Live migrate them all from host A to host B (“Pause” with “Drain Roles” or “select all” + “Move”)
Note that with a 2 node cluster there is no possibility to Live Migrate to multiple nodes simultaneous. It’s A to or B or B to A or both at the same time.

Basically it didn’t take long to see well over 10Gbpsbeing used. So the information out there seems to be wrong. Yes we can leverage the aggregated bandwidth when we migrate from host A to host B as long as we have enough memory assigned to the VMs and we migrate a sufficient number of them. Switch dependent teaming, whether it is static or LACP does its job as you would expect.

Let’s think about this. The number of VMs you need to lie migrate to see > 10Gbpss used is not fixed in stone. Could it be that there is some intelligence in the Live Migration algorithm where it decides to set up multiple streams when a certain number of virtual machines with sufficient memory are migrated as the sorting is mitigated by the amount of bandwidth that can leveraged? Perhaps he VMMS.EXE kicks off more streams when needed/beneficial? Further experimenting indicates that this is not the case. All you need is > 1 VM being live migrated. When looking at this in task manager you do need them to be of sufficient memory size and/or migrate enough of them to make it visible. I have also tried playing with the number of allowed simultaneous live migrations to see if this has an effect but I did not find one (i.e. 4, 6 or 12).

It looks like it is more like one TCP/IP connection per Live Migration that is indeed tied to one NIC member. So when you live migrate VMS between two hosts you see one VM live migration go over 1 member and the other the other as static/LACP switch dependent teaming did does its job. When you do enough live migrations of large VMs simultaneously you see this in Task Manager as shown below. In this case as each VM live migration stream sticks to a NIC team member you do not need to worry about out of order packets impacting performance.

But to make sure and to prevent falling victim to the fall victim to the limits of the task manger GUI during testing this behavior we also used performance monitor to see what’s going on. This confirms we are indeed using both 10Gbps NIC team member on both the target and the source host server. This is even the case with 2 virtual machines Live Migration. As long as it’s more than one and the memory assigned is enough to make the live migration last long enough you can see it in Task Manager; otherwise it might miss it. Performance Monitor however does not..

This is interesting and frankly a bit unexpected as the documentation on this subject is not reflecting this. However it IS in agreement with the NIC teaming documented behavior for other tan Live Migration traffic. We took a closer look however and can reproduce this over and over again. Again we tested both switch dependent static and LACP modes and we found the behavior to be the same.

Switch Independent with Address Hash

Let’s test Live Migration over switch independent teaming with Address Hash. Here we see that the source server sends on the two member of the NIC team but that the target server receives on only one. This is normal behavior for switch independent teaming. But from the documentation we expect that one member on the source server would send and one member on the target server would receive. Not so.

Basically with Windows Server 2012 this doesn’t give you any benefit for throughput. You are limited to the bandwidth of one member, i.e. 10Gbps.

Red is Total Bytes received on the target host. It’s clear only one member is being used. Green is Bytes Sent/Sec on the source server. As you can see both team members are involved. In a switch independent scenario the receiving side limits the throughput. This is in agreement the documented behavior of switch independent NIC teaming with Address hash.

Helpful documentation on this is Windows Server 2012 NIC Teaming (LBFO) Deployment and Management (A Guide to Windows Server 2012 NIC Teaming for the novice and the expert).

Hope this helps sort out some of the confusion.

I’m Ready To Test Windows Server 2012 R2 Live Migration Over Multichannel & RDMA!

Posted on June 10, 2013 by workinghardinit

I’m so ready for the first Windows 2012 R2 preview bits. Yes that’s what our current setup looks like. Two RDMA capable NICs at the ready Winking smile … let the bits come. I’m pretty excited to test Live Migration over Multichannel & RDMA.

We choose to get RDMA NICs for new servers and replace non RDMA card in host where there’s a clear benefit. By the way have you seen the news on the Mellanox ConnectX®-3 Pro Single/Dual-Port Adapters => NIC support for NVGRE is here people!

Still Need To Optimizing Power Settings On DELL 12th Generation Servers For Lightning Fast Hyper-V Live Migrations?

Posted on June 10, 2013 by workinghardinit

Do you remember my blog from 2011 on optimizing some system settings to get way better Live Migration performance with 10Gbps NICs? It’s over here Optimizing Live Migrations with a 10Gbps Network in a Hyper-V Cluster. This advice still holds true, but the power optimization settings & interaction between DELL Generation 12 Server and with Windows Server 2012 has improved significantly. Where with Windows Server 2008 R2 we could hardly get above 16% bandwidth consumption out of the box with Live Migration over a 10Gbps NIC today this just works fine.

Don’t believe me? You do now? A cool Winking smile

For overall peak system performance you might want to adjust your Windows configuration settings to run the High Performance preferred power plan, if that’s needed.

You might do no longer need to dive into the BIOS. Of cause if you have issues because your hardware isn’t that intelligent and/or are still running Windows 2008 R2 you do want to there. As when it comes to speed we want it all and we want it now Smile and than you still want to dive into the BIOS and tweak it even on the DELL 12th Generation hardware. Test & confirm I’d say but you should notice a difference, all be it smaller than with Windows Server 2008 R2.

Well let’s revisit this again as we are now no longer working with Generation 10 or 11 servers with an “aged” BIOS. Now we have decommissioned the Generation 10 server, upgrade the BIOS of our Generation 11 Servers and acquired Generation 12 servers. We also no use UEFI for our Hyper-V host installations. The time has come to become familiar with those and the benefits they bring. It also future proofs our host installations.

So where and how do I change the power configuration settings now? Let’s walk through one together. Reboot your server and during the boot cycle hit F2 to enter System Setup.

Select System BIOS

Click on System Profile Settings

The settings you want to adapt are:

CPU Power Management should be on Maximum Performance
Setting Memory Frequency to “Maximum Performance”
C1E states should be disabled
C states should be disabled

That’s it. The below configuration has optimized your power settings on a DELL Generation 12 server like the R720.

When don, click “Back” and than Finish. A warning will pop up and you need to confirm you want to safe your changes. Click “Yes” if you indeed want to do this.

You’ll get a nice confirmation that your settings have been saved. Click “OK” and then click Finish.

Confirm that you want to exit and reboot by clicking yes and voila, when the server comes back on it will be running a full speed at the cost of more power consumption, extra generated heat and cooling.

Remember, if you don’t need to run at full power, don’t. And if you consider using Dynamic Optimization and Power Optimization in System Center Virtual Machine Manager 2012. Save a penguin!

Working Hard In IT

My view on IT from the trenches

Category Archives: Live Migration