An Early Look At Live Migration Over TCP/IP & Multichannel In Windows Server 2012 R2 Preview

Posted on July 11, 2013 by workinghardinit

Introduction

With Windows Server 2012 R2 (Preview) we can Live Migrations over TCP/IP like before. That’s either using a single NIC or by teaming two or more NICs. We also have compression and Multichannel. In this blog post we’ll play with TCP/IP and Multichannel.

We have a dual port 10Gbps Mellanox RDMA card (RoCE) in each host. But for these tests we have disable the RDMA capabilities of these NICs. As in the RMDA blog post, one pair of the ports are interconnected via a direct attach cable. The other one is connected over a Force10 S4810 switch. We’re using in box Windows Server 2012 R2 preview drivers for everything as we have found drivers not to install properly (or not at all) on this release and cause issues.
We are using one VM running Windows 2012 RTM with upgraded Integration Services components. This VM has 4 vCPUs and 55GB of fixed memory assigned. For this purpose we had no workload running in the VM. The servers are standard DELL PowerEdge R720 kit running the Windows Server 2012 Preview bits.

Results

No Performance tweaking

We test a a Live Migration over one 10Gbps NIC. It’s fast but I don’t like the jig saw effect and we don’t push the bandwidth to the limit yet.

We can move the 55GB Memory VM in about 70 seconds on average. You have a bit more CPU load here but nothing to bad. Most often the Hyper-V host has ample of CPU cycles left so this will not hinder performance. I also remember Aidan Finn’s work testing a truck load of concurrent live migrations with a host that has only 1 low end CPU with 4 cores making it throttle the number of CPUs it would start to save guard the workload.

So let’s do what we’ve always done. Turn on Jumbo Frames. This helps to peek to 1.25GB/s and improves speeds (10% or more) but the jig saw is still a bit visible. As I think we can do better we move in the big guns and we optimize our power setting as discussed in Still Need To Optimizing Power Settings On DELL 12th Generation Servers For Lightning Fast Hyper-V Live Migrations? and Optimizing Live Migrations with a 10Gbps Network in a Hyper-V Cluster. Now with C & C1E states disabled and both processor & memory optimized for performance we see this.

Now that’s power. We have faster Live Migrations (54 seconds on average) with top bandwidth use during the entire migration process and we see 50% better blackout times. What’s not to like here? CPU usage isn’t that bad and you’ll likely have some cycles to spare unless you’re over 60-70% of CPU use by your VMs and then you need to fix that anyway Smile as you’re out of the save zone. So, Jumbo Frames & Power Optimization are key!

Of cause we’re always looking for better and more. In Live Migrations terms that means speed. So let’s see what Multichannel can do for us. So let’s switch to SMB. As we have disable RDMA on the NICs this “only” gives us multichannel. The cool thing is, the second NIC doesn’t have Jumbo frames enabled yet. I have always found Jumbo Frames to matter and now with multichannel I have a very nice way of demonstrating / visualizing this. Here’s a screen shot of moving our test VM back and forward. As you can see we have one NIC with Jumbo frames disabled and one with Jumbo frames enabled. You don’t have to guess which one is which I guess. Yup Jumbo frames do matter Smile When you push to the limits. We are getting about 31 seconds on average here with the 55GB VM.

Here’s the same with Jumbo Frames enabled on both NICs. And guess what we just cut another 3 seconds of the Live Migration time Smile . 28 seconds flat.

In a histogram it looks like this. That’s what maximum throughput looks like.

Let’s see what our CPUs are up to during all this. Some core are rather busy dealing with the interrupts. But this is just one VM.

If you wonder why with 2*10Gbps you only see 2*4 CPUs doing work while the default RSS queues are at 8 and you’d expect 16. It’s because Multichannel defaults to 4. So we get 8. This I configurable and testing will show what difference this could make and whether it’s wise to tweak. It all depends.

Sure this is only one large memory VM but what if we do more? Like 6 VMs with 9GB of memory. Not to bad.

What if that host is running 30 or 40 VMs? That adds up. Well that’s what RDMA is for Smile ! But that yet another blog post.

Do keep in mind this is al just the Preview bits … MSFT does two things now until R2 is released. They kill bugs and tweak for speed. I tune my Live Migration setting in production so that get the most bang for the buck I try to avoid dips in bandwidth like you see above. So the work is not finished yet Smile

Conclusion

I can conclude that all the hints & tips of the past to optimize Live Migration still hold true. Yes, you should enable Jumbo Frames and yes you should still optimize your host for performance over power savings. That said, the times that you’d only get 16% of bandwidth usage out of a 10Gbps NIC when you do power optimize have long gone ever since Windows Server 2012. But if you feel the need for (even more) speed …, then by all means go for it.

If you want to conserve energy & be environmentally sound make the most of the least number of nodes possible and use Dynamic Optimization / Power Optimization to shut them down when not needed and fire them up to rise to the occasion Smile

Oh yes, test people, test. Trust but verify and determine the best possible configuration for both your environment and needs.

Now we’ll have a look at compression … but again that’s another blog post!

Configuring Performance Options for Live Migration In Windows Server 2012 R2 Preview

Posted on July 10, 2013 by workinghardinit

New Options For Optimizing Live Migrations

In Windows Server 2012 R2 we have a whole range of options to leverage Live Migration of our environment and needs. Next to the new default (Compression) we can now also leverage SMB 3.0 (Multichannel, RDMA) for all forms of Live Migration and not just for Shared Nothing Live Migration (see Shared Nothing Live Migration Leverages SMB 3.0 Under the Hood) or Storage Live Migration when both the source and the target are SMB 3.0 storage.

TCP/IP

Here you can use a one NIC or a NIC Team for bandwidth aggregation for live migration (see Teamed NIC Live Migrations Between Two Hosts In Windows Server 2012 Do Use All Members). This is the process you have known in Windows Server 2012. You can select multiple NICs or even Teams of NICs but only one of those (one NIC or one Team) will be used. The other(s)will only be used when the first one is not available.

Compression

This option leverages spare CPU cycles to compress the memory contents of virtual machines being migrated. Only then is it sent over the wire via TCP/IP connection. This speeds up the Live Migration Process. This process is CPU load aware so it will only use idle cycles to protect the workload on the hosts. This is the default setting in Hyper-V running on Windows Server 2012 R2 Preview.

SMB

This setting will leverage two SMB 3.0 features. Multichannel and, if supported by and for the NICs involved, RDMA.

SMB Direct (RDMA) will be used when the network adapters of both the source and destination servers have Remote Direct Memory Access (RDMA) capabilities enabled.
SMB Multichannel will automatically detect and use multiple connections when a proper SMB Multichannel configuration is identified.

Where to set these options?

In Hyper-V Manager go to “Hyper-V Settings” in the Actions pane.

Expand the Live Migrations node under Server in the left pane (click the “+”) and select to “Advanced Features”.

Select the option desired under" “Performance Options”.

Happy testing!

EDIT: Aidan Finn posted the PowerShell commands to configure the performance options in Configuring WS2012 R2 Hyper-V Live Migration Performance Options Using PowerShell The MVP community at work & it rocks Smile

Migrating a Hyper-V Cluster to Windows 2012 R2

Posted on July 8, 2013 by workinghardinit

Introduction

We’ll walk through a transition of a Windows Server 2012 Hyper-V Cluster to R2. For this we’ll use the Copy Cluster Roles Wizard (that’s how the Migration Wizard is now called). You have to approaches. You can start with a R2 cluster new hardware and you might even use new storage. For the process in my lab I evicted the nodes one by one and did a node by node migration to the new cluster. How you’ll do this in your environments depends on how many nodes you have, the number of CSVs and what workload they run. This blog post is an illustration of the process. Not a detailed migration plan customized for you environment.

Step by Step

1. Preparing the Target node(s)/cluster

Install the OS, add the Hyper-V role & the Failover cluster feature on the new or the evicted node.You’ll already have some updates to install only a week after the preview bits came available. Microsoft is on top of things, that’s for sure.

Configure the networking for the virtual switch, CSV, LM, management and iSCSI (that’s what I use in the lab) on the OS. Nothing you don’t know yet for this part.

Create a Virtual switch in Hyper-V manager. This is important. If you can you should give it the same name as the ones on the old cluster. Also note that the virtual switch name is case sensitive. If you forget to create one you will not be able to copy the hyper-v virtual machine roles.

Create a new Cluster & configure networking. One of the nice things of R2 is that is intelligent enough to see what type of connectivity suits the networks best and it defaults to that. It sees that the ISCSI network should be excluded for cluster use and that the CSV/LM network doesn’t need to allow client access.

On the ISCSI Target (a W2K12RTM box) I remove the evicted node from the ISCSI initiators and I add the newly installed new cluster node. You can wait to do this out of precaution but the cluster itself does leave the newly detected LUNS off line by default. On top of that it detects the LUNs are in use (reservations) and you can’t even bring them on line.

Right click the cluster and select “Copy Cluster Roles”

Follow the wizard instructions

Click Next and then click Browse to select the source cluster (W2K12)

Select “Warrior” and click OK.

Click Next and see how the connection to the old cluster is being established.

The wizard scans for roles on the old cluster that can be migrated.

In this example the results are the Roles per CSV and the Replication Broker. For migration you can select one of the CSV’s and work LUN per LUN, select multiple of them or even all. It all depends on your needs and environment. I migrated them all over at once in the lab. In real live I usually do only 1 or a couple of interdependent LUNS (for example OS, DATA, LOGS, TempDB on separate CSVs of virtualized SQL Servers).

Note that if you did not specify a virtual switch you will not see any Roles on the CSV listed. So that’s why it’s important defining a virtual switch.

As we did it right we do see all the virtual machines. As we are not migrating to new storage we have to move all VMs on a LUN in one go. That’s because you cannot expose a LUN to multiple clusters.

So we select all the clustered roles and click Next. As we use all capitals in the new virtual switch for the guests we are asked to map it manually. When the names are one on one identical AND in the same case, the mapping happens automatically. You can see this for the virtual switches for guest clustering CSV & ISCSI network connectivity.

We view the report before we click next and see exactly what will be copied.

Close the report and click Next.

Click Next again to kick of the migration.

When don you can view a report of the results. Click finish to close the wizard.

You now have copied your virtual machine cluster roles and the Replication Broker role.

Even the storage is already there in the cluster but off line as it’s not yet available to the new cluster. Remember that your workload is still running on the old cluster so all what you have do until now is a zero down time exercise.

2. Preparing to make the switch on the source cluster node(s)

On the source cluster we shut down all the virtual machines and the Replication Broker role.

We then take the CSVs off line.

If you did not (or could not due to lack of disk space) create a new witness disk you can recuperate this LUN as well. Not ideal I a production migration but dynamic quorum will help you staying protected. Not so with Windows 2008 R2. But you’ll fight as you are. Things are not always perfect.

3. Swapping The Storage From the Source To The Target

We’ll now disconnect the ISCSI initiator from the target on the old cluster node and remove the target.

On the ISCSI target we’ll also remove the old nodes from the list of ISCSI initiators to keep the environment clean. You could wait until you see the VMs up and running on the new environment to facilitate putting the old environment back up if the migration should fail.

Wait until the process has completed and click apply

4. Bringing the new cluster into production

Bring the CSV LUNs on line on the new cluster.

You’ll see then become available / on line in the cluster storage.

You are now ready to start up your virtual machines on the new cluster. With careful planning the down time for the services running in the virtual machines can be kept as low as 10 to 15 minutes depending on how fast you can deal with the storage switch over. Pretty neat.

Conclusion

You can now keep adding new nodes as you evict them from the old cluster. As said the exact scenario will vary based on the workload, number of nodes and CSV in your environment. But this should give you a good head start to work out your migration part. Keep in mind this is all done using the R2 Preview bits. Things are looking pretty good.

In a future blog I’ll discuss some best practices like making sure you have no snapshots on the machines you migrate as this still causes issues like before. At least in this R2 preview.

FIX: iSCSI Initiators Cannot Connect To iSCSI Target Anymore After Upgrading Windows Server 2012 iSCSI Target To Windows Server 2012 R2 Preview

Posted on July 3, 2013 by workinghardinit

I recently did an in place upgrade of my Windows Server 2012 iSCSI Target host in my home lab to Windows Server 2012 R2 Preview. That went well for one “minor” issue. My iSCSI initiators could not connect to it anymore and where in perpetual “connecting mode”.

On the target the status remained in “Not Connected” for all the VHDs.

As you can imagine this sort of ruins the share storage experience on my test Hyper-V cluster. After checking the basics and the event logs I found nothing wrong with the iSCSI Target. It’s then that I turned to the firewall as I suspected that we could have an issue there. Sure enough, the inbound iSCSI rules on the target had not been enabled.

After taking care of that the iSCSI connections succeeded and live was good again in the home lab. You only need to enable the iSCSI target rules.

Working Hard In IT

My view on IT from the trenches

Author Archives: workinghardinit

An Early Look At Live Migration Over TCP/IP & Multichannel In Windows Server 2012 R2 Preview

Introduction

Results

No Performance tweaking

Conclusion

Configuring Performance Options for Live Migration In Windows Server 2012 R2 Preview

New Options For Optimizing Live Migrations

TCP/IP

Compression

SMB

Where to set these options?

Happy testing!

Migrating a Hyper-V Cluster to Windows 2012 R2

Introduction

Step by Step

1. Preparing the Target node(s)/cluster

2. Preparing to make the switch on the source cluster node(s)

3. Swapping The Storage From the Source To The Target

4. Bringing the new cluster into production

Conclusion

FIX: iSCSI Initiators Cannot Connect To iSCSI Target Anymore After Upgrading Windows Server 2012 iSCSI Target To Windows Server 2012 R2 Preview