High performance live migration done right means using SMB Direct

I have seen people team two 10Gbps NICs for live migration and use TCP/IP. They leveraged LACP for this as per my blog Teamed NIC Live Migrations Between Two Hosts In Windows Server 2012 Do Use All Members. That was a nice post but not an advertisement for doing so. It was there to prove a point: LACP/Static switch dependent teaming does allow multiple VMs to be live migrated in the same direction between two nodes. But for speed, maximum throughput & low CPU usage, teaming is not the way to go. It is not needed, as you can achieve bandwidth aggregation and redundancy with SMB via Multichannel. This doesn’t require any LACP configuration at all and allows for switch independent aggregation and redundancy. Which is great, as it avoids stacking with switches that don’t do VLT, MLAG, …

Even when you team your NICs you’re better off using SMB. The bandwidth aggregation is often better. But again, you can have that without LACP NIC teaming so why bother? Perhaps for one reason: with LACP failover is faster, but that’s of no big concern with live migration.

We’ll do some simple examples to show you why these choices matter. We’ll also demonstrate the importance of an optimized RSS configuration. Do note that the configuration we use here is not a production environment, it’s just a demo to showcase results.

But there is yet another benefit to SMB: SMB Direct. That provides for maximum throughput, low latency and low CPU usage.

LACP NIC TEAM with 2*10Gbps with TCP

With the RSS settings at the inbox defaults we have problems reaching the best possible throughput (17Gbps). But that’s not all. Look at the CPU at the time of live migration. As you can see it’s pretty taxing on the system at 22%.


If we optimize RSS with 8 RSS queues assigned to 8 physical cores per NIC, with each NIC on a different CPU (dual socket, 8 core system), we sometimes get lower CPU overhead at +/- 12%, but the throughput does not improve much and it’s not very consistent. It can get worse and look more like the above.
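To make "8 RSS queues assigned to 8 physical cores per NIC on a different CPU" a bit more concrete, here is a minimal Python sketch that works out which logical processors each NIC would get. It is not taken from the test setup. The NIC names are hypothetical, and it assumes Hyper-Threading is enabled, that the even-numbered logical processors map to the physical cores, and that processors are numbered socket by socket.

```python
from dataclasses import dataclass


@dataclass
class RssRange:
    nic: str
    base_processor: int   # first logical processor used by this NIC
    max_processor: int    # last logical processor used by this NIC
    queues: int           # one RSS queue per physical core


def rss_ranges(nics, sockets=2, cores_per_socket=8, hyperthreading=True):
    """Give each NIC its own socket and one RSS queue per physical core."""
    if len(nics) > sockets:
        raise ValueError("this sketch assigns at most one NIC per socket")
    # Assumption: with Hyper-Threading only every second logical processor
    # is a distinct physical core (0, 2, 4, ...).
    step = 2 if hyperthreading else 1
    logical_per_socket = cores_per_socket * step
    ranges = []
    for index, nic in enumerate(nics):
        base = index * logical_per_socket
        last = base + (cores_per_socket - 1) * step
        ranges.append(RssRange(nic, base, last, cores_per_socket))
    return ranges


# Hypothetical NIC names on the dual socket, 8 core system from the text.
ranges = rss_ranges(["NIC1", "NIC2"])
for r in ranges:
    print(f"{r.nic}: base={r.base_processor} max={r.max_processor} queues={r.queues}")
```

The base and max processor numbers it prints are the kind of values you would feed into the RSS settings of each NIC. Check the actual processor layout of your own hosts before relying on them.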


LACP NIC TEAM with 2*10Gbps with SMB (Multichannel)

With the default RSS Settings we still have problems reaching the best possible throughput but it’s better (19Gbps). CPU wise, it’s pretty taxing on the system at 24%.


If we optimize RSS with 8 RSS queues assigned to 8 physical cores per NIC, with each NIC on a different CPU (dual socket, 8 core system), we get better overall CPU overhead at +/- 8% but the throughput actually declined (17.5 %). When we ran the test again we were back to the results we saw with default RSS settings.


Is there any value in using SMB over TCP with LACP for live migration?

Yes there is. Below you see two VMs live migrate over TCP, with RSS optimized. One core per VM is used and the throughput isn’t great, is it? Depending on the speed of your CPU you get at best 4.5 to 5Gbps throughput per VM, as that 1 core per VM is the limiting factor. Hence we see about 9Gbps here, as there are 2 VMs, each leveraging 1 core.
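The arithmetic behind that 9Gbps is simple enough to sketch. The Python below is only a back-of-the-envelope model: it takes the 4.5Gbps per core figure from the low end of the range above and assumes each VM is bound to one core. The function name and the cap at the bandwidth of the team are simplifications for illustration.

```python
def expected_throughput_gbps(vm_count, per_core_gbps, link_gbps, cores_per_vm=1):
    """Aggregate throughput when each migrating VM is limited by its cores.

    The result can never exceed what the links themselves can carry.
    """
    cpu_bound = vm_count * cores_per_vm * per_core_gbps
    return min(cpu_bound, link_gbps)


# Two VMs, one core each, on a 2*10Gbps team.
two_vms = expected_throughput_gbps(vm_count=2, per_core_gbps=4.5, link_gbps=20)
print(f"2 VMs, 1 core each: {two_vms} Gbps")

# With enough VMs in flight the team becomes the limit instead of the cores.
five_vms = expected_throughput_gbps(vm_count=5, per_core_gbps=4.5, link_gbps=20)
print(f"5 VMs, 1 core each: {five_vms} Gbps")
```

The point of the model is that with one core per VM you need several simultaneous migrations to fill the pipe, while a single large VM stays stuck at what one core can push.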


Now look at only one VM, with RSS optimized, using SMB over an LACP NIC team. Even 1 large memory VM leverages 8 cores and achieves 19Gbps.


What about Switch Independent Teaming?

Ah well, that consumes a lot less CPU cycles but it comes at the price of speed. It has less CPU overhead to deal with compared to LACP, but it can only receive on one team member. The good news is that even a single VM can achieve 10Gbps (better than LACP) at lower CPU overhead. With SMB you get better CPU distribution but, as the one member is a bottleneck, no more speed. But … why bother when we have … better options!? Read on!

No Teaming – 2*10Gbps with SMB Multichannel, RSS Optimized

We are reaching very good throughput (20Gbps) with 8 RSS queues assigned to 8 physical cores. The CPU at the time of live migration is pretty good at 6%-7%.


Important: This is what you want to use if you don’t have 10Gbps but you do have 4*1Gbps NICs for live migration. You can test with compression and LACP teaming if you want/can to see if you get better results. Your mileage may vary. If you have only one 1Gbps NIC, compression is your sole & only savior.

2*10Gbps with SMB Direct

We’re using perfmon here to see the used bandwidth as RDMA traffic does not show up in Task Manager.


We have no problems reaching the best possible throughput (20Gbps, line speed). But now look at the CPU during live migration. How do you like them numbers?

Do not buy non RDMA capable NICs or Switches without DCB support!

These are real numbers. The only caveat is that the type and quality of the NICs, firmware and drivers used also play a role and can skew the results a bit. The onboard, run of the mill LOM NICs aren’t always the best choice. Do note that configuration matters, as you have seen. But SMB Direct eats them all for breakfast, no matter what.

Convinced yet? People, one of my core, highly valuable skill sets is getting commodity hardware to perform and I tend to give solid advice. You can read all my tips for fast live migrations in Live Migration Speed Check List – Take It Easy To Speed It Up.

Does all of this matter to you? I say yes, it does. How much depends on your environment and usage patterns. Maybe you’re totally over provisioned and run only very small workloads in your virtual machines. But it’s safe to say that if you want to use your hardware to its full potential under most circumstances, you really want to leverage SMB Direct for live migrations. What about that Hyper-V cluster with compute and storage heavy applications? What about SQL Server virtualization? Would you not like to see this picture with SMB RDMA? The Mellanox RDMA cards are very good value for money. Great 10Gbps switches that support DCB (for PFC/ETS) can be bought at decent prices. You’re missing out and potentially making a huge mistake by not leveraging SMB Direct for live migrations and many other workloads. Invest and design your solutions wisely!

I Can’t Afford 10GBps For Hyper-V And Other Lies

You’re wrong

There, I said it. Sure you can. Don’t think you need to be a big data center to make this happen. You just need to think and work outside the box a bit and when you’re not a large enterprise, that’s a bit easier to do. Don’t do it like a big name brand, traditionalist partner would do it (strip & refit the entire structural cabling in the server room, high end gear with big margins everywhere). You’re going for maximum results & value, not sales margins and bonuses.

I would even say you can’t afford to stay on 1Gbps much longer or you’ll be dealing with the fallout of being stuck in the past. Really, some of us are already looking at > 10Gbps connections to the servers. You need to move from 1Gbps or you’ll be micromanaging your way around issues, sucking all the fun out of your work with ever diminishing results and rising costs for both you and the business.

Give your Windows Server 2012 R2 Hyper-V environment the bandwidth it needs to shine and make the company some money. If all you want to do is spend as little money as possible, I’m not quite sure what your goal is. Either you need it or you don’t. I’m convinced we need it. So we must get it. Do what it takes. Let me show you one way to get what you need.

Sounds great, what do I do?

Take heart, be brave and of good courage! Combine it with skills, knowledge & experience to deliver a 10Gbps infrastructure as part of ongoing maintenance & projects. I just have to emphasize that some skills are indeed needed, pure guts alone won’t do it.

First of all you need to realize that you do not need to rip and replace your existing network infrastructure. That’s very hard to get approval for, takes too much time and rapidly becomes very expensive in both dollars and effort. Also, to be honest, quite often you don’t have that kind of pull. I for one certainly do not. And if I tried to do it that way it would take way too many meetings, diplomacy, politics, ITIL, ITML & Change Approval Board actions to make it happen. This adds to the cost even more, both in time and money. So leave what you have in place. For this exercise we assume it’s working fine, but you can’t afford to wait for many hours while every host in a 6 node cluster drains, and you need to drain all of them to add memory. So we have a need (OK, you’ll need a better business case than this, but don’t make too big a deal of it or you’ll draw unwanted attention) and we’ve taken away the fear factor of forklift replacing the existing network, which is a big risk & cost.
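If you want to put a rough number on that drain time for your own business case, a small Python sketch like the one below will do. Only the 6 node cluster comes from the text. The 256GB of VM memory per host is a hypothetical figure, and the model assumes hosts are drained one at a time, memory is copied exactly once (no re-copying of dirty pages), the link runs at its nominal speed and 1GB is 10^9 bytes. Real migrations will be slower than this.

```python
def transfer_seconds(memory_gb: float, throughput_gbps: float) -> float:
    """Seconds to copy memory_gb gigabytes (10^9 bytes) at throughput_gbps."""
    if throughput_gbps <= 0:
        raise ValueError("throughput must be positive")
    return memory_gb * 8 / throughput_gbps


def drain_hours(hosts: int, vm_memory_gb_per_host: float, throughput_gbps: float) -> float:
    """Hours to drain every host once, one host at a time."""
    return hosts * transfer_seconds(vm_memory_gb_per_host, throughput_gbps) / 3600


HOSTS = 6            # from the text: a 6 node cluster
VM_MEMORY_GB = 256   # hypothetical: VM memory in use per host

for label, gbps in [("1Gbps", 1), ("10Gbps", 10), ("2*10Gbps", 20)]:
    hours = drain_hours(HOSTS, VM_MEMORY_GB, gbps)
    print(f"{label:>9}: {hours:.2f} hours to drain all hosts")
```

Plug in your own host count and memory sizes. The ratio between the 1Gbps and 10Gbps results is the part that tends to convince people.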

So how do I go about it?

Start out as part of regular upgrades, replacements or new deployments. The money is there for those projects. Make sure to add some networking budget and leverage the needs of other projects to support the networking needs.

Get a starter budget for a POC of some sort; it will get you started on acquiring some more essential missing bits.

Buy reasonably cheap switches of reasonable port count that do all you need. If they’re readily available in a framework contract, great. You can get them as part of the normal procedures. But if you want to knock another 6% to 8% off the cost, order them directly from the vendor. Cut out the middleman.

Buy some gear as part of your normal refresh cycle. Adapt that cycle’s life time a bit to suit your needs where possible. Funding for operational maintenance & replacement should already be in place, right?

Negotiate hard with your vendor. Listen, just like in the storage world, the network world has arrived at a point where vendors are not going to be making tons of money just because they are essential. They have lots of competition and it’s only increasing. There are deals to be made and if you choose the right hardware it’s gear that won’t lock you into proprietary cabling, SFP+ modules and such. Or not too much anyway.

Design options and choices

Small but effective

If you’re really on a minimal budget just introduce redundant (independent) standalone 10Gbps switches for the East-West traffic that only runs between the nodes in the data center: CSV, live migration, backup. You don’t even need to hook them up to the network for data traffic; you only need to be able to manage them remotely and that’s what they invented Out Of Band (OOB) ports for. See also an old post of mine, Introducing 10Gbps With A Dedicated CSV & Live Migration Network (Part 2/4). In the smallest, cheapest scenario I use just 2 independent switches. In the other scenario I build a 2 node spine and the leaf. In my examples I use DELL network gear. But use whatever works best for your needs and your environment. Just don’t go the “nobody ever got fired for buying XXX” route, that’s fear, not courage! Use cheaper NetGear switches if that fits your needs. Your call, see my recent blog post on this, 10Gbps Cheap & Without Risk In Even The Smallest Environments.

Medium sized excellence

First of all a disclaimer: medium sized isn’t a standardized way of measuring businesses and their IT needs. There will be large differences depending on your neck of the woods.

Build your 10Gbps infrastructure the way you want it and design it so it can grow to where it might evolve. Keep it simple and shallow. Go wide where you need to. Use the Spine/Leaf design as a basis, even if what you’re building is smaller than what it’s normally used for. Borrow the concept. All 10Gbps traffic will be moving within that Spine/Leaf setup. Only client/server traffic will be going outside of it and it’s a small part of all traffic. This is how you get VM mobility and great network speeds in the server room while preventing the existing core from becoming a bandwidth bottleneck.

You might even consider doing Infiniband where the cost/Gbps is very attractive and it will serve you well for a long time. But it can be a hard sell as it’s “another technology”.

Don’t panic, you don’t need to buy a bunch of Nexus 7000s or Force10 Z9000s to do this in your moderately sized server room. In a medium sized environment I try to follow the “Spine/Leaf” concept even if it’s not true ECMP/CLOS; it’s the principle that counts. For the spine choose the switches that fit your size, environment & growth. I’ve used the Force10 S4810 with great success and you can negotiate hard on the price. The reasons I went for the higher priced Force10 S4810 are:

  • It’s the spine so I need best performance in that layer so that’s where I spend my money.
  • I wanted VLT, stacking is a big no no here. With VLT I can do firmware upgrades without down time.
  • It scales out reasonably by leveraging eVLT if ever needed.

For the ToR switches I normally go with PowerConnect 81XX F series or the N40XXF series, which is the current model. These provide great value for money and I can negotiate hard on price here while still getting 10Gbps with the features I need. I don’t need VLT as we do switch independent NIC teaming with Windows. That gives me the best scalability with DVMQ & vRSS and allows for firmware upgrades without any network down time in the rack. I do sacrifice true redundant LACP within the rack, but for the few times I might really need that I could go cross rack & still maintain a rack as a failure domain, as the ToRs are redundant. I avoid stacking; it’s a single point of failure during firmware upgrades and I don’t like that. Sure, I could leverage the rack as a failure domain to work around that, but that’s not very practical for ordinary routine maintenance. The N40XXF also gives me the DCB capabilities I need for SMB Direct.

Hook it up to the normal core switch of the existing network for just the client/server (North/South) traffic. I make sure that any VLANs used for CSV and live migration can’t even reach that part of the network. Even data traffic (between virtual machines, physical servers) goes East-West within your Spine/Leaf and never goes out anyway, unless you did something really weird and bad.

As said, you can scale out VLT using eVLT, which creates a port channel between 2 VLT domains. That’s nice. So in a medium sized business you’re pretty safe in growth. If you grow beyond this, we’ll be talking about a way larger deployment anyway, with true ECMP/CLOS, and that’s not the scale I’m dealing with here. For most medium sized businesses, or small ones with bigger needs, this will do the job. ECMP/CLOS Spine/Leaf actually requires layer 3 in the design and, as you might have noticed, I kind of avoid that. Again, this is to get to a good solution today instead of a really good solution next year, which won’t happen because really good is risky and expensive. Those are words they don’t like to hear above your pay grade.

The picture below is just for illustration of the concept. Basically I normally have only one VLT domain and have two 10Gbps switches per rack. This gives me racks as failure domains and it allows me to forgo a lot of extra structural cabling work to neatly provide connectivity from the switches to the server racks.

You have a scalable, capable & affordable 10Gbps or better infrastructure that will run any workload in style. After testing you simply start new deployments in the Spine/Leaf and slowly move over existing workloads. If you do all this as part of upgrades it won’t cause any downtime due to the network being renewed, only the downtime of upgrading or replacing current workloads.

The layer 3 core in the picture above is the uplink to your existing network and you don’t touch that. Just let it run until there’s nothing left in there and you can clean it up or take it out. Easy transition. The core can be left in place or replaced when needed due to age or capabilities.

To keep things extra affordable

While today the issues with (structural) 10Gbps copper CAT6A and NICs/switches seem solved, when I started doing 10Gbps, fibre cabling or Copper Twinax Direct Attach was the only way to go. 10GBase-T wasn’t an option yet and I still love the flexibility of fibre; it consumes less space and weighs less than CAT6A. Fibre also fits easily in existing cable infrastructure. Less hassle. But CAT6A will work fine today, no worries.

If you decide to do fibre, buy OM3. You can get decent, affordable cabling online. Order it as consumable supplies.

Spend some time on the internet and find the SFP+ modules that work with your switches to save a significant amount of money. Yup, some vendor switches work with compatible non OEM branded SFP+ modules. Order them as consumable supplies, but buy a few first to TEST! Save money but do it smart, don’t be silly.

For patch cabling 10Gbps Copper Twinax Direct Attach works great for short ranges and isn’t expensive, but the length is limited and the cables get thicker, sturdier and thus more unwieldy as they get longer. It does have its place and I use it where appropriate.

Isn’t this dangerous?

Nope. Technology wise it is perfectly sound and nothing new. Project wise it delivers results: fast, effective and without breaking the bank. Functionally you now have all the bandwidth you need to stop worrying and micromanaging stuff to work around those pesky bandwidth issues and focus on better ways of doing things. You’ve given yourself options & possibilities. Yay!

Perhaps the approach to achieve this isn’t very conventional. I disagree. Look, anyone who’s been running projects & delivering results knows the world isn’t that black and white. We’ve been doing 10Gbps for 4 years now this way and with (repeated) great success while others have to wait for the 1Gbps structural cabling to be replaced some day in the future … probably by 10Gbps copper in a 100Gbps world by the time it happens. You have to get the job done. Do you want results, improvements, progress and success or just avoid risk and cover your ass? Well then, choose & just make it happen. Remember the business demands everything at the speed of light, delivered yesterday at no cost with 99.999% uptime.  So this approach is what they want, albeit perhaps not what they say.

10Gbps Cheap & Without Risk In Even The Smallest Environments

Over the last 18 months cheaper, commodity, small port count, but high quality 10Gbps switches have become available. NetGear is a prime example. This means 10Gbps networking is within reach for even the smallest deployments.

Size is an often used measure for technological needs like storage, networking and compute but in many cases it’s way too blunt a tool. A lot of smaller environments in specialized niches need more capable storage and networking than their size would lead you to believe. The “Enterprise level” cost associated with the earlier SFP+ based switches was an obstacle, especially since the minimum port count lies around 24 ports, so with switch redundancy this already means 2*24 ports. Then there’s the cost of vendor branded SFP+ modules. That could be offset with Copper Twinax Direct Attach cabling (which has its sweet spots for use) or by finding functional, cheaper non branded SFP+ modules. But all that isn’t an issue anymore. Today 10GBase-T cards & switches are readily available and ready for prime time. The issues with power consumption and heat have been dealt with.

While vendors like DELL have done some amazing work to bring affordable 10Gbps switches to the market, the cost remained an obstacle for many small environments. Now with the cheaper copper based, low port count switches it’s become a lot easier to introduce 10Gbps while taking away the biggest operational pains.

  • You can start with a lower number of 10Gbps ports (8-12) instead of a minimum of 24.
  • No need for expensive vendor branded SFP+ modules.
  • Copper cabling (CAT6A) is relatively cheap for use in a rack or between two racks and for this kind of environment using patch lead cables isn’t an issue.
  • Power consumption and heat challenges of copper 10Gbps have been addressed.


So even for the smallest setups where people would love to get 10Gbps for live migrations, hypervisor host backups and/or the virtual network, it can be done now. If you introduce these switches for just the CSV, live migration, storage or backup networks you can even avoid having to integrate them into the data network. This makes it easier and non disruptive, & the isolation helps put minds at ease about potential impacts of extra traffic and misconfigurations. Still, you take away the heavy loads that might be disrupting your 1Gbps network, making things well again without needing further investments.

So go ahead, take the step and enjoy the benefits that 10Gbps brings to your (virtual) environment. Even medium sized shops can use this as a showcase while they prepare for a 10Gbps upgrade for the server room or data center in the years to come.

Still Need To Optimize Power Settings On DELL 12th Generation Servers For Lightning Fast Hyper-V Live Migrations?

Do you remember my blog from 2011 on optimizing some system settings to get way better Live Migration performance with 10Gbps NICs? It’s over here: Optimizing Live Migrations with a 10Gbps Network in a Hyper-V Cluster. This advice still holds true, but the power optimization settings & the interaction between DELL Generation 12 servers and Windows Server 2012 have improved significantly. Where with Windows Server 2008 R2 we could hardly get above 16% bandwidth consumption out of the box with Live Migration over a 10Gbps NIC, today this just works fine.

Don’t believe me? You do now? Cool.

For overall peak system performance you might want to adjust your Windows configuration settings to run the High Performance preferred power plan, if that’s needed.

You might no longer need to dive into the BIOS. Of course, if you have issues because your hardware isn’t that intelligent and/or you are still running Windows 2008 R2, you do want to go there. And when it comes to speed we want it all and we want it now, so then you still want to dive into the BIOS and tweak it, even on the DELL 12th Generation hardware. Test & confirm I’d say, but you should notice a difference, albeit smaller than with Windows Server 2008 R2.

Well, let’s revisit this, as we are now no longer working with Generation 10 or 11 servers with an “aged” BIOS. We have decommissioned the Generation 10 servers, upgraded the BIOS of our Generation 11 servers and acquired Generation 12 servers. We also now use UEFI for our Hyper-V host installations. The time has come to become familiar with those and the benefits they bring. It also future proofs our host installations.

So where and how do I change the power configuration settings now? Let’s walk through one together. Reboot your server and during the boot cycle hit F2 to enter System Setup.

Select System BIOS

Click on System Profile Settings

The settings you want to adapt are:

  • CPU Power Management should be on Maximum Performance
  • Memory Frequency should be on Maximum Performance
  • C1E states should be disabled
  • C states should be disabled


That’s it. The below configuration has optimized your power settings on a DELL Generation 12 server like the R720.
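If you manage more than a handful of hosts it helps to write the four settings above down as data and compare them against what a host actually reports. The Python below is just a sketch of that idea: the setting names are the labels from the list above, not real BIOS attribute identifiers, and the "current" values are a made-up example rather than output from any DELL tool.

```python
# Desired values, taken from the list of settings above.
DESIRED = {
    "CPU Power Management": "Maximum Performance",
    "Memory Frequency": "Maximum Performance",
    "C1E": "Disabled",
    "C States": "Disabled",
}


def deviations(current: dict) -> dict:
    """Return {setting: (current value, desired value)} for every mismatch."""
    return {
        name: (current.get(name), wanted)
        for name, wanted in DESIRED.items()
        if current.get(name) != wanted
    }


# Hypothetical values as you might note them down from one host.
example_host = {
    "CPU Power Management": "Maximum Performance",
    "Memory Frequency": "Maximum Performance",
    "C1E": "Enabled",
    "C States": "Disabled",
}

for name, (found, wanted) in deviations(example_host).items():
    print(f"{name}: found {found!r}, want {wanted!r}")
```

How you collect the current values (by hand from System Setup or through whatever management tooling you have) is up to you; the comparison stays the same.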

When done, click “Back” and then Finish. A warning will pop up and you need to confirm you want to save your changes. Click “Yes” if you indeed want to do this.

You’ll get a nice confirmation that your settings have been saved. Click “OK” and then click Finish.

Confirm that you want to exit and reboot by clicking Yes and voila, when the server comes back on it will be running at full speed at the cost of more power consumption, extra generated heat and cooling.

Remember, if you don’t need to run at full power, don’t. And do consider using Dynamic Optimization and Power Optimization in System Center Virtual Machine Manager 2012. Save a penguin!