NIC Teaming in Windows 8 & Hyper-V

One of the many new features in Windows 8 is native NIC Teaming, also called Load Balancing and Fail Over (LBFO). This is, amongst many others, a most welcome and long awaited improvement. Now that Microsoft has published a great whitepaper on it (see the link at the end), it's time to publish this post that has been simmering in my drafts for too long. Most of us dealing with NIC teaming in Windows have a lot of stories to tell about incompatible modes, depending on the type of teaming, the vendors and what other advanced networking features you use. Combine that with the fact that this is a moving target due to a constant trickle of driver & firmware updates to rid us of bugs or add support for features, and what works and what doesn't changes over time. So you have to keep an eye on it. And then we haven't even mentioned whether it is supported or not, and the hassle & risk involved with updating a driver.

When it works, it rocks and provides great benefits (if it didn't, it would have been dead by now). But it has not always been a pretty story. Not for Microsoft, not for the NIC vendors and not for us IT pros. Everyone wants things to be better and finally it has happened!

Windows 8 NIC Teaming

Windows 8 brings in-box NIC Teaming, also known as Load Balancing and Fail Over (LBFO), with full Microsoft support. This makes me happy as a user. It makes the NIC vendors happy to get out of needing to supply & support LBFO. And it makes Microsoft happy because it was a long missing feature in Windows that made things more complex and error prone than they needed to be.

So what do we get from Windows NIC Teaming?

  • It works both in the parent & in the guest. This comes in handy, read on!

[Image: diagram of NIC teaming in the host and in the guest]

  • No need for anything else but NICs and Windows 8, that's it. No 3rd party drivers or software needed.
  • A nice and simple GUI to configure & manage it.
  • Full PowerShell support for the above as well, so you can automate it for rapid & consistent deployment (see the sketch after this list).
  • Different NIC vendors are supported in the same team. You can create teams with different NIC vendors in the same host and you can also use different NICs across hosts. This is important for Hyper-V clustering, where you don't want to be forced to use the same NICs everywhere. On top of that you can live migrate transparently between servers that have different NIC vendor setups. The fact that Windows 8 abstracts all this for you is just great and gives us a lot more options & flexibility.
  • Depending on the switches you have it supports a number of teaming modes:
    • Switch Independent: This uses algorithms that do not require the switch to participate in the teaming. The switch doesn't know or care which NICs are part of the team, so those teamed NICs can be connected to different switches. The benefit is that you can use multiple switches for fault tolerance without any special requirements like stacking.
    • Switch Dependent: Here the switch is involved in the teaming. As a result this requires all the NICs in the team to be connected to the same switch, unless you have stackable switches. In this mode network traffic travels at the combined bandwidth of the team members, which act as a single pipeline. There are two variations supported.
      1. Static (IEEE 802.3ad) or Generic: The configuration on the switch and on the server identifies which links make up the team. This is a static configuration with no extra intelligence in the form of protocols assisting in the detection of problems (port down, bad cable or misconfigurations).
      2. LACP (IEEE 802.1ax, also known as dynamic teaming). This leverages the Link Aggregation Control Protocol on the switch to dynamically identify links between the computer and a specific switch. This can be useful to automatically reconfigure a team when issues arise with a port, cable or a team member.
  • There are 2 load balancing options:
    1. Hyper-V Port: Virtual machines have independent MAC addresses, which can be used to load balance traffic. The switch sees a specific source MAC address connected to only one network adapter, so it can and will balance the egress traffic (from the switch) to the computer over multiple links, based on the destination MAC address of the virtual machine. This is very useful when using Dynamic Virtual Machine Queues. However, this mode might not be specific enough to get a well-balanced distribution if you don't have many virtual machines. It also limits a single virtual machine to the bandwidth that is available on a single network adapter. Windows Server 8 Beta uses the Hyper-V switch port as the identifier rather than the source MAC address, because a virtual machine might be using more than one MAC address on a switch port.
    2. Address Hash: A hash (there are different types, see the whitepaper mentioned at the end for details) is created based on components of the packet. All packets with that hash value are assigned to one of the available network adapters. The result is that all traffic from the same TCP stream stays on the same network adapter; another stream will go to another team member, and so on. That is how you get load balancing. As of yet there is no smart or adaptive load balancing available that optimizes the distribution by monitoring traffic and reassigning streams when beneficial.
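
To make the options above a bit more concrete, here's a minimal PowerShell sketch of creating a team. I'm using the LBFO cmdlet names as documented for the later builds, so they may differ slightly in the beta, and the adapter and team names are just examples from a lab setup.

```powershell
# Create a switch independent team with Hyper-V Port load balancing.
# "NIC1" and "NIC2" are example adapter names; rename them to match your host.
New-NetLbfoTeam -Name "HostTeam" `
                -TeamMembers "NIC1", "NIC2" `
                -TeamingMode SwitchIndependent `
                -LoadBalancingAlgorithm HyperVPort

# Verify the team and its members.
Get-NetLbfoTeam -Name "HostTeam"
Get-NetLbfoTeamMember -Team "HostTeam"
```

Swap SwitchIndependent for Static or Lacp when your switches are configured for switch dependent teaming, and HyperVPort for an address hash algorithm when that suits your workload better.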

Here's a nice overview table from the whitepaper:

[Image: overview table of the teaming modes and load balancing options from the whitepaper]

Microsoft stated that this covers the most requested types of NIC teaming, but vendors are still capable & allowed to offer their own versions, like they have for many years, when they feel it adds value.

Side Note

I wonder how all this relates to and works with Windows NLB, not just on a host but also in a virtual machine, in combination with Windows NIC teaming in the host (let alone the guest). I already noticed that Windows NLB doesn't seem to work if you use Network Virtualization in Windows 8. Combine that with the fact that there is not much news on any improvements in WNLB (it sure could use some extra features and service monitoring intelligence) and I can't really advise customers to use it anymore if they want to future proof their solutions. The Exchange team already went that path 2 years ago. Luckily there are some very affordable & quality solutions out there; Kemp Technologies comes to mind.

  • Scalability. You can have up to 32 NICs in a single team. Yes, those monster setups do exist, and it provides a nice margin to deal with future needs.
  • There is no theoretical limit on how many virtual interfaces you can create on a team (a small sketch follows after this list). This sounds reasonable, as otherwise having an 8 or 16 member NIC team would make no sense. But let's keep it real: there are other limits across the stack in Windows, although you should generally be able to get up to at least 64 interfaces. Use your common sense. If you couldn't put 100 virtual machines in your environment on just two 1Gbps NICs due to bandwidth & performance concerns, you shouldn't do that on two teamed 1Gbps NICs either.
  • You can mix NICs of different speeds in the same team. Mind you, this is not necessarily a good idea. The best option is to use NICs of the same speed, due to failover and load balancing needs and the fact that you'd like some predictability in a production environment. In the lab this can be handy when you need to test things out, or when you'd rather have this than no redundancy.
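
As for those extra team interfaces and growing a team over time, here's a hedged sketch, again with example names and release-era cmdlet names that might differ in the beta.

```powershell
# Add a third member to an existing team ("NIC3" is an example adapter name).
Add-NetLbfoTeamMember -Name "NIC3" -Team "HostTeam"

# Add an extra team interface on top of the same team, tagged for VLAN 20.
Add-NetLbfoTeamNic -Team "HostTeam" -VlanID 20

# List all team interfaces on the host.
Get-NetLbfoTeamNic
```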

Things to keep in mind

SR-IOV & NIC teaming

Once you team NICs, they do not expose SR-IOV on top of that. Meaning that if you want to use SR-IOV and need resilience for your network, you'll need to do the teaming in the guest. See the drawing higher up. This is fully supported and works fine. It's not the easiest option to manage, as it's done per guest instead of just on the host, but the tip here is to use the NIC Teaming UI on a host to manage the VM teams at the same time. Just add the virtual machines to the list of managed servers.

[Image: NIC Teaming UI on the host managing the teams inside the virtual machines]

Do note that teams created in a virtual machine can only run in the Switch Independent configuration with Address Hash distribution mode. Only teams where each of the team members is connected to a different Hyper-V switch are supported. Which is very logical because, as the picture below demonstrates, connecting both members to the same switch doesn't give you a redundant solution.

[Image: unsupported configuration with both guest team members connected to the same Hyper-V switch]
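
For completeness, here's how that guest teaming setup might look in PowerShell. This is a sketch using the cmdlets as they show up in later builds (the AllowTeaming setting in particular may not exist under that name in the beta); switch, adapter and VM names are examples.

```powershell
# On the host: two external switches, each bound to a different physical NIC.
New-VMSwitch -Name "ExternalSwitch1" -NetAdapterName "pNIC1" -AllowManagementOS $false
New-VMSwitch -Name "ExternalSwitch2" -NetAdapterName "pNIC2" -AllowManagementOS $false

# Give the virtual machine a vNIC on each switch and allow teaming on its adapters.
Add-VMNetworkAdapter -VMName "Guest01" -SwitchName "ExternalSwitch1"
Add-VMNetworkAdapter -VMName "Guest01" -SwitchName "ExternalSwitch2"
Set-VMNetworkAdapter -VMName "Guest01" -AllowTeaming On

# Inside the guest: switch independent + address hash is the only supported combination.
New-NetLbfoTeam -Name "GuestTeam" -TeamMembers "Ethernet", "Ethernet 2" `
                -TeamingMode SwitchIndependent -LoadBalancingAlgorithm TransportPorts
```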

Security Features & Policies Break SR-IOV

Also note that any advanced feature, like security policies on the (virtual) switch, will disable SR-IOV. It has to, or SR-IOV could be used as an effective security bypass mechanism. So be aware of this when you notice that SR-IOV doesn't seem to be working.

RDMA & NIC Teaming Do Not Mix

You also need to be aware of the fact that RDMA requires each NIC to have a unique IP address, which rules out using NIC teaming with RDMA. So in order to get more bandwidth than one RDMA NIC can provide you'll need to rely on SMB Multichannel. But that's not bad news.
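
If you want to check what SMB Multichannel sees, a few cmdlets can help. Again a hedged sketch: these are cmdlet names from the release documentation and may not all be present in the beta bits.

```powershell
# Which adapters report RDMA capability?
Get-NetAdapterRdma

# How does the SMB client see its interfaces (RSS / RDMA capable)?
Get-SmbClientNetworkInterface

# With an SMB session active, is Multichannel spreading it over multiple connections?
Get-SmbMultichannelConnection
```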

TCP Chimney

TCP Chimney is not supported with network adapter teaming in Windows Server “8” Beta. This might change but I don’t know of any plans.

Don’t Go Overboard

Note that you can't team teamed NICs, whether in the host/parent or inside the virtual machines themselves. There is also no support for using Windows NIC teaming to team two teams created with 3rd party (Intel or Broadcom) solutions. So don't stack teams on top of each other.

Overview of Supported / Not Supported Features With Windows NIC Teaming

[Image: overview table of supported and unsupported feature combinations with Windows NIC teaming]

Conclusion

There is a lot more to talk about and a lot more to be tested and learned. I hope to get some more labs going and run some tests to see how it all fits together. The aim of my tests is to be ready for prime time when Windows 8 goes RTM. But buyer beware, this is still "just" Beta material.

For more information please download the excellent whitepaper NIC Teaming (LBFO) in Windows Server "8" Beta.

Windows 8 introduces SR-IOV to Hyper-V

We dive a bit deeper into SR-IOV today. I'm not a hardware or software network engineer, but this is my perspective on what it is and why it's a valuable addition to the toolbox of Hyper-V in Windows 8.

What is SR-IOV?

SR-IOV stands for Single Root I/O Virtualization. The "Single Root" part means that the PCIe device can only be shared with one system. Multi Root I/O Virtualization (MR-IOV) is the specification where it can be shared by multiple systems. That is beyond the scope of this blog, but you can imagine it being used in future high density blade server topologies and such to share connectivity among systems.

What does SR-IOV do?

Basically SR-IOV allows a single PCIe device to emulate multiple instances of that physical PCIe device on the PCI bus, so it's a sort of PCIe virtualization. SR-IOV achieves this on NICs that support it (hardware dependency) by using physical functions (PFs) and virtual functions (VFs). The physical device (think of this as a port on a NIC) is known as a Physical Function (PF). The virtualized instances of that physical device (that port on our NIC that gets emulated x times) are the Virtual Functions (VFs). A PF acts like a full blown PCIe device: it is configurable and functions like a physical device, and there is only one PF per port on a physical NIC. VFs are only capable of data transfers in and out of devices and can't be configured or act like real PCIe devices. However, you can have many of them tied to one PF, and they share the configuration of that PF.

It's up to the hypervisor (software dependency) to assign one or more of these VFs to a virtual machine (VM) directly. The guest can then use the VF NIC ports via a VF driver (so there need to be VF drivers in the integration components) and traffic is sent directly (via DMA) in and out of the guest to the physical NIC, bypassing the virtual switch of the hypervisor completely. This reduces CPU load and increases performance of the host, and as such also helps with network I/O to and from the guests; it's as if the virtual machine uses the physical NIC in the host directly. The hypervisor needs to support SR-IOV because it needs to know what PFs and VFs are and how they work.

[Image: SR-IOV physical functions and virtual functions diagram]

So SR-IOV depends on both hardware (NIC) and software (hypervisor) that support it. It's not just the NIC by the way; SR-IOV also needs a modern BIOS with virtualization support. Most decent to high end server CPUs today support it, so that's not an issue. Likewise for the NIC: a modern quality NIC targeted at the virtualization market supports this. And of course SR-IOV also needs to be supported by the hypervisor. Until Windows 8, Hyper-V did not support SR-IOV, but now it does.
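
A short sketch of how this might look in PowerShell once you have capable hardware. The cmdlet names are the ones documented for the later builds, and the switch, NIC and VM names are examples, so treat it as illustrative rather than gospel.

```powershell
# Does the NIC / platform expose SR-IOV?
Get-NetAdapterSriov

# Create an external switch with IOV enabled. This has to be done at creation
# time; you can't flip it on afterwards.
New-VMSwitch -Name "IovSwitch" -NetAdapterName "pNIC1" -EnableIov $true

# Ask for a virtual function on the virtual machine's network adapter.
Set-VMNetworkAdapter -VMName "Guest01" -IovWeight 100

# Check whether the guest actually got a VF assigned.
Get-VMNetworkAdapter -VMName "Guest01" | Format-List Name, IovWeight, Status
```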

I've read in an HP document that you can have 1 to 6 PFs per device (NIC port) and up to 256 "virtual devices" or VFs per NIC today. But in reality that might not be viable due to the overhead in hardware resources associated with this, so 64 or 32 VFs might be about the maximum. Still, 64*2=128 virtual devices from a dual port 10Gbps NIC is already pretty impressive to me. I don't know what the limits are for Hyper-V 3.0, but there will be limits to the number of SR-IOV NICs in a server and the number of VFs per core and host. I think they won't matter too much for most of us in reality, and as technology advances we'll only see these limits go up, as the SR-IOV standard itself allows for more VFs.

So where does SR-IOV fit in when compared to VMQ?

Well, it does away with some overhead that still remains with VMQ. VMQ took away the overload of a single core in the host having to handle all the incoming traffic, but the hypervisor still has to touch every packet coming in and out. With SR-IOV that issue is addressed, as it allows moving data in and out of a virtual machine to the physical NIC via Direct Memory Access (DMA). With this the CPU bottleneck is removed entirely from the process of moving data in and out of virtual machines; the virtual switch never touches it. To see a nice explanation of SR-IOV take a look at the Intel SR-IOV Explanation video on YouTube.

Intel SR-IOV Explanation

VMQ Coalescing tried to address some of the pain of the next bottleneck of using VMQ, which is the large number of interrupts needed to handle traffic if you have a lot of queues. But as we discussed already, this functionality is highly under-documented and it's a bit of a black art, especially when NIC teaming and some advanced NIC software issues come into play. Dynamic VMQ is supposed to take care of that black art and make it more reliable and easier.

In contrast to VMQ & RSS, which don't mix in a Hyper-V environment, you can combine SR-IOV with RSS; they work together.

Benefits Versus The Competition

One of the benefits that Hyper-V 3.0 in Windows 8 has over the competition is that you can live migrate to a node that's not using SR-IOV. That's quite impressive.

Potential Drawback Of Using SR-IOV

A drawback is that by bypassing the Extensible Virtual Switch you might lose some features and extensions. Whether this is very important to you depends on your environment and needs. It would take me too far for this blog post, but Cisco seems to have enough aces up its sleeve to offer an integrated management & configuration interface that manages both the networking done in the extensible virtual switch and the SR-IOV NICs. You can read more on this over here: Cisco Virtual Networking: Extend Advanced Networking for Microsoft Hyper-V Environments. Basically they:

  1. Extend enterprise-class networking functions to the hypervisor layer with Cisco Nexus 1000V Series Switches.
  2. Extend the physical network to the virtual machine with Cisco UCS VM-FEX.

Interesting times are indeed ahead. Only time will tell what the many vendors have to offer in those areas & for what type of customer profiles (needs/budgets).

A Possible Usage Scenario

You can send data traffic over SR-IOV if that suits your needs, but perhaps you'll want to keep that data traffic flowing over the extensible Hyper-V virtual switch. If you're using iSCSI to the guest, why not send that over the SR-IOV virtual function to reduce the load on the host? There is still a lot to learn and investigate on this subject. As a little side note: how are the HBAs in Hyper-V 3.0 made available to the virtual machines? SR-IOV, but the PCIe device here is a Fibre Channel HBA, not a NIC. I don't know any details but I think it's similar.

DVMQ In Windows 8 Hyper-V

VMQ or VMDq

To discuss Dynamic VMQ (DVMQ) we first need to talk about VMQ, or VMDq in Intel speak. VMQ lets the physical NIC create unique virtual network queues for each virtual machine (VM) on the host. These are used to pass network packets directly from the hypervisor to the VM. This removes a lot of CPU core overhead on the host associated with network traffic, as it spreads the load over multiple cores. It's the same sort of CPU overhead you might see with 10Gbps networking on a server that isn't using RSS (see my previous blog post Know What Receive Side Scaling (RSS) Is For Better Decisions With Windows 8). Under high network traffic one core will hit 100% while the others are idle, which means you'll never get more than 3Gbps to 4Gbps of bandwidth out of your 10Gbps card as the CPU is the bottleneck.

VMQ leverages the NIC hardware for sorting, routing & packet filtering of the network packets from an external virtual machine network directly to virtual machines, and enables you to use 8Gbps or more of your 10Gbps bandwidth.

Now the number of queues isn't unlimited, and they are allocated to virtual machines on a first-come, first-served basis. So don't enable this for machines without heavy network traffic; you'll just waste queues. It is advised to use it only on those virtual machines with heavy inbound traffic, because VMQ is most effective at improving receive-side performance. So use your queues where they make a difference.
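
To see and steer this per adapter and per virtual machine there are a few cmdlets. This is a sketch with release-era cmdlet names and example adapter and VM names, so verify against the bits you're running.

```powershell
# Which adapters support VMQ and is it enabled?
Get-NetAdapterVmq

# Enable VMQ on a specific physical NIC.
Enable-NetAdapterVmq -Name "pNIC1"

# Steer queue usage per virtual machine: weight 0 means "don't spend a queue on me",
# a higher weight makes it more likely the vNIC gets one.
Set-VMNetworkAdapter -VMName "BusySqlGuest" -VmqWeight 100
Set-VMNetworkAdapter -VMName "QuietTestGuest" -VmqWeight 0
```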

If you want to see what VMQ is all about take a look at this video by Intel.

Intel VMDq Explanation

 

The video gives you a nice animated explanation of the benefits. You can think of it as providing the same benefits to the host as RSS does. VMQ also prevents one core from being overloaded with interrupts due to high network IO and as such becoming the bottleneck blocking performance. This is important, as you might otherwise end up buying more servers to handle certain loads. Sure, with 1Gbps networking modern CPUs can handle a very big load, but with 10Gbps becoming ever more common this issue is a lot more acute again than it used to be. That's why you see RSS being enabled by default in Windows 2008 R2.

VMQ Coalescing – The Good & The Bad

There is a little dark side to VMQ. You've successfully relieved the bottleneck on the host for network filtering and sorting, but you now have a potential bottleneck where you need a CPU interrupt for every queue. The documentation states the following:

The network adapter delivers interrupts to the management operating system for each VMQ, based on the processor VMQ affinity. If the interrupts are spread across many processors, the number of interrupts delivered can grow substantially, until the overhead of interrupt handling can outweigh the benefit of using VMQ. To reduce the number of interrupts used, Microsoft has encouraged network adapter manufacturers to design for interrupt coalescing, also called shared interrupts. Using shared interrupts, the network adapter processes multiple queues with the same processor affinity in the same interrupt. This reduces the overall number of interrupts. At the time of this publication, all network adapters that support VMQ also support interrupt coalescing.

Now, coalescing works, but the configuration and the possible headaches it can give you are material for another blog post. It can be very tedious and you have to watch every action on your NIC and virtual switch configuration like a hawk or you'll get registry values overwritten, value types changed and such. This leads to all kinds of issues, ranging from VMQ coalescing not working to your cluster going down the drain (worst case). The management of VMQ coalescing seems rather tedious and as such error prone. This is not good. Combine that with the sometimes problematic NIC teaming and you get a lot of possible and confusing permutations where things can go wrong. Use it when you can handle the complexity or it might bite you.

Dynamic VMQ For A Better Future

Now bring on Dynamic VMQ (DVMQ). All I know about this is from the Build sessions, and I'll revisit it once I get to test it for real with the beta or release candidate. I really hope it is better documented and doesn't come with the issues we've had with VMQ Coalescing. It brings the promise of easy and trouble free VMQ, where the load is evenly balanced among the cores and the burden of too many interrupts is avoided. A sort of auto scaling, if you like, that optimizes queue handling & interrupts.

[Image: Dynamic VMQ]

That means it can replace VMQ Coalescing, with DVMQ dealing with this potential bottleneck on its own. Due to the issues I've had with coalescing I'm looking forward to that one. Take note that you should be able to live migrate virtual machines from a host with VMQ capabilities to a host without them. You do lose the performance benefit, and I have no confirmation on this yet; as mentioned, I'm waiting for the beta bits to try it out. It's like live migration between an SR-IOV enabled host and a non SR-IOV enabled host, which is confirmed as possible. On that front Microsoft seems to be doing a really good job, better than the competition.

Know What Receive Side Scaling (RSS) Is For Better Decisions With Windows 8

Introduction

As I mentioned in an introduction post, Thinking About Windows 8 Server & Hyper-V 3.0 Network Performance, there will be a lot of options and design decisions to be made in the networking area, especially with Hyper-V 3.0. When we discuss DVMQ (see DVMQ In Windows 8 Hyper-V), SR-IOV in Windows 8 (or VMQ/VMDq in Windows 2008 R2) and other network features with their benefits, drawbacks and requirements, it helps to know what Receive Side Scaling (RSS) is. Chances are you know it better than the other optimizations mentioned. After all, it's been around longer than VMQ or SR-IOV and it's beneficial to workloads other than virtualization. So even if you're a "hardware only for my servers" die hard kind of person you may already be familiar with it. Perhaps you even "dislike" it, because when the Scalable Networking Pack came out for Windows 2003 it wasn't such a trouble free & happy experience. This was due to incompatibilities with a lot of the NIC drivers, and it wasn't fixed very fast. As a result the internet is loaded with posts on how to disable RSS & the offload settings on which it depends. This was done to get stability or performance back for application servers like Exchange and other applications or services.

The Case for RSS

But since Windows 2008 those days are over. RSS is a great technology that gets you a lot better usage out of your network bandwidth and your server. Not using RSS means that you'll buy extra servers to handle the same workload, which wastes both energy and money. So how does RSS achieve this? Well, without RSS all the interrupts from a NIC go to the same CPU core on multicore processors (core 0). In Task Manager that looks not unlike the picture below:

[Image: Task Manager showing a single core at 100% while the others sit idle]

For a while the increase in CPU power kept the negative effects at bay for a lot of us in the 1Gbps era. But now, with 10Gbps becoming more common every day, that's no longer the case. That core becomes the bottleneck: the poor logical CPU runs at 100%, processing as many network interrupts as it can handle, while the other logical CPUs only have to deal with the other workloads. You might never see more than 3.5Gbps of bandwidth being used if you don't use RSS; the CPU core just can't keep up. When you use RSS, the processing of those interrupts is distributed across all cores.

With Windows 2008, Windows 2008 R2 and Windows 8, RSS is enabled by default in the operating system. Your NIC needs to support it, and in that case you'll be able to disable or enable it. Often you'll get some advanced features (illustrated below) with the better NICs on the market. You'll be able to set the base processor, the number of processors to use, the number of queues, etc. That way you can divide the cores up amongst multiple NICs and/or tie NICs to specific cores.

[Image: RSS advanced settings in the NIC driver properties]

[Image: RSS processor and queue settings in the NIC driver properties]

So you can get fancy and tweak the settings for multi NIC systems if needed. You can experiment to find the best settings for your needs, follow the vendor defaults (Intel for example has different workload profiles for their NICs) or read up on what particular applications require for best performance.
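
In PowerShell this tweaking might look like the sketch below. I'm using the NetAdapter cmdlets as they appear in the newer builds and an example NIC name; the same settings are reachable through the driver's advanced properties tab.

```powershell
# Current RSS settings for an adapter: base processor, processor count, queues.
Get-NetAdapterRss -Name "pNIC1" |
    Format-List Name, BaseProcessorNumber, MaxProcessors, NumberOfReceiveQueues

# Pin this NIC to logical processor 2 and up, using at most 4 processors,
# so it stays clear of the cores another NIC or workload is using.
Set-NetAdapterRss -Name "pNIC1" -BaseProcessorNumber 2 -MaxProcessors 4

# Enable or disable RSS on the adapter itself.
Enable-NetAdapterRss -Name "pNIC1"
Disable-NetAdapterRss -Name "pNIC1"
```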

Information On How To Make It Work

For more information on tweaking RSS take a look at the following document: http://msdn.microsoft.com/en-us/windows/hardware/gg463392. It holds a lot more information than just RSS in various scenarios, so it's a useful document for more than just this.

Another good guide is the "Networking Deployment Guide: Deploying High-Speed Networking Features". Those docs are about Windows 2008 R2 but they do have good information on RSS.

If you notice that RSS is correctly configured but doesn't seem to work for you, it might be time to check the other adapter offloads like TCP Checksum Offload, Large Send Offload, etc. These also get turned off a lot when troubleshooting performance or reliability issues, but RSS depends on them to work. If they are turned off, this could be the reason RSS is not working for you.
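
A quick, hedged checklist in PowerShell (release-era cmdlets, example NIC name) to verify that RSS and the offloads it relies on haven't been switched off somewhere along the way:

```powershell
# Is RSS enabled globally in the TCP/IP stack?
Get-NetOffloadGlobalSetting        # ReceiveSideScaling should read Enabled
netsh int tcp show global          # the older way to check the same thing

# Are the per-adapter settings that often get disabled during troubleshooting still on?
Get-NetAdapterRss -Name "pNIC1"
Get-NetAdapterChecksumOffload -Name "pNIC1"
Get-NetAdapterLso -Name "pNIC1"
```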