NIC Firmware/Driver Updates Reset Your RSS/VMQ optimizations

When optimizing your RSS/VMQ settings for maximum performance you’ll normally (I hope) do this in PowerShell. Save that script with some comments on why you configure it that way and make it part of your Hyper-V host deployment scripts

Why? Automation is king but you’ll need it again for sure. Why? Well there is this “tendency” that NIC firmware/driver updates reset your RSS/VMQ optimizations back to their defaults.That’s a bit of a bummer if you have to redo all the work instead of having a script ready to go. I have seen many a deployment where the configuration was missing after firmware/driver upgrades so please, check!

image

Figure: Where has my optimized configuration gone after a driver/firmware upgrade?

The good news is this isn’t a show stopper issue as things will keep working, but without your optimizations and with VMQ, depending on your NIC team setup for the vSwitch issues might occur. When doing NIC teaming for your virtual switch it’s important to get it right.  With switch dependent teaming (LACP/Static) the NICs in the team need to use overlapping processor sets (Min Queues). When doing switch independent teaming the NICs in the team need to use non-overlapping processor sets. So you need to configure each NIC in your team to use the different processors (Sum of Queues).

On top of that you might want to / should separate RSS/VMQ cores from each other. SMB Direct for CSV/LM will also help achieve this as there we leverage CPU offloading to the NIC.

Musings On Switch Embedded Teaming, SMB Direct and QoS in Windows Server 2016 Hyper-V

When you have been reading up on what’s new in Windows Server 2016 Hyper-V networking you probably read about Switch Embedded Teaming (SET). Basically this takes the concept of teaming and has this done by the vSwitch. Which means you don’t have to team at the host level. The big benefit that this opens up is the RDMA can be leveraged on vNICs. With host based teaming the RDMA capabilities of your NICs are no longer exposed, i.e. you can’t leverage RDMA. Now this has become possible and that’s pretty big.

clip_image001

With the rise of 10, 25, 40, 50 and 100 Gbps NICs and switches the lure to go fully converged becomes even louder. Given the fact that we now don’t lose RDMA capabilities to the vNICs exposed to the host that call sounds only louder to many.  But wait, there’s even more to lure us to a fully converged solution, the fact that we now do no longer lose RSS on those vNICs! All good news.

I have written an entire whitepaper on convergence and it benefits, drawback, risks & rewards. I will not repeat all that here. One point I need to make that lossless traffic and QoS are paramount to the success of fully converged networking. After all we don’t want lossy storage traffic and we need to assure adequate bandwidth for all our types of traffic. For now, in Technical Preview 3 we have support for Software Defined Networking (SDN) QoS.

What does that mean in regards to what we already use today? There is no support for native QoS  and vSwitch QoS in Windows Server 2016 TPv3. There is however the  mention of DCB (PFC/ETS ), which is hardware QoS in the TechNet docs on Remote Direct Memory Access (RDMA) and Switch Embedded Teaming (SET). Cool!

But wait a minute. When we look at all kinds of traffic in a converged Hyper-V environment we see CSV (storage traffic), live migration (all variations), backups over SMB3 all potentially leveraging SMB Direct. Due to the features and capabilities in SMB3 I like that. Don’t get me wrong about that. But it also worries me a bit when it comes to handling QoS on the hardware side of things.

In DCB Priority Flow Control (PFC) is the lossless part, Enhanced Transmission Selection (ETS) is the minimum bandwidth QoS part. But how do we leverage ETS when all types of traffic use SMB Direct. On the host it all gets tagged with the same priority. ETS works by tagging different priorities to different workloads and assuring minimal bandwidths out of a total of 100% without reserving it for a workload if it doesn’t need it. Here’s a blog post on ETS with a demo video DCB ETS Demo with SMB Direct over RoCE (RDMA .

Does this mean a SDN QoS only approach to deal with the various type of SMB Direct traffic or do they have some aces up their sleeves?

This isn’t a new “concern” I have but with SET and the sustained push for convergence it does has the potential to become an issue. We already have the SMB bandwidth limitation feature for live migration. That what is used to prevent LM starving CSV traffic when needed. See Preventing Live Migration Over SMB Starving CSV Traffic in Windows Server 2012 R2 with Set-SmbBandwidthLimit.

Now in real life I have rarely, if ever, seen a hard need for this. But it’s there to make sure you have something when needed. It hasn’t caused me issues yet, but I’m a performance & scale first, in “a non-economies of scale” world compared to hosters. As such convergence is a tool I use with moderation. My testing when traffic competes without ETS is that they all get part of the cake but not super predictable/ consistent. SMB bandwidth limitation is a bit of a “bolted on” solution => you can see the perf counters push down the bandwidth in an epic struggle to contain it, but as said it’s a struggle, not a nice flat line.

Also Set-SmbBandwidthLimit is not a percentage, but hard max bandwidth limit, so when you lose a SET member the math is off and you could be in trouble fast. Perhaps it’s these categories that could or will be used but it doesn’t seem like the most elegant solution/approach. That with ever more traffic leveraging SMB Direct make me ever more curious. Some switches offer up to 4 lossless queues now so perhaps that’s the way to go leveraging more priorities … Interesting stuff! My preferred and easiest QoS tool, get even bigger pipes, is an approach convergence and evolution of network needs keeps pushing over. Anyway, I’ll be very interested to see how this is dealt with. For now I’ll conclude my musings On Switch Embedded Teaming, SMB Direct and QoS in Windows Server 2016 Hyper-V

Trunking With Hyper-V Networking

When doing lab work, or real life implementations you’ll need to go beyond the basic 101 stuff to build solutions every now and then. This is especially true when using virtual network appliances. Networking means you’ll you’ll be dealing with Link Aggregation Groups, Trunking, MLAG, routing, LACP … in short the tools of the trade when doing networking. In my experience I use trunking in Hyper-V mostly to mimic real world scenarios where trunking is used (firewall, routers, load balancers). These tend to be limited in usable ports in real life. So even, before you run out of physical ports on your Hyper-V host to work with we leverage them to mimic the real live environment. This leads us to trunking with Hyper-V networking

I for one have used this on 10Gbps ports on bot physical and virtual load balancers in the uplink to the switches. As you can imagine when doing redundant (teaming) cabling with HA load balancers you’re consuming 10Gbps ports and not all VLANs warrant a dedicated 10Gbps uplink, even if you had ‘m.

Trunking & VLAN’s are the way we deal with this in the network hardware world and we can do the same in Hyper-V. In the Hyper-V Manager GUI you will not find a way to define a trunk on an vNIC attached to a vSwitch. But this can be done via PowerShell. So please do not reject Hyper-V as not being up to the job. It is. Let me show you how you can do trunking with Hyper-V networking.

Generally on a clean install I dump the default vNIC. DO NOT DO this blindly on an existing deployed appliance virtual machine.

#Delete the default network adapter
Remove-VMNetworkAdapter -VMName VLM200-1 -Name "Network Adapter"

I then add the number of ethernet ports I need on my Kemptechnologies virual Load Master.

#Create the VLM200 ports (4 like it's physical counterpart)
For ($Count=0; $Count -le 3; $Count ++)
{
Add-VMNetworkadapter -VMName VLM200-1 -Name "Eth$Count"
}

A peak at our handy work via Get-VMNetworkAdapter -VMName VLM200-1 shows our 4 ports.

image

As you can see I like to name my network adapters with a distinctive name. In combination with the switch name it enables me to identify the NICs better. Combine that with a good naming policy inside the VM if possible. In Windows Server 2016 you can hot add and remove vNICs and new “Device Naming”

(see Hot add/remove of network adapters and enabling device naming in Windows Server Hyper-V) functionality which only makes the experience better in relation to uptime and automation.

Now let’s say we use eth0 for management and for the HA heartbeat. That leaves Eth2 and Eth3 for workloads. We could even aggregate these (redundancy, heart beat). In this demo we’ll configure Eth3 as a trunk with a list of allowed VLANs. We keep the native VLAN ID on 0 as it is by default. Only in specific situations where you have changed this in the network should this be changed.

#Trunk Eth3 and add the required VLAnIDs
Set-VMNetworkAdaptervlan -VMName VLM200-1 -VMNetworkAdapterName "Eth3"-Trunk -AllowedVlanIdList "10, 20, 30" -NativeVlanId 0

Which delivers us what we need to get our network appliance going

image

In your virtual appliance you can now create VLANs on Eth3. How this shows up is dependent on the appliance. In this example a Kemp Virtual Load Master. Here we mimic a 4 port load master. We’re not doing trunking because we ran out of the max supported number of NICs we can add to a virtual machine.

image

A word of warning. You will not see this configuration in the settings via the GUI.
Manipulating the VLAN settings in the GUI will overwrite the settings without a warning.
So be careful with configuration of your virtual network appliance(s).  As an example I’ll touch the VLAN setting of Eth3 and give it VLAN 500.

image

We now have a look at our VLAN settings of the appliance

image

That vNIC is now in Access mode with VLAN 500. Ouch, that will seriously ruin your day in production! Be careful!

On top of this some appliances do not respond well to such misconfigurations on the switch side (both physical and virtual switches). This leads not only to service interruption but could lead to the inability to mange the appliance, requiring a reboot of them etc.

Anyway, so yes you can do trunking with Hyper-V networking on a vNIC but this normally only makes sense I you have an appliance running that knows what to do with a trunk such as a virtual  firewall, router or load balancer.

Virtual Network Appliances I Use for Hyper-V Labs

When you build and maintain a test lab you’re always on the lookout for gear you can use. That’s either hardware or virtual appliances. My main concern is cost, it should work well on Hyper-V and the ability to mimic real world environments. That’s a great help for educational purposes as well as for testing and as an aid to troubles shooting. One of the nice things virtualization and now also cloud IAAS offers is the ability to run virtual storage and network appliances that allow us to have that real world look and feel. Add to that ever more software defined storage, networking and compute and we’re able to build very realistic labs. The limits we’re left with are time, money and space.

When building a lab some people tend to run into perceived limitations of their hypervisor. That’s to be expected as for many that hypervisor is just something to quickly get up and running an get to work writing code, implementing a backup solution or whatever the workload at hand is all about. The tip here is not to give up to fast.

More recently I’m build/working on a new lab setup simulating different sites. I need to route between these isolated test networks and load balance traffic in a site redundant manner. The idea was to mimic real life as well as we good. Add to that lab setup an Azure “site” and it’s fun all over. It’s all based on Hyper-V and Windows Server virtual machines but some components are not. Windows NLB has had its best day and RRAS is limited in the abilities I need to test. They can and do work fine for certain scenarios, but not for all that I need to test. I add virtual load balancers, virtual switches with the look and feel of physical ones and the same for virtual firewalls.

Now in real life you’ll be dealing with Link Aggregation Groups, Trunking, MLAG, routing, teaming … in short the tools of the trade when doing networking. One side effect of this is that on a Hyper-V host you quickly run out of physical network ports to work with. That’s not a problem, in real life your firewall or load balancer does not have 48 ports either. Often you have 4 to 8 and sometimes more, but often not, ports at your disposal and depending on the complexity that’s more than enough or not at all. Trunking & VLAN’s are the way we deal with this. In the Hyper-V GUI you will not find a way to define a trunk on an vNIC attached to a vSwitch. But this can be done via PowerShell. So please do not reject Hyper-V as not being up to the job. It is! Read about this in my blog post.

People often ask me what virtual network appliances I Use for Hyper-V Labs. This does vary over time, but there are some constants. In the lab I hate wasting time on time bombed trials. So I avoid those in favor of either fully featured solutions or I use free open source alternatives. Smart vendors provide the easiest access possible to their solutions. They realize that easy access delivers the ability to learn and test every aspect of the products which make a huge difference in the success of their offerings in the real world. When it comes to load balancers I use the KEMP Virtual Load Masters. You can read more about these in projects and lab testing  in blogs about the KEMP (Virtual) Load Master.

As an MVP I got 1 free license. Together with the ability to restore configurations I can have a pseudo permanent redundant load balancing setup. Only building labs for multi-site geo load balancing solutions requires to start from scratch every time. For routing I use VyOS, it works on both hardware and on a bunch of hypervisors with X64 bit virtual machines. When I need the look and feel of a firewall you’ll encounter in business I use Opnsense. It supports the synthetic vNICs with the enlightened Hyper-V drivers. Yup, the integration components are there.  It doesn’t boot from UEFI so no Generation 2 virtual machine support as of yet. imageimage

Another good one is IPFire. This one also does a nice job with the integration components.

image

I also have a DELL SonicWall in my home office where I have some ports to play with but it tends to be leveraged more for the permanent parts of the lab. It’s a crucial & permanent component.

SonicWALL NSA 220 Wireless-N Appliance