One of the cool new features that takes scalability in Windows Server 2012 R2 Hyper-V to a new level is virtual Receive Side Scaling (vRSS). While since In Windows Server 2012, Receive Side Scaling (RSS) over SR-IOV is supported it’s best suited for some specialized environments that require the best possible speeds at the lowest possible latencies. While SR-IOV is great for performance it’s not as flexible as for example you can’t team them so if you need redundancy you’ll need to do guest NIC Teaming.
vRSS is supported on the VM network path (vNIC, vSwitch, pNIC) and allows VMs to scale better under heavier network loads. The lack of RSS support in the guest means that there is only one logical CPU (core 0) that has to deal with all the network interrupts. vRSS avoid this bottleneck by spreading network traffic among multiple VM processors. Which is great news for data copy heavy environments.
What do you need?
Nothing special, it works with any NICs that supports VMQ and that’s about all 10Gbps NICs you can buy or posses. So no investment is needed. It’s basically the DVMQ capability on the host NIC that has VMQ capabilities that allows for vRSS to be exposed inside of the VM over the vSwitch. To take advantage of vRSS, VMs must be configured to use multiple cores, and they must support RSS => turn it on in the vNIC configuration in the guest OS and don’t try to use a home PC 1Gbps card
vRSS is enabled automatically when the VM uses RSS on the VM network path. The other good news is that this works over NIC Teaming. So you don’t have to do in guest NIC Teaming.
What does it look like?
Now without SR-IOV it was a serious challenge to push that 10Gbps vNIC to it’s limit due to all the interrupt handling being dealt with by a single CPU core. Here’s what a VMs processor looks like under a sustained network load without vRSS. Not to shabby, but we want more
As you can see the incoming network traffic has the be dealt with by good old vCore 0. While DVMQ allows for multiple processors on the host dealing with the interrupts for the VMs it still means that you have a single core per VM. That one core is possibly a limiting factor (if you can get the network throughput and storage IO, that is). vRSS deals with this limitation. Look at the throughput we got copying lot of data to the VM below leveraging vRSS. Yeah that’s 8.5Gbps inside of a VM. Sweet . I’m sure I can get to 10Gbps …
I’m playing and examining some of the ODX capabilities of our SANs (Dell, Compellent) at the moment. It all seems pretty impressive in the demo’s. But how does that behave in real live on our gear? How impressive is ODX? Well pretty darn impressive actually. And as all great power it needs to be wielded carefully, with insight and thought.
Let’s create some fixed virtual disks. 10 * 50GB vhdx and 10* 475GB vhdx. We run a simple quick PowerShell script:
You see this correctly, it’s 41.5088855 seconds. let’s round up to 42 seconds. That’s 20 fixed VHDX files. 10 of 50GB, 10 of 475GB in 42 seconds. That’s a total of 5.12TB of vhdx files.
Compared to creating a single 5TB vhdx file this isn’t to shabby as that get done in 26 seconds!
You can only dream of the kind of scenario’s this kind of power enables. Woooot!!!
When you work in an environment with multiple clusters and some of them are replaced, destroyed etc, you’ll end up with stale clusters in the “Recent Clusters” list of the Cluster Aware Updating GUI. In the example below the red entry (had to obfuscate, sorry) is a no longer existing cluster but it’s very similar to a new one that was created to fix a naming standard error. So we’d like to get rid of those to prevent mistakes and cluttering up the GUI with irrelevant information.
The Recent Cluster list is tied to your user profile and you can end up with a list polluted with stale entries of no longer existing clusters. To clean them out you can dive into the registry and navigate to:
So we set up a NIC team in switch independent mode with Dynamic load balancing, it’s identical as that one used for the tests with TCP/IP.
Compression basically slashes the live migration times in half at a cost. CPU cycles.And again with Dynamic load balancing we can now also use all member of a NIC team for live migration even in switch independent mode. The speeds for live migrating 6 VMs with 9GB of memory simultaneously were 12-14 seconds.
Take a look at the screen shot above. You see 6 VMs coming in to the host where these counters are collected and after that you see them being live migrated away from the host. As we have plenty of idle cycles I this test lab they get used, both when being the target and the source of the VMs being live migrated. You can also see that a lot less bandwidth is needed to achieve a faster live migration experience (compared to TCP/IP).
By the looks of it the extra bandwidth will help out when we have less CPU and vice versa. This is both the case for a single NIC or teamed NICs. Do note that you cannot combine compression with Multichannel. That means that the only scenario allowing for multiple NICs to be used with compression is NIC teaming. When you have a bunch of free 1Gbps NICs in surplus this might get things moving for you!
Interesting stuff. I’m really looking forward to the moment we can run production loads on these configurations …