In Windows 8 Beta there is a nice, functional improvement in Hyper-V Manager when you want to install or upgrade the Integration Services. It shows you what version (if any) is installed and whether an upgrade is needed. Until now it just “mentioned” that “a previous” version (no version number, so it could have been the latest one) was installed and happily let you reinstall it, needed or not. This raises the question of how all this deals with “corrupted” integration services, if such a thing exists. I, personally, have never seen it. Uninstall and reinstall when you come across it, I guess, as I don’t know of a forced/repair install option.
Walkthrough of The Improved Integration Services Setup
In the Virtual Machine console navigate to Action and select “Insert Integration Services Setup Disk”.
In the Virtual Machine console you’ll see that inserting the integration services disk succeeded.
Like before, if the setup process doesn’t start automatically just navigate to the DVD and kick start it yourself.
As you can see below, it now shows what version (if any) of the integration services is already installed and asks you if you want to update. In the example below you can see it has the Windows 2008 R2 SP1 version of the integration services. This is as expected, as this machine (a W2K3R2SP2 guest) was imported from a Hyper-V cluster running Windows 2008 R2 SP1.
You click OK and the installation process for the integration services will start.
When the installation is done you’ll be notified that the virtual machine needs to restart.
The server will reboot, and if you then try to install the integration services again it will notify you that the correct version of the integration tools is already running.
Remarks
If you hit an error in the Beta of Windows 8 Hyper-V I advise two things, both of which I have experienced myself in the labs.
Make sure you have enough disk space. I had one test server that had only a few MB left on the C partition and that bit me.
Make sure you do it after a clean reboot. Just to make sure you have no pending hardware detection/installs lingering around. I experienced this one on a Windows 2003 R2 SP2 guest. Error code 1618, yup that means Another installation is already in progress.
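Both remarks lend themselves to a small pre-flight check. What follows is only a minimal sketch in Python, not something the setup itself provides. The function names and the idea of a free-space threshold are my own assumptions for illustration; the exit codes in the table are the standard Windows Installer ones, including the 1618 I ran into.

```python
import shutil

# Standard Windows Installer exit codes; anything else is reported as unknown.
MSI_EXIT_CODES = {
    0: "Success",
    1618: "Another installation is already in progress",
    1641: "Success, reboot initiated",
    3010: "Success, reboot required",
}


def describe_exit_code(code: int) -> str:
    """Translate an installer exit code into a readable message."""
    return MSI_EXIT_CODES.get(code, f"Unknown exit code {code}")


def has_enough_free_space(path: str, required_mb: int) -> bool:
    """Check free space on the volume holding `path`.

    `required_mb` is a hypothetical threshold you pick yourself; the
    integration services setup does not document one that I know of.
    """
    free_mb = shutil.disk_usage(path).free // (1024 * 1024)
    return free_mb >= required_mb


if __name__ == "__main__":
    print(describe_exit_code(1618))
    print(has_enough_free_space(".", 500))
```

Run inside the guest before starting setup, it at least tells you whether the disk space remark applies. The clean reboot remains a manual step.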
I’ve written before (see "Key Value Pair Exchange WMI Component Property GuestIntrinsicExchangeItems & Assumptions") on the need to determine the version of the integration services or integration components running in your guests, and on ways to do so with PowerShell. These need to be in sync with the version running on the hosts, meaning that all the hosts in a cluster should be running the same version, as should the guests.
During an upgrade with a service pack this gets the necessary attention: scripts (PowerShell) are written to check versions and create reports, and normally you end up with a pretty consistent cluster. Over time virtual machines are imported, inherited from another cluster, or created on a test/developer host and shipped to production. I know, I know, this isn’t something that should happen, but I don’t always have the luxury of working in a perfect world.
Enough said. This means you might end up with guests that are not running the most recent version of the integration tools. Apart from checking manually in the guest (which is tedious, see my blog "Upgrading a Hyper-V R2 Cluster to Windows 2008 R2 SP1" on how to do this) or running the previously mentioned script, you can also check the Hyper-V event log.
Another way to spot virtual machines that might not have the most recent version of the integration tools is via the Hyper-V logs. In Server Manager you drill down in “Diagnostics” to “Event Viewer” and then navigate your way through "Applications and Services Logs", "Microsoft", "Windows" until you hit “Hyper-V-Integration”.
Take a closer look and you’ll see the warning about 2 guests having an older version of the integration tools installed.
As you can see, it records a warning for every virtual machine whose integration services are older than those of the host running Hyper-V. This makes it easy to grab a list of guests needing some attention. The downside is that you need to check all hosts, which is not too bad for a small cluster but not very efficient on the larger ones.
So just remember this as another way to spot virtual machines that might not have the most recent version of the integration tools. It’s not a replacement for some cool PowerShell scripting or the BPA tools, but it is a handy quick way to check the version for all the guests on a host when you’re in a hurry.
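Whichever way you collect the version numbers, the comparison itself is simple. Here is a minimal sketch in Python of the logic: compare each guest's version with its host's and list the stragglers for all hosts in one go. The inventory is made-up example data, the host and guest names are hypothetical, and gathering the real versions (via PowerShell, WMI or the event log) is left out entirely.

```python
def parse_version(text):
    """Turn a dotted version string such as '6.1.7601.17514' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def outdated_guests(host_version, guests):
    """Return the names of guests that are older than the host or report no version.

    `guests` maps guest name -> version string, or None when no
    integration services version could be read.
    """
    host = parse_version(host_version)
    return sorted(
        name
        for name, version in guests.items()
        if version is None or parse_version(version) < host
    )


def report(cluster):
    """Walk all hosts so you don't have to check them one by one.

    `cluster` maps host name -> (host version, guests dict).
    """
    findings = []
    for host, (host_version, guests) in sorted(cluster.items()):
        for guest in outdated_guests(host_version, guests):
            findings.append((host, guest))
    return findings


if __name__ == "__main__":
    # Hypothetical inventory, for illustration only.
    cluster = {
        "hv01": ("6.1.7601.17514", {"web01": "6.1.7601.17514", "sql01": "6.1.7600.16385"}),
        "hv02": ("6.1.7601.17514", {"old01": None}),
    }
    for host, guest in report(cluster):
        print(f"{host}: {guest} needs attention")
```

Comparing tuples of integers rather than raw strings matters here: as text, "6.1.7601.9" would sort after "6.1.7601.17514".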
It would be nice if integration services version management became easier in the future, meaning a built-in way to report on the versions in the guests and an easier way to deploy these automatically if they’re not part of a service pack (this is the case when the guest OS and the host OS differ or when you can’t install the SP in the guest for some application compatibility reason). You can do this in bulk using SCVMM and, of course, scripting this with PowerShell comes to the rescue here again, especially when dealing with hundreds of virtual machines in multiple large clusters. Orchestration via System Center Orchestrator can also be used. Integration with WSUS would be another nice option for those that don’t have Configuration Manager or Orchestrator, but that’s not supported for now as far as I know.
We dive a bit deeper into SR-IOV today. I’m not a hardware or software network engineer but this is my perspective on what it is and why it’s a valuable addition to the toolbox of Hyper-V in Windows 8.
What is SR-IOV?
SR-IOV stands for Single Root I/O Virtualization. The “Single Root” part means that the PCIe device can only be shared with one system. The Multi Root I/O Virtualization (MR-IOV) is a specification where it can be shared by multiple systems. This is beyond the scope of this blog but you can imagine this being used in future high density blade server topologies and such to share connectivity among systems.
What does SR-IOV do?
Basically SR-IOV allows a single PCIe device to emulate multiple instances of that physical PCIe device on the PCI bus. So it’s a sort of PCIe virtualization. SR-IOV achieves this with NICs that support it (hardware dependent) by using physical functions (PFs) and virtual functions (VFs). The physical device (think of this as a port on a NIC) is known as a Physical Function (PF). The virtualized instances of that physical device (that port on our NIC that gets emulated x times) are the Virtual Functions (VFs). A PF is a full-blown PCIe device: it is configurable and it acts and functions like a physical device. There is only one PF per port on a physical NIC. VFs are only capable of data transfers in and out of the device and can’t be configured or act like real PCIe devices. You can have many of them tied to one PF, but they share the configuration of that PF.
It’s up to the hypervisor (software dependency) to assign one or more of these VFs to a virtual machine (VM) directly. The guest can then use the VF NIC ports via a VF driver (so there need to be VF drivers in the integration components), and traffic is sent directly (via DMA) in and out of the guest to the physical NIC, bypassing the virtual switch of the hypervisor completely. This reduces CPU overhead on the host and increases its performance, which in turn helps network I/O to and from the guests; it’s as if the virtual machine uses the physical NIC in the host directly. The hypervisor needs to support SR-IOV because it needs to know what PFs and VFs are and how they work.
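To make the PF/VF relationship a bit more tangible, here is a toy model in Python. It is purely my own illustration of the description above and does not represent any real driver or Hyper-V API: the class, its fields and the `mtu` setting are all hypothetical. It shows three things: one PF per port, a limited number of VFs tied to it, and VFs that have no configuration of their own.

```python
from dataclasses import dataclass, field


@dataclass
class PhysicalFunction:
    """Toy stand-in for one NIC port (the PF) and the VFs tied to it."""

    port: str
    max_vfs: int
    config: dict = field(default_factory=dict)
    assignments: dict = field(default_factory=dict)  # VF index -> VM name

    def assign_vf(self, vm_name):
        """Hand the next free VF to a VM; return its index, or None if none is left."""
        for index in range(self.max_vfs):
            if index not in self.assignments:
                self.assignments[index] = vm_name
                return index
        return None

    def vf_config(self, index):
        """A VF cannot be configured itself; it exposes the PF's settings."""
        if index not in self.assignments:
            raise KeyError(f"VF {index} is not assigned")
        return self.config


if __name__ == "__main__":
    pf = PhysicalFunction(port="port0", max_vfs=2, config={"mtu": 9000})
    print(pf.assign_vf("vm1"))  # first free VF
    print(pf.assign_vf("vm2"))  # second free VF
    print(pf.assign_vf("vm3"))  # no VF left on this PF
    print(pf.vf_config(0))      # same settings as the PF
```

In this model the hypervisor's job is the `assign_vf` call: it decides which VM gets which VF, after which the data path no longer involves it.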
So SR-IOV depends on both hardware (NIC) and software (hypervisor) that support it. It’s not just the NIC by the way; SR-IOV also needs a modern BIOS with virtualization support. Most decent to high-end server CPUs today support it, so that’s not an issue. Likewise for the NIC: a modern, quality NIC targeted at the virtualization market supports this. And of course SR-IOV also needs to be supported by the hypervisor. Until Windows 8, Hyper-V did not support SR-IOV but now it does.
I’ve read in an HP document that you can have 1 to 6 PFs per device (NIC port) and up to 256 “virtual devices” or VFs per NIC today. But in reality that might not be viable due to the overhead in hardware resources associated with this. So 64 or 32 VFs might be about the maximum, but still, 64*2=128 virtual devices from a dual port 10Gbps NIC is already pretty impressive to me. I don’t know what the limits are for Hyper-V 3.0, but there will be limits to the number of SR-IOV NICs in a server and the number of VFs per core and host. I think they won’t matter too much for most of us in reality. And as technology advances we’ll only see these limits go up, as the SR-IOV standard itself allows for more VFs.
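The arithmetic is trivial, but for completeness here it is as a few lines of Python. The 32 and 64 VFs per port are the practical figures mentioned above, and the function name is just something I made up.

```python
def total_vfs(ports, vfs_per_port):
    """Number of virtual functions a NIC exposes across all its ports."""
    return ports * vfs_per_port


if __name__ == "__main__":
    for vfs_per_port in (32, 64):
        print(f"dual port NIC, {vfs_per_port} VFs per port: {total_vfs(2, vfs_per_port)} VFs")
```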
So where does SR-IOV fit in when compared to VMQ?
Well, it does away with some overhead that still remains with VMQ. VMQ took away the overload of a single core in the host having to handle all the incoming traffic. But the hypervisor still has to touch every packet coming in and out. SR-IOV addresses that issue, as it allows moving data in and out of a virtual machine to the physical NIC via Direct Memory Access (DMA). So with this the CPU bottleneck is removed entirely from the process of moving data in and out of virtual machines. The virtual switch never touches it. To see a nice explanation of SR-IOV take a look at the Intel SR-IOV Explanation video on YouTube.
Intel SR-IOV Explanation
VMQ Coalescing tried to address some of the pain of the next bottleneck of using VMQ, which is the large number of interrupts needed to handle traffic if you have a lot of queues. But as we discussed already, this functionality is highly underdocumented and it’s a bit of a black art, especially when NIC teaming and some advanced NIC software issues come into play. Dynamic VMQ is supposed to take care of that black art and make it more reliable and easier.
Now, in contrast to VMQ and RSS, which don’t mix in a Hyper-V environment, you can combine SR-IOV with RSS; they work together.
Benefits Versus The Competition
One of the benefits that Hyper-V 3.0 in Windows 8 has over the competition is that you can live migrate to a node that’s not using SR-IOV. That’s quite impressive.
Potential Drawback Of Using SR-IOV
A drawback is that by bypassing the Extensible Virtual Switch you might lose some features and extensions. Whether this is very important to you depends on your environment and needs. It would take me too far for this blog post, but Cisco seems to have enough aces up its sleeve to offer an integrated management and configuration interface for both the networking done in the extensible virtual switch and the SR-IOV NICs. You can read more on this over here: Cisco Virtual Networking: Extend Advanced Networking for Microsoft Hyper-V Environments. Basically they:
Extend enterprise-class networking functions to the hypervisor layer with Cisco Nexus 1000V Series Switches.
Extend physical network to the virtual machine with Cisco UCS VM-FEX.
Interesting times are indeed ahead. Only time will tell what the many vendors have to offer in those areas and for what types of customer profiles (needs/budgets).
A Possible Usage Scenario
You can send data traffic over SR-IOV if that suits your needs. But perhaps you’ll want to keep that data traffic flowing over the extensible Hyper-V virtual switch. But if you’re using iSCSI to the guest, why not send that over the SR-IOV virtual function to reduce the load on the host? There is still a lot to learn and investigate on this subject. As a little side note: how are the HBAs in Hyper-V 3.0 made available to the virtual machines? SR-IOV, but the PCIe device here is a Fibre HBA, not a NIC. I don’t know any details but I think it’s similar.
To discuss Dynamic VMQ (DVMQ) we first need to talk about VMQ, or VMDq in Intel speak. VMQ lets the physical NIC create unique virtual network queues for each virtual machine (VM) on the host. These are used to pass network packets directly from the hypervisor to the VM. This removes a lot of the CPU core overhead on the host associated with network traffic, as it spreads the load over multiple cores. It is the same sort of CPU overhead you might see with 10Gbps networking on a server that isn’t using RSS (see my previous blog post Know What Receive Side Scaling (RSS) Is For Better Decisions With Windows 8). Under high network traffic one core will hit 100% while the others are idle. This means you’ll never get more than 3Gbps to 4Gbps of bandwidth out of your 10Gbps card, as the CPU is the bottleneck.
VMQ leverages the NIC hardware for sorting, routing and packet filtering of the network packets from an external virtual machine network directly to virtual machines and enables you to use 8Gbps or more of your 10Gbps bandwidth.
Now the number of queues isn’t unlimited, and queues are allocated to virtual machines on a first-come, first-served basis. So don’t enable this for machines without heavy network traffic; you’ll just waste queues. It is advised to use it only on those virtual machines with heavy inbound traffic, because VMQ is most effective at improving receive-side performance. So use your queues where they make a difference.
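The first-come, first-served behaviour is easy to picture with a small simulation. This is a minimal sketch in Python of the allocation idea only, not of how the NIC or Hyper-V really implements it. The VM names, the start order and the queue count of two are all made up for illustration.

```python
def allocate_queues(vm_start_order, vmq_enabled, queue_count):
    """Give queues to VMQ-enabled VMs in start order until the queues run out.

    Returns (VMs that got a queue, VMs that did not), both in start order.
    """
    with_queue, without_queue = [], []
    for vm in vm_start_order:
        if vm in vmq_enabled and len(with_queue) < queue_count:
            with_queue.append(vm)
        else:
            without_queue.append(vm)
    return with_queue, without_queue


if __name__ == "__main__":
    start_order = ["dc01", "print01", "sql01", "file01"]

    # VMQ enabled everywhere: the quiet VMs that start first grab the queues.
    print(allocate_queues(start_order, set(start_order), queue_count=2))

    # VMQ enabled only on the VMs with heavy inbound traffic.
    print(allocate_queues(start_order, {"sql01", "file01"}, queue_count=2))
```

In the first call the domain controller and print server take both queues and the busy SQL and file servers get none. In the second call the queues end up where they make a difference.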
If you want to see what VMQ is all about take a look at this video by Intel.
Intel VMDq Explanation
The video gives you a nice animated explanation of the benefits. You can think of it as providing the same benefits to the host as RSS does. VMQ also prevents one core from being overloaded with interrupts due to high network I/O and as such becoming the bottleneck blocking performance. This is important, as you might otherwise end up buying more servers to handle certain loads. Sure, with 1Gbps networking modern CPUs can handle a very big load, but with 10Gbps becoming ever more common this issue is a lot more acute again than it used to be. That’s why you see RSS being enabled by default in Windows 2008 R2.
VMQ Coalescing – The Good & The Bad
There is a little dark side to VMQ. You’ve successfully relieved the bottleneck on the host for network filtering and sorting, but you now have a potential bottleneck where you need a CPU interrupt for every queue. The documentation states as follows:
The network adapter delivers interrupts to the Management Operating system for each VMQ on the processor based processor VMQ affinity. If the interrupts are spread across many processors, the number of interrupts delivered can grow substantially, until the overhead of interrupt handling can outweigh the benefit of using VMQ. To reduce the number of interrupts used, Microsoft has encouraged network adapter manufacturers to design for interrupt coalescing, also called shared interrupts. Using shared interrupts, the network adapter processes multiple queues with the same processor affinity in the same interrupt. This reduces the overall number of interrupts. At the time of this publication, all network adapters that support VMQ also support interrupt coalescing.
Now coalescing works, but the configuration and the possible headaches it can give you are material for another blog post. You have to watch every action on your NIC and virtual switch configuration like a hawk or you’ll get registry values overwritten, value types changed and such. This leads to all kinds of issues, ranging from VMQ coalescing not working to your cluster going down the drain (worst case). The management of VMQ coalescing is rather tedious and as such error prone. This is not good. Combine that with the sometimes problematic NIC teaming and you get a lot of possible and confusing permutations where things can go wrong. Use it when you can handle the complexity or it might bite you.
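The shared-interrupt idea from the documentation quoted above can be reduced to a simple counting exercise. The Python sketch below is a deliberately simplified model: it assumes every queue has traffic pending at the same moment, and the queue names and processor numbers are hypothetical. Without sharing you need one interrupt per queue. With sharing, queues that have the same processor affinity are handled in the same interrupt.

```python
def interrupts_needed(queue_affinity, shared):
    """Count interrupts for one round in which every queue has traffic pending.

    `queue_affinity` maps queue name -> processor index.
    Without shared interrupts each queue raises its own interrupt; with
    them, queues on the same processor are served by a single interrupt.
    """
    if not shared:
        return len(queue_affinity)
    return len(set(queue_affinity.values()))


if __name__ == "__main__":
    # Hypothetical layout: eight queues spread over two processors.
    affinity = {f"queue{i}": i % 2 for i in range(8)}
    print(interrupts_needed(affinity, shared=False))
    print(interrupts_needed(affinity, shared=True))
```

With this layout that is eight interrupts versus two. The gain grows with the number of queues per processor, which is exactly where VMQ without coalescing starts to hurt.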
Dynamic VMQ For A Better Future
Now bring on Dynamic VMQ (DVMQ). All I know about this is from the Build sessions, and I’ll revisit this once I get to test it for real with the beta or release candidate. I really hope this is better documented and doesn’t come with the issues we’ve had with VMQ Coalescing. It brings the promise of easy and trouble-free VMQ, where the load is evenly balanced among the cores and the burden of too many interrupts is avoided. A sort of auto scaling, if you like, that optimizes queue handling and interrupts.
That means it can replace VMQ Coalescing, and DVMQ will deal with this potential bottleneck on its own. Due to the issues I’ve had with coalescing I’m looking forward to that one. Take note that you should be able to live migrate virtual machines from a host with VMQ capabilities to a host without them. You do lose the performance benefit. I have no confirmation on this yet and, as mentioned, I’m waiting for the Beta bits to try it out. It’s like Live Migration between an SR-IOV enabled host and a non SR-IOV enabled host, which is confirmed as possible. On that front Microsoft seems to be doing a really good job, better than the competition.