Do you need hard processor affinity in Hyper-V?

Do you need hard processor affinity in Hyper-V? Good question, but let’s set the context first. I tend to virtualize workloads that shock some people. Not because they are super huge solutions requiring petabytes of storage, 48TB of RAM, 256 cores and a million IOPS. Far from it. The shock often comes from people who still consider virtualization as something for lightweight infra services like DHCP, DNS, WSUS, print servers, or web services and websites. Some of these people even tried to virtualize other services like SharePoint, SQL, Exchange etc., but they did not take into account that virtualization is not magic: you need to provision adequate resources and design and manage your environment to do so successfully. So some of them got bitten. They concluded that performance requires physical deployments … and they want to see a material CPU, so to speak.

[Image: CPU close-up]

When they see virtual machines with 12 to 16 vCPUs or more than 100GB of memory they seem to think that even those workloads are bad candidates to virtualize, let alone even bigger ones. That’s not true by definition. As long as you make sure that you know why (cost/benefits/risks) and how to virtualize, it can work. You must provision and allocate the required resources. You must also have the right expertise in both virtualization (servers, storage, networking) and the applications involved (SQL, Exchange, 3rd party products, …) along with good operational processes.

You can really virtualize a lot when done right. My “virtual first” approach is a rule of thumb and exceptions do exist, even when I’m calling the shots. However, just like people quoting costs, latency, security and lock-in to question the suitability of Public Cloud versus on-premises in “subjective” ways, they do so when it comes to virtualization as well. The discussion is often more about organizational issues, control, fear, politics, interests and money. Every hosting provider out there loves virtualization as it’s great for their TCO/ROI. But when it comes to Public Cloud they’re often less convinced. That “datacenter zero” concept isn’t that attractive to them, so we see hybrid and public cloud offerings that might not be that good an idea in some cases but that fit their interests better. Have you noticed that there are no highly automated, optimized data centers anymore, only *-clouds? There are valid use cases for hybrid and private clouds, but just like with virtualization, maybe we should let go of the personal/business interests, the fear and the false assumptions when advising customers. It all depends.

In this regard I have had several discussions now with people about the lack of hard processor affinity in Hyper-V. This makes it unfit for high performance workloads in their opinion. Sure, such cases do exist. They are, however, not the majority. As I’ve been having this discussion rather often in the past months I wrote an article on the subject that I’ve published in collaboration with StarWind Software: “Need Hard Processor affinity for Hyper-V?” The idea is to reach more people and share insights with the community. Full disclosure: I happen to know Anton Kolomyeytsev (CEO, CTO and Chief Architect at StarWind) professionally as a fellow MVP and I have great respect for his technical expertise, insights and experience. This made me agree to publish some content via their blog. Sharing opinions and ideas with as many people as possible only makes for better technologists everywhere.

High Availability has a price

We’ll go back to basics today. Sometimes the obvious, no matter how evident it is to us technologists, is challenged. Recently we got the remark that we were wasting CPU cycles by assigning too many vCPUs to certain virtual machines on our Hyper-V cluster. So we had to explain that high availability has a price. On top of that we had to explain that things are not as wasteful as they seem in a virtual environment.

The case

Here’s one of the “offending” virtual machines. They assumed that we were wasting at least 50% of 12 CPUs.

[Screenshot: CPU usage of the virtual machine on node 1]

This is one node in a dual-node, load-balancing (active-active) and highly available solution. This provides for zero downtime during scheduled maintenance and very little downtime during system failures.

And here’s the second node (yes, the first node has been down for scheduled maintenance more recently than node 2).

[Screenshot: CPU usage of the virtual machine on node 2]

In a 2-node HA solution you need to make sure that one node can handle the entire workload. This is the absolute borderline of an N+1 solution, meaning you can lose one node. N is the number of nodes needed to guarantee an agreed-upon service level, and the number after the plus sign defines how many node failures can be tolerated before the service is affected.

In the above example each node needs the CPU resources to run the entire workload on its own without affecting the service. Therefore, when both nodes are up, this might seem like a waste to the uninitiated. It is, however, required to achieve the high availability goal. A constant CPU usage over 75% would lead to a reduction in service quality in this case and even compromise the usability of that service.

I did not even dive into the dangers of designing purely based on averages during this “explanation”. That was one step too far for the level of the discussion.

It’s also important to note that Hyper-V CPU scheduling is highly intelligent and is far less susceptible to wasting CPU cycles through over-provisioning of vCPUs than some other solutions are, or used to be. Knowing the capabilities and inner workings of the technology used is also important in all this. More nodes generally also make “over-provisioning” less of an issue. When you have 10 nodes and you lose one, you have only lost 10% of the capacity, not 33% like in a 3-node cluster.

Ideally you have 3 nodes so that even during an issue with one node you still maintain redundancy. However, if you want acceptable service during a 2-node failure you’ll need to go to N+2, meaning that you need N nodes to provide the services and can handle losing 2 nodes gracefully. In that case you’ll need 4 nodes, and so on. The larger the node count, the wiser it is to go to an N+2 model, and ideally you’ll provide separate failure domains over which the nodes are distributed. An example of this is having a redundant, geo-load-balanced web farm of 32 virtual machine nodes spread over 2 locations and running on separate hardware failover clusters in each location. As you can see, the higher the stakes and demands, the faster the cost and potential complexity rise. You can offload some of the complexity by leveraging a public cloud like Azure, but the costs will still be there. There is no such thing as a free lunch, though some are quite easy and affordable for what you get. The little sketch below illustrates the capacity math.
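To make that capacity math a bit more tangible, here is a minimal PowerShell sketch. The function name and the example numbers are mine, purely for illustration; it simply computes what share of the total workload each surviving node must be able to absorb in an N+M design, which is exactly the “waste” people see while all nodes are healthy.

function Get-RequiredNodeCapacity {
    param(
        [int]$TotalNodes,          # nodes deployed in total (N + M)
        [int]$ToleratedFailures    # M: how many node failures must be survived
    )
    $survivors = $TotalNodes - $ToleratedFailures
    if ($survivors -lt 1) { throw "Not enough nodes left to run the workload." }
    # Each surviving node must be able to carry this percentage of the total workload.
    [math]::Round(100 / $survivors, 1)
}

# 2-node N+1: each node must be able to run 100% of the workload on its own.
Get-RequiredNodeCapacity -TotalNodes 2 -ToleratedFailures 1    # 100

# 4-node N+2: each of the 2 surviving nodes must handle 50% of the workload.
Get-RequiredNodeCapacity -TotalNodes 4 -ToleratedFailures 2    # 50

# 10-node N+1: each of the 9 survivors only needs to pick up ~11.1% of the total.
Get-RequiredNodeCapacity -TotalNodes 10 -ToleratedFailures 1   # 11.1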

Conclusion

High availability has a price. I did mention that already, right? To be able to keep your services running at a level that is both workable and acceptable to your customers and stakeholders, you will need to over-provision to a degree. There is no magic here. When your solutions are being scrutinized by people with no real background, experience or context in high availability, you might need to explain this.

Get-VMHostSupportedVersion Windows Server 2016 Hyper-V

Windows Server 2016 Technical Preview 4 has this great PowerShell command to check what virtual machine versions the Hyper-V host supports: Get-VMHostSupportedVersion

When you run this the output on Windows Server 2016 TPv4 is as below.

[Screenshot: Get-VMHostSupportedVersion output on Windows Server 2016 TPv4]

As you can see, it supports 5.0 (which translates to the version on Windows Server 2012 R2). That’s logical: it’s the version your migrated VMs will have when you move them to Windows Server 2016, as the VM version is no longer automatically updated. You can read more about this here: https://msdn.microsoft.com/en-us/virtualization/hyperv_on_windows/user_guide/migrating_vms
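If you prefer PowerShell output over a screenshot, this minimal sketch lists the supported configuration versions on the host and the version each existing VM is currently at (the exact list you get back obviously depends on the host you run it against):

# List the VM configuration versions this host supports (5.0, 6.2 and 7.0 on TPv4).
Get-VMHostSupportedVersion | Sort-Object Version | Format-Table Name, Version, IsDefault -AutoSize

# Check what configuration version your existing VMs are at.
Get-VM | Select-Object Name, Version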

You must also realize that updating from 5.0 to a higher version means moving to the newer virtual machine configuration file format, which is way better for performance and is a binary format, no longer XML.

6.2 is the version a VM had in the Technical Preview 3 era, which is cool, as VMs you moved between TPv3 and TPv4 still worked and were supported. In reality you want to be either at 5.0, so you can still move VMs back to a W2K12R2 host, or at 7.0, to have all the new features and capabilities that come with W2K16 at your disposal.

7.0 is also the current default and is the version any newly created VM, or any VM you update via Update-VMVersion, will be at.
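As a minimal sketch (the VM name is made up for illustration), upgrading a migrated VM to the host’s default version looks like this. The VM has to be shut down first, and once upgraded it can no longer be moved back to a Windows Server 2012 R2 host:

# The VM must be powered off before its configuration version can be upgraded.
Stop-VM -Name "MyMigratedVM"

# Upgrade the configuration version to the host default (7.0 on TPv4).
Update-VMVersion -Name "MyMigratedVM" -Confirm:$false

# Verify the result.
Get-VM -Name "MyMigratedVM" | Select-Object Name, Version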

Now don’t panic if you need to stay at version 5.0, even for newly created VMs, for backward compatibility in a mixed Hyper-V Windows Server 2012 R2 / Windows Server 2016 environment. You can specify this with the -Version parameter of New-VM:

New-VM -Name "IChooseMyVersion-5.0" -Version 5.0

To know what versions you can pass to that parameter, you only have to look at the output of Get-VMHostSupportedVersion: 5.0, 6.2 and 7.0 at the time of writing (on TPv4).
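You can also combine the two. Here’s a small sketch (the VM name and memory size are just illustrative values) that picks the lowest version the host supports and creates a new VM at that version so it stays mobile towards Windows Server 2012 R2 hosts:

# Find the lowest configuration version this host still supports (5.0 on TPv4).
$lowest = Get-VMHostSupportedVersion | Sort-Object Version | Select-Object -First 1

# Create a VM at that version so it can still be moved back to a W2K12R2 host.
New-VM -Name "BackwardCompatibleVM" -MemoryStartupBytes 1GB -Version $lowest.Version

# Confirm the version it was created with.
Get-VM -Name "BackwardCompatibleVM" | Select-Object Name, Version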

You can see the result of some fun with this in the below screenshots:

[Screenshots: VMs created at different configuration versions]

I fully expect Microsoft to keep this up for TPv5, or perhaps even with RTM, to allow a smooth transition for VMs running on preview editions to RTM. At RTM I can see them removing the ability to create a new VM that’s still at 6.2, as it’s an “intermediate”, work-in-progress version on the way to the final default version number at RTM (see Windows Server 2016 TPv4 Hyper-V brings virtual machine configuration version 7). We’ll see!

Storage-level corruption guard

One of the many gems in Veeam Backup & Replication v9 is the introduction of storage-level corruption guard for primary backup jobs. This was already a feature for backup copy jobs, but now we have the option of periodically scanning our backup files for storage issues. It works like this: if any corrupt data blocks are found, the correct ones are retrieved from the primary storage and auto-healed. Ever bigger disks, vast amounts of storage and huge amounts of data mean more chances of bit rot. It’s an industry-wide issue. Microsoft tries to address this with ReFS and Storage Spaces for example, where you also see an auto-healing mechanism based on retrieving the needed data from the redundant copies.

We find this option on the Maintenance tab of the advanced settings for the storage settings of a backup job, where you can enable it and set a schedule.

[Screenshot: the Maintenance tab in the backup job’s advanced storage settings]

The idea behind this is that it is more efficient than doing periodic active full backups to protect against data corruption. You can reduce those in frequency or, perhaps better, get rid of them altogether.

Veeam describes Storage-level corruption guard as follows:

[Screenshot: Veeam’s description of storage-level corruption guard]

Can it replace any form of full backup completely? I don’t think so. The optimal use case seems to lie in combining storage-level corruption guard with periodic synthetic full backups. Here’s why. When the bit rot is in older data that can no longer be found on the production storage, the health check could fail at doing something about it, as the correct data is no longer there. So we’ll have to weigh the frequency of these corruption guard scans to determine what reduction in full backups is wise for our environment and needs. The most interesting scenario seems to be the one where we can indeed eliminate periodic active full backups altogether. To mitigate the potential issue of not being able to recover, which we described above, we’d still create synthetic full backups periodically in combination with the storage-level corruption guard option enabled. Doing this gives us the following benefits:

  • We protect our backups against corruption, bit rot, etc.
  • We avoid making periodic active full backups, which are the most expensive in storage space, I/O and time.
  • We avoid having no useful backup left in the scenario where storage-level corruption guard needs to retrieve data from the primary storage that is no longer there.

To me this seems to be a very interesting scenario to optimize backup times and economics. In the end it’s all about weighing risks versus cost and effort. Storage-level corruption guard gives us yet another tool to strike a better balance between the two. I have enabled it on a number of jobs to see how it does in real life. So far things have been working out well.