Windows Server 2016 RDMA and the Hyper-V vSwitch – Part I

Introduction

With Windows Server 2012 R2, using both RDMA and the Hyper-V vSwitch on the same host required separate physical network adapters (pNICs). There are 2 reasons for this.

  • First a vSwitch is generally created with a native Windows NIC team. Such a NIC team does not expose RDMA capabilities.
  • Second is that in Windows Server 2012 R2 you cannot expose RDMA capabilities via a vSwitch, even when you are using a non-teamed RDMA capable NIC.

As a result, the need for RDMA required more NICs on the Hyper-V hosts and/or a fully converged had some serious drawbacks. As servers have been quite capable and our VMs serve ever more intensive workloads this was not dramatic. Leveraging 2*10Gbps for a vSwitch and 2*10Gbps for redundant RDMA / SMB Direct traffic have long been one of my favorite designs. It leaves room for other traffic, such as backups, and it allows for high VM density. But with 40Gbps NICs that is overkill and a tad expensive in many scenarios, even when connecting to a SOFS share for Hyper-V storage, so 4*40Gbps on a Hyper-V host is not something I ever saw in real life.

Windows Server 2016 can expose RDMA capabilities via a vSwitch even without SET

What many people seem to have missed is that reason 2 has gone in Windows Server 2016 Hyper-V. Reason 1 still holds true. But that has been solved by Switch Embedded Teaming (SET). This means that you actually do not need SET to leverage RDMA with an vSwitch in Windows Server 2016 Hyper-V. You can do this as follows:


Below is what this looks like. We have one vNIC on the management OS leveraging RDMA/SMB Direct consuming all 10Gbps if the NIC we connected to the vSwith. This is a nice lab demo but you can see this isn’t perhaps the best idea in real life.

clip_image002

Other things to note

Do realize this still requires the pNIC to be RDMA capable. This is not some sort of soft RoCE or other software RDMA magic as of today. The pNIC also has to have RDMA enabled or virtual NIC won’t be able to leverage RDMA but fall back to SMB (Multichannel only) instead of SMB Direct. Likewise, RDMA has to be enabled on the vNIC as well. So don’t forget, RDMA must be enabled on both the pNIC and the vNIC for this to work.clip_image004

DCB’s PFC/ETS requires a tagged VLAN to carry the priority, do don’t forget to tag the vNIC. There is actually no need to tag the pNIC as long as the switch port has the tagged VLAN set – most likely as a trunk or in general mode. If you don’t tag consistently across the entire network stack you’ll have network issues anyway and RDMA performance will be bad if it works at all.

Finally, don’t forget this is example is not using VMM /Network Controller and as such is using Set-VMNetworkAdapterVLAN and not Set-VMNetworkAdapterIsolation.

In real life, we need better and more than a single NIC vSwitch

The caveat here is that, while you have a converged setup, you have no redundancy for the vSwitch (there is no team). This also means that you’re are limited to a single NIC in regards to throughput for that vSwith. Depending on the needs of the solutions that might be perfectly fine. It it’s not – in most real-world scenarios you’ll need redundancy – you have to use SET in a converged scenario. That’s what we’ll take a look at in part 2. Then there is the question about QoS as you don’t want SMB Direct traffic to consume to much bandwidth at will. That’s still another issue to discuss and address.

Unable to correctly configure Time Service on non PDC Domain Controller

Introduction

Around new year, between the 31st 2016 and the 1st of January 2017 some ISP had issues with the time service. It jumped 24 hours ahead. This cause all kinds of on line services issues ranging from non working digital TV to problems with the time service within companies. That caused some intervention time and temporarily switching the external reliable NTP time server sources to another provider that didn’t show this behavior. Some services required a server reboot to sort things out but things were operational again. But it became clear we still had a lingering issue afterwards as we were unable to correctly configure Time Service on non PDC Domain Controller.

Unable to correctly configure Time Service on non PDC Domain Controller

A few days later we still had one domain, which happend to be 100% virtualized, with issues. As turned out the second domain controller, which did not hold the PDC role, wasn’t syncing with the PDC. No matter what we tried to get it to do so. If you want to find out how to do this properly for virtualized environment I refer you to a blog post by Ben Armstrong Time Synchronization in Hyper-V and fellow MVP Kevin Green Hyper V Time Synchronization on a Windows Based Network.

But no matter what I did, the DC kept  getting the wrong date. I could configure it to refer to the PDC as much as I wanted, nothing helped. It also kept saying the source for the time was the local CMOS (w32tm /query /source). I kept getting an error, we’re normally able to fix by configuring the time service correctly.

image

Another trouble shooting path

The IT universe was not aligned to let me succeed. So that’s when you quit … for a coffee break. You relax a bit, look out of the window whilst sipping from your coffee. After that you dive back in.

I dove into the registry settings for the Windows time service in HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\W32Time of a functional DC in my lab and the one of the problematic DC in the production domain.  I started comparing the settings and it all seemed to be in order. But for one serious issue with the HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\W32Time\Security key on the problematic DC.

Trying to open that key greeted me with the following error:

Error Opening key

Security cannot be opened. An error is preventing this key from being opened. Details: The system cannot find the file specified.

image

That key was empty. Not good!

image

I exported the entire W32Time registry key and the Security key as a backup for good measure on the problematic DC and I grabbed an export of the security key from the working DC (any functional domain joined server will do) and imported that into the problematic DC. The next step was to restart the time service but that wasn’t enough or I was to inpatient. So finally I restarted the DC and after 10 minutes I got the result I needed …

image

Problem solved Smile

The Hyper-V Processor Relative Weight

Introduction

Hyper-V offers 3 ways of managing or tweaking the CPU scheduler to provide the best possible configuration for certain scenarios and use cases. The defaults normally work fine but of certain conditions you might want to tweak them for the best possible outcome.  The CPU resource controls at your disposal are:

  • Virtual machine reserve  – Think of this as the minimum CPU “QoS”
  • Virtual machine limit – Think of this as the maximum CPU “QoS”
  • Relative Weight – Think of this as the scale defining what VM is more important.

Note that you should understand what these setting are and can do. Threat them like spices. Select the ones you need and don’t overdo it. They’re there to help you, if needed you can leverage all three. But it’s highly unlikely you’ll need to do so. Using one or two will server you best if and when you need them.

In this blog post we’ll look at the relative weight.

Relative weight

Relative weight is a relative number between 1 and 10000 that you can assign to a virtual machine. This determines the relative importance of a virtual machines CPU resources in regards to other virtual machines. So it’s not a % or number of cycles, it’s just a arbitrary weight. By default this is set to 100.

image

You need to come up with a scale and stick to it. 100, 200 and 300 for low, medium and high important virtual machines is a good example. You could also create 10 “classes”  1000, 1500, 2000, 2500, 3000, 3500, 4000, 4500, 5000, 5500. This leaves room to even create even more (lower, in between and higher).

Note that as long as there are sufficient CPU resources on a host the relative weight does not come into play. It really doesn’t matter whether a virtual machine has relative weight of 1000 versus 5000 at that time. They both get whatever they need as there’s plenty to go around.

Relative weight kicks in when the demand is higher than the availability on a physical host. When you have left all the virtual machines at the default of 100 they will all get an equal share. But when you’ve set virtual machines with a higher relative weight these will be getting a higher share of the available CPU cycles.

Use Cases

Not all virtual machines are created equal. In reality some workloads are more important than others. This might be development and test versus production or high priority workloads versus lower priority workloads. The lower priority workloads are the once that you care about less when there is contention for CPY cycles. Or workloads where less CPU cycles and slower response times don’t make a real difference.

Another use case might be on your developers or lab host where you have a CPU sensitive workload you give a much higher weight and leave the others at the default of 100.

To make sure the high priority workloads or those that really depend on more CPU cycles being delivered fast don’t have to play second fiddle to those that don’t have those needs we use relative weight. It’s very flexible and only kicks in when needed, so there is no waste or inefficiency there.

Limitations

The biggest limitation is in the name. It’s all relative. Where as reserve or limit give you a minimum and a maximum respectively, the relative weight only defines what virtual machine more important than another in regards to CPU cycles. So some virtual machines get more than others but that might not be enough. It’s all about balance between virtual machines, not guaranteed minima or maxima.

You need to agree on a standard within the company to define weight. If everyone starts using a different scale you’re in trouble.

Let’s take one admin who uses 100 for less important virtual machines, 200 for standard virtual machines and 300 for the most important ones. That’s all great when he’s the only one defining the settings and when he does so consistently on all nodes/ cluster for all VMs. In that case all is well even when VMs move around between hosts or between clusters. But what happens when many admins use different “scales”. Well it’s a mess and the behavior won’t be what you want when your colleague used 1000, 2000 and 3000 respectively for the same definition. It’s also smart to not use 100, 101 and 102. leave some margin for adding a category when needed.

Conclusion

This is one handy tool to have at your disposal and I tend to use it to proactively set a higher weight for very important VMs. Even in an environment where there are no predefined categories or know minima this allows me to tell Hyper-V that, if there ever is contention for CPU cycles, the virtual machines with a higher weight are the one to serve a bigger share of the limited resources.

The Hyper-V Processor Virtual Machine Limit

Introduction

Hyper-V offers 3 ways of managing or tweaking the CPU scheduler to provide the best possible configuration for certain scenarios and use cases. The defaults normally work fine but of certain conditions you might want to tweak them for the best possible outcome.  The CPU resource controls at your disposal are:

  • Virtual machine reserve  – Think of this as the minimum CPU “QoS”
  • Virtual machine limit – Think of this as the maximum CPU “QoS”
  • Relative Weight – Think of this as the scale defining what VM is more important.

Note that you should understand what these setting are and can do. Threat them like spices. Select the ones you need and don’t overdo it. They’re there to help you, if needed you can leverage all three. But it’s highly unlikely you’ll need to do so. Using one or two will server you best if and when you need them.

In this blog post we’ll look at the virtual machine limit.

Virtual Machine Limit

The virtual machine limit is a setting on the vCPU configuration of a virtual machine that you can set to limit the % of CPU resources a virtual machine can grab from the host. This setting limits the vCPU, preventing it to use more than the defined maximum percentage. The default percentage is 100.

Let’s look at some examples below based on a simple 4 core host with a VM.

On a 1 vCPU virtual machine with a virtual machine limit of 100% this means it can grab the equivalent of compute time slices of maximum 1 CPU on the host. Which is 25% of the total system CPU resources

image

So in case of a home lab PC that has 4 cores you can see that setting the virtual machine limit to 100% means it’s limited to 100% of the total system CPU resources.

image

When I set the number of vCPUs to 2 this drops to 50% of the total system CPU resources.

image

Now by playing with the virtual machine limit you can configure the desired maximum of the total CPU system resources a VM can grab. A 2 vCPU VM with 60% as a virtual machine limit get 30% of the total system CPU resources at the most on 4 CPU host.

image

Use cases

When you’re worried about run away virtual machines due to either misbehaving applications or developers who tend to run CPU stress tools in the guest and you really want to cap them to a certain limit without reducing the number of vCPUs (which they might need for testing parallelism, NUMA awareness) this might be a tool to use.

Limitations

This setting is always enforced. Even if a host has 32 cores and is only using 40% of them, a virtual machine could consume more CPU cycles without causing issues it won’t be able to. It’s a hard cap that’s always enforced.

The setting is enforced per vCPU. This means that when a single threaded app in a 4 vCPU virtual machine consumes the virtual machine limit it is capped even when the 3 other vCPU are totally idle.  So it’s not 30% of all vCPUs is 30% per vCPU maximum.

Conclusion

It’s a tool you have at your disposal but it’s probably the least used one. It has limited uses case due to it’s limitations. For most scenarios you’re better of leveraging virtual machine reserve or the relative weight. These are more flexible and are only enforced when needed, providing a smarter use of resource.