Windows Server 2016 RDMA and the Hyper-V vSwitch – Part II

Introduction

In part I this article I demonstrated that some of the rules in regards to SMB Direct and the Hyper-V vSwitch as we know them for Windows Server 2012 R2 have changed with Windows Server 2016. We focused on the fact that you can expose RDMA to a vNIC exposed to the management OS created on a vSwitch. This means that while in Windows Server 2012 R2 you cannot expose RDMA capabilities via a vSwitch, even when you are using a non-teamed RDMA capable NIC, this is no longer true with Windows Server 2016.

While a demo with a vSwitch on a single NIC as we did in part I is nice it’s unlikely you’ll use this often if at all in the real world? Here we require redundancy and that means NIC teaming. To do so we normally use a vSwitch created on a native Windows NIC team. But a native NIC teaming does not expose RDMA capabilities. And as such a vSwitch created against a Windows native NIC team cannot leverage RDMA either. Which was the one of the reasons why a fully converged scenario in Windows Server 2012 R2 was too limited for many scenarios. Loss of RSS on the vNIC exposes to the management OS was another. The solution to this in Windows Server 2016 Hyper-V comes with Switch Embedded Teaming (SET). Now using SET in each and every situation might not be a good idea. It depends. But we do need to know how to configure it. So let’s dive in.

Switch Embedded Teaming (SET) exposes RDMA to the vSwitch

Switch Embedded Teaming (SET in Windows Server 2016 allows multiple identical (make, model, firmware, drivers to be supported) NICs to be used or “teamed” within the vSwitch itself. The important thing to note here this does not use windows NIC teaming or LBFO (Load Balancing and Fail Over).

SET is the future and is needed or use with the Network Controller and Software Defined Networking in Windows. SET can also be used without these technologies. While today it supports a good deal of the capabilities of native Windows NIC teaming it also lacks some of them. In general SET is meant for full or partial converged scenarios with 10GBps or better NICs, not 1Gbps networking in a (hyper)converged Hyper-V scenario.

Please see New Windows Server 2016 NIC and Switch Embedded Teaming User Guide for Download for more information as there is just too much to tell about it.

Setting it up

We start out with a 2-node cluster where each node has 2 RDMA NICs (Mellanox ConnectX-3) with RDMA enabled and DCB configured. Live migration of VMs between those nodes works over SMB Direct works. All NIC are on the same subnet 172.16.0.0/16 (thanks to Window Server 2016 Same Subnet Multichannel) and are on VLAN 110. In Failover Cluster Manager (FCM) that looks like below.

clip_image002

We’ll now use the rNICs to create a Switch Embedded Team.

clip_image004

Note that the teaming mode is switch independent, the only option supported with SET in Window Server 2016.

clip_image006

This also gives us a vNIC exposed to the management OS (default)

clip_image008

This is also visible as a vNIC in the mamagement OS called “vEthernet (RDMA-SET-vSwitch)”

clip_image010

This will be used to manage the host and to make its purpose clear we’ll rename it.

We’ll create 2 separate management OS vNICs for the RDMA traffic later. For now, we want the HOST-MGNT vNIC to have connectivity to the LAN and for that we need to tag it with VLAN 10.

image

The vNIC actually “inherited” the IP configuration of one of our physical NICs and we need to change that to either DHCP or a correct LAN IP address and settings.

clip_image014

You can use the code below to set the HOST-MGNT vNIC to DHCP

To finalize the HOST-MGNT vNIC configuration we enable priority tagging on it. If we don’t we won’t see any traffic other than SMB Direct tagged at all!

clip_image016

Before we go any further we’ll remove the VLAN tag from the rNICS as we don’t want it interfering with egress traffic being tagged by them or ingress traffic being filtered because it doesn’t match the VLAN ID on the rNICs.

From here on we’ll focus on the RDMA capable vNICs well create and use for SMB traffic.

We create 2 vNIC on the management OS for SMB Direct traffic.

clip_image018

Now these vNIC need an IP address, this can be in the same subnet because we have Windows Server 2016 SMB multichannel.

We than also need to put the vNICs in the correct VLAN. Remember that DCB / PFC priority tagging needs tagged VLAN so carry that priority. Right now, we can see that these are untagged.

clip_image020

So we tag them with VLAN ID 110

clip_image022

We enable jumbo frames on the vNICs. Remember that the physical NICs in the SET have jumbo frames enabled as well.

clip_image024

Normally all traffic that is originated from vNICs gets any QOS values set to zero. There is one exception to this and that’s SMB Direct traffic. SMB Direct traffic gets tagged with its QoS priority and that is not reset to 0 as it bypasses the vSwitch completely. But if we set other priorities on other types of traffic for DCB PFC and or ETS that passes over these vNICs we must enable priority tagging on these NICs as well or they’ll be stripped away.

clip_image026

The association of the vNIC to pNICs is random. This also changes during creation and destruction (disabling NICs, restarting the OS). We can map a vNCI to a particular pNIC. This prevents suboptimal use of the available pNICs and provides for a well know predictable path of the traffic. We do this with the below PowerShell commands.

Finally, last but not least, we should enable RDMA on our two vNICs or SMB Direct will not kick in at all.

Right now, we have it all configured correctly on one node of our 2-node cluster. The SMB network look like this now:

clip_image028

The cluster now looks like below.

clip_image030

We can live migrate VMs over SMB Direct in this mixed scenario where one node has pNICs RDMA NICs, 1 node has SET with vNICs for RMDA.

clip_image032

When looking at this in report mode we clearly see Node-A send SMB Direct traffic (tagged with priority 4, green) over its RDMA enabled SET vNICs to Node-B which still has a complete physical rNIC set up (blue).

clip_image034

As you can see in the screen shots above we now have RDMA / SMB Direct working with SET / RDMA vNICs on one node (Node-A) and pure physical RDMA NICs on the other (Node-B). This gives us bandwidth aggregation and redundancy. To complete the exercise, we configure SET on the other node as well. But it’s clear SET and RDAM will also work in a mixed environment.

We’ll discuss some details about certain aspects of the vNIC configuration in future articles. Things like the why and how of Set-VMNetworkAdapterTeamMapping and the use of -IeeePriorityTag. But for now, this is it. Go try it out! It’s the basis for anything you’ll do with SDNv2 in W2K16 and beyond.

An error occurred connecting to the cluster

An error occurred connecting to the cluster

This morning I woke up to a bunch of failed backup notifications of our trusted Veeam Backup & Replication v9.5 update 2 solution. After 3:30 AM the backups of one particular cluster started failing.

I went to have a look but I could not connect to the 3 node cluster.

image

I logged on to the cluster nodes themselves and did a quick verification of network connectivity, DNS etc. That was all fine. WMI services were running on all nodes but on node 2 and 3 they were not functional.

Cleary we have a WMI issue. And sure enough, no Hyper-V manager available on those 2 nodes but we did have it on the one properly functioning node.

We tested some PowerShell WMI queries (get-wmiobject mscluster_resourcegroup -computer NodeToTest -namespace “ROOT\MSCluster“) to the cluster and this confirmed that WMI was toast on those two nodes.

Fixing the issue

The good news was that all the VMs were all up and running  – a few that had RHS.exe issues – but were still alive pure Hyper-V wise. That explains why they didn’t have any support calls come in. So if we can fix this without causing down time this would be great. To try this we decided to restart the WMI service.

On problematic node 2 this worked. It restarted depending services as well such as Hyper-V Virtual Machine Management, User Access Logging Service, IP Helper and the Veeam Installer Service and the Veeam Hyper-V Integration Service. We got connectivity back via Hyper-V manager but the Failover Cluster manager GUI remained an issue but now only complained about node 3.

image

We wanted to avoid rebooting node 3 to avoid downtime to the VMs. So what we did there is stop the depending services that we could stop. It was vmms.exe that was stuck in shutdown we just killed the process manually with stop-Process -name “vmms” -force
That allowed the WMI service to be restarted. We then started the depending services manually and we got back the connectivity to Hyper-V Manager on node 3.

The Failover Cluster manager GUI could also connect again to the cluster. We checked the cluster for other issues. When done and found OK we live migrated the VMs node per node and did a reboot of every node one by one. This to have cleanly started nodes and to see if any trouble some event were logged during the startup. Normal operations were resumed.

Do note that there is a blog on TechNet about a similar issue but with a different error message. That was caused by missing cluswmi.mof file due to an ill advised use of run mofcomp.exe *.mof. This was not the case here. A reboot of the misbehaving nodes would have done the trick as well (as blogged here Trouble Connecting to Cluster Nodes? Check WMI! ) but we avoided as much downtime as possible here by going the route we did.

Missing Hyper-V Service Connection Point caused failed off-host backup proxy jobs

The issue

We have a largish Windows Server 2016 Hyper-V cluster (9 nodes) that is running a smooth as can be but for one issue. The off-host backups with Veeam Backup & Replication v9.5 (based on transportable hardware snapshots) are failing. They only fail for the LUNs that are currently residing on a few of the nodes on that cluster. So when a CSV is owned by node 1 it will work, when it owned by node 6 it will fail. In this case we had 3 node that had issues.

As said, everything else on these nodes, cluster wise or Hyper-V wise was working 100% perfectly. As a matter of fact, they were the perfect Hyper-V clusters we’d all sign for. Bar that one very annoying issue.

Finding the cause

When looking at the application log on the off-host backup proxy it’s quite clear that there is an issue with the hardware VVS provider snapshots.

We get event id 0 stating the snapshot is already mounted to different server.

clip_image002

Followed by event id 12293 stating the import of the snapshot has failed

clip_image004

When we check the SAN, and monitor a problematic host in the cluster we see that the snapshot was taken just fine. what was failing was the transport to the backup repository server. It also seemed like an attempt was made to mount the snapshot on the Hyper-V host itself, which also failed.

What was causing this? We dove into the Hyper-V and cluster logs and found nothing that could help us explain the above. We did find the old very cryptic and almost undocumented error:

Event ID 12660 — Storage Initialization

Updated: April 7, 2009

Applies To: Windows Server 2008

This is preliminary documentation and subject to change.

clip_image005

This aspect refers events relevant to the storage of the virtual machine that are caused by storage configuration.

Event Details

Product:

Windows Operating System

ID:

12660

Source:

Microsoft-Windows-Hyper-V-VMMS

Version:

6.0

Symbolic Name:

MSVM_VDEV_OPEN_STOR_VSP_FAILED

Message:

Cannot open handle to Hyper-V storage provider.

Resolve

Reinstall Hyper-V

A possible security compromise has been created. Completely reimage the server (sometimes called a bare metal restoration), install a new operating system, and enable the Hyper-V role.

Verify

The virtual machine with the storage attached is able to launch successfully.

This doesn’t sound good, does it? Now you can web search this one and find very little information or people having serious issues with normal Hyper-V functions like starting a VM etc. Really bad stuff. But we could start, stop, restart, live migrate, storage live migrate, create checkpoints etc. at will without any issues or even so much as a hint of issues in the logs.

On top of this event id Event ID 12660 did not occur during the backups. It happens when you opened up Hyper-V manager and looked at the setting of Hyper-V or a virtual machine. Everything else on these nodes, cluster wise or Hyper-V wise was working 100% perfectly Again, this is the perfectly behaving Hyper-V cluster we’d all sign for. If it didn’t have that very annoying issue with a transportable snapshot on some of the nodes.

We extended our search outside if of the Hyper-V cluster nodes and then we hit clue. On the nodes that owns the LUN that was being backup and that did show the problematic transportable backup behavior noticed that the Hyper-V Service Connection Point (SCP) was missing.

clip_image006

We immediately checked the other nodes in the cluster having a backup issue. BINGO! That was the one and only common factor. The missing Hyper-V SCP.

Fixing the issue

Now you can create one manually but that leaves you with missing security settings and you can’t set those manually. The Hyper-V SCP is created and attributes populates on the fly when the server boots. So, it’s normal not to see one when a server is shut down.

The fastest way to solve the issue was to evacuate the problematic hosts, evict them from the cluster and remove them from the domain. For good measure, we reset the computer account in AD for those hosts and if you want you can even remove the Hyper-V role. We then rejoined those node to the domain. If you removed the Hyper-V role, you now reinstall it. That already showed the SCP issue to be fixed in AD. We then added the hosts back to the cluster and they have been running smoothly ever since. The Event ID 12660 entries are gone as are the VSS errors. It’s a perfect Hyper-V cluster now.

Root Cause?

We’re think that somewhere during the life cycle of the hosts the servers have been renamed while still joined to the domain and with the Hyper-V role installed. This might have caused the issue. During a Cluster Operating System Rolling Upgrade, with an in-place upgrade, we also sometime see the need to remove and re-add the Hyper-V role. That might also have caused the issue. We are not 100% certain, but that’s the working theory and a point of attention for future operations.

Replay Manager 7.8 and cluster OS rolling upgrade Tips

Compellent Replay manager 7.8  Windows Server 2016 Clusters in mixed mode or at cluster functional lever 8

Consider this a a quick publish about tips for when you combine Replay Manager 7.8, Compellent and Windows Server 2016. Many of you will be doing cluster operating system rolling upgrade of your Windows Server 2012 R2 clusters to Windows Server 2016. If you have done your homework and made sure your hardware is supported you can still run into a surprise. As long as your in mixed mode (Wi2K12R2 mixed with W2K16 nodes) or have not updated the cluster functional level to 9 (Windows Server 2016) you will have a few issues.

In Replay Manager 7.8  itself you’ll notice that the nodes of your cluster only see the CSV LUNs under local volumes that they are the owner of currently. Normally you’ll see all of the CSV LUNs of the (Hyper-V) cluster on all of the nodes of that cluster. So that’s not the expected behavior. This leads to failed  restore points when you run a snapshot from a host that is not the owner of the CSV etc.

image

On top of that when you try to run a backup job it will fail. The reason given is:

The requested volumes is not supported because it is not managed by the provider, is a dynamic volume, or it has some other incompatibility with the current operation.

The fix? Just update your upgrade cluster to cluster functional level  (level 9)

It’s as easy as that. The moment you upgrade your cluster functional level to 9 you will see all the CSV on the cluster on every node of that cluster you connect to. At that moment the replays will also work. That’s OK, you want to move swiftly trough the rolling upgrade and once you’re comfortable all drivers and firmware are working fine. You do not want to be in a the lower cluster version too long, but upgrade to benefit from the new capabilities in Windows Server 2016 Failover clustering. You do need to know this when you start your upgrades

image

Close your backups apps, restart the Replay manager service on the cluster nodes, refresh / reconnect to the backup apps, and voila. You’ll see the image you are use to in Replay Manager 7.8 (green text / arrows) and the backup jobs will work as well as any other backup product using the Compellent Replay Manager 7.8 hardware VSS provider.image

I hope this helps some of you out there. So yes Replay Manager 7.8 supports Windows Server 2016 Clusters with CSV LUNs but if you upgraded your cluster via cluster operating system rolling upgrade you need to have upgraded your cluster functional level! Until then, Replay Manager 7.8 isn’t going to work very well.

So there you go, that’s another reason to move through that process fast and smooth as you can.

Still missing in action for Hyper-V with Replay Manager 7.8

I’d really like for Replay Manager to be a bit more cluster friendly. No matter what node you are connected to they show you all CSV LUNs in the cluster. Since Replay manager 7.8 with Windows Server 2016 when you run a job manually you must start it when connected to the cluster node that owns the CSV or the job will fail with “No resources found on current cluster node for backup set”.

image

This was not the case with Windows Server 2012(R2) and earlier versions of Replay Manager. That did throw some benign errors in the event logs on the cluster node but it did work. I would love for DELLEMC to make sure the Replay Manager Client is smart enough to detect who owns the CSV and make sure it’ starts the job from that node. That would be a lot more user friendly. At the very least it should indicate which of the CSV LUNs you see are owned by the cluster node you are connected to.But when launching a backup job for a CSV that’s not owned by the node you are connect to the job quits/fails. They can detect the node they need, launch the job on that node and show it to you. That avoids having to go find out yourself what cluster node to connect to in Replay manager when you need to run a out of schedule job manually? The tech/logic is already there as the scheduled jobs get launched on the correct node.

It would also be great if they finally could get the logic built into Replay manager for the Hyper-V VM backups to know on what CSV and Hyper-V node the VM lives and deal with that. Sure it might cause more more snapshots to be made but that’s an invalid argument. When the VMs are on the same node,but different  CSV’s that’s already happening. Really on VM per job to avoid this isn’t a great answer.