Microsoft Pulled KB4036479 for Windows Server 2012 R2

Posted on October 3, 2017 by workinghardinit

Nothing like coming back from a holiday to find out that the quality assurance of Windows updates has caused some issues once again. What saved the day here was a great colleague who identified the problem, declined the update in WSUS and removed it from the affected machines. Meanwhile, Microsoft has pulled KB4036479 for Windows Server 2012 R2.

KB4036479 was meant to eliminate the restart that occurs during initial machine configuration (IMC) with Windows Server 2012 R2. But after the “successful” install, the post-install reboot rolls the update back and the whole process starts all over again. This happened to Windows Server 2012 R2 VMs both on premises and in Azure IaaS. For now it has been pulled from the Microsoft Update Catalog (https://www.catalog.update.microsoft.com/Search.aspx?q=KB4036479). The issue has been discussed on the forums as well.
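What our colleague did by hand could be roughly sketched as follows: decline the update on the WSUS server and remove it from a machine where it is installed. This assumes the UpdateServices PowerShell module is present on the WSUS server. It also assumes the affected machine stays up long enough between reboots to run the uninstall. Review what the search returns before declining anything.

```powershell
# Sketch only. Run this part on the WSUS server (UpdateServices module assumed present)
$wsus = Get-WsusServer
# SearchUpdates does a text search (title, KB number, ...): check the hits before declining
$wsus.SearchUpdates('KB4036479') | ForEach-Object {
    Write-Output "Declining: $($_.Title)"
    $_.Decline()
}

# Run this part on an affected Windows Server 2012 R2 machine
if (Get-HotFix -Id 'KB4036479' -ErrorAction SilentlyContinue) {
    # wusa.exe takes the KB number without the "KB" prefix
    wusa.exe /uninstall /kb:4036479 /quiet /norestart
}
```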

Again, it pays to deploy and test Windows updates in a lab or proving grounds environment that mimics your environment before you let them loose on your production environment. Be practical here and don’t let the desire for a perfect but non-existent lab be the enemy of a good, existing and usable one!

PS: Some people reported issues with KB4038774 as well, but that turned out not to be the case. In any case, these preview updates have no business being installed on production servers. I wish Microsoft would put them in a separate category so they are not detected, downloaded or approved along with other production updates, while still allowing easy deployment and use in proving ground environments.

Windows Server 2016 RDMA and the Hyper-V vSwitch – Part II

Posted on September 13, 2017 by workinghardinit

Introduction

In part I of this article I demonstrated that some of the rules regarding SMB Direct and the Hyper-V vSwitch, as we know them from Windows Server 2012 R2, have changed with Windows Server 2016. We focused on the fact that you can expose RDMA to a management OS vNIC created on a vSwitch. In Windows Server 2012 R2 you cannot expose RDMA capabilities via a vSwitch, even when you are using a non-teamed RDMA capable NIC. With Windows Server 2016 this is no longer true.

While a demo with a vSwitch on a single NIC, as we did in part I, is nice, it’s unlikely you’ll use it often, if at all, in the real world. There we require redundancy, and that means NIC teaming. Normally we use a vSwitch created on a native Windows NIC team. But native NIC teaming does not expose RDMA capabilities, so a vSwitch created on a Windows native NIC team cannot leverage RDMA either. That was one of the reasons why a fully converged scenario in Windows Server 2012 R2 was too limited for many use cases. Loss of RSS on the vNICs exposed to the management OS was another. The solution to this in Windows Server 2016 Hyper-V is Switch Embedded Teaming (SET). Now, using SET in each and every situation might not be a good idea. It depends. But we do need to know how to configure it. So let’s dive in.

Switch Embedded Teaming (SET) exposes RDMA to the vSwitch

Switch Embedded Teaming (SET) in Windows Server 2016 allows multiple identical NICs (same make, model, firmware and drivers, to be supported) to be used or “teamed” within the vSwitch itself. The important thing to note here is that this does not use Windows native NIC teaming or LBFO (Load Balancing and Fail Over).

SET is the future and is needed for use with the Network Controller and Software Defined Networking in Windows. SET can also be used without these technologies. Today it supports a good deal of the capabilities of native Windows NIC teaming, but it also lacks some of them. In general, SET is meant for fully or partially converged scenarios with 10Gbps or better NICs, not for 1Gbps networking in a (hyper)converged Hyper-V scenario.

Please see “New Windows Server 2016 NIC and Switch Embedded Teaming User Guide for Download” for more information, as there is just too much to tell about it.

Setting it up

We start out with a 2-node cluster where each node has 2 RDMA NICs (Mellanox ConnectX-3) with RDMA enabled and DCB configured. Live migration of VMs between those nodes over SMB Direct works. All NICs are on the same subnet, 172.16.0.0/16 (thanks to Windows Server 2016 Same Subnet Multichannel), and are on VLAN 110. In Failover Cluster Manager (FCM) that looks like below.
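The article takes “RDMA enabled and DCB configured” as its starting point. For readers who still need that part, here is a minimal sketch of a typical DCB setup for SMB Direct on these two rNICs. Priority 4 for SMB Direct matches the tagging seen later in this article. The 50% ETS bandwidth reservation and turning off DCBX willing mode are assumptions. Your switches must be configured to match whatever you choose.

```powershell
# Sketch: typical DCB setup for SMB Direct (RoCE) on one node
Install-WindowsFeature -Name Data-Center-Bridging

# Tag SMB Direct traffic (NetworkDirect port 445) with 802.1p priority 4
New-NetQosPolicy -Name "SMB" -NetDirectPortMatchCondition 445 -PriorityValue8021Action 4

# Enable PFC (lossless behavior) only for priority 4
Enable-NetQosFlowControl -Priority 4
Disable-NetQosFlowControl -Priority 0,1,2,3,5,6,7

# Reserve bandwidth for SMB Direct with ETS (50% is an assumption)
New-NetQosTrafficClass -Name "SMB" -Priority 4 -BandwidthPercentage 50 -Algorithm ETS

# Don't accept DCB settings pushed by the switch (assumption: configured on the host)
Set-NetQosDcbxSetting -Willing $false -Confirm:$false

# Apply DCB on the physical rNICs and make sure RDMA is enabled on them
$rNICs = "NODE-A-S4P1-SW12P05-SMB1","NODE-A-S4P2-SW13P05-SMB2"
Enable-NetAdapterQos -Name $rNICs
Enable-NetAdapterRdma -Name $rNICs

# Verify
Get-NetQosFlowControl
Get-NetAdapterRdma -Name $rNICs
```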

We’ll now use the rNICs to create a Switch Embedded Team.

#Create a SET vSwitch on both rNICs
New-VMSwitch -Name RDMA-SET-vSwitch -NetAdapterName "NODE-A-S4P1-SW12P05-SMB1","NODE-A-S4P2-SW13P05-SMB2" -EnableEmbeddedTeaming $true

#This gives us a vSwitch in Hyper-V to use with the VMs. You can verify this with PowerShell and in Hyper-V Manager.
Get-VMSwitchTeam -Name "RDMA-SET-vSwitch" | fl

Note that the teaming mode is switch independent, the only option supported with SET in Windows Server 2016.

By default, this also gives us a vNIC exposed to the management OS:

Get-VMNetworkAdapter -ManagementOS

This is also visible as a vNIC in the management OS called “vEthernet (RDMA-SET-vSwitch)”:

Get-NetAdapter -Name "vEthernet (RDMA-SET-vSwitch)" | fl

We’ll use this vNIC to manage the host, so to make its purpose clear we’ll rename it.

Rename-VMNetworkAdapter -ManagementOS -Name "RDMA-SET-vSwitch" -NewName "HOST-MGNT"

We’ll create 2 separate management OS vNICs for the RDMA traffic later. For now, we want the HOST-MGNT vNIC to have connectivity to the LAN and for that we need to tag it with VLAN 10.

Set-VMNetworkAdapterVlan -VMNetworkAdapterName "HOST-MGNT" -VlanId "10" -Access -ManagementOS

Get-VMNetworkAdapterVlan -ManagementOS -VMNetworkAdapterName "HOST-MGNT"

The vNIC actually “inherited” the IP configuration of one of our physical NICs and we need to change that to either DHCP or a correct LAN IP address and settings.

Get-NetIPAddress -InterfaceAlias "vEthernet (HOST-MGNT)"

You can use the code below to set the HOST-MGNT vNIC to DHCP

$IPVersion = "IPv4"
$NetAdapter = Get-NetAdapter -Name 'vEthernet (HOST-MGNT)' | Where-Object {$_.Status -eq "Up"}
$NetIPInterface = $NetAdapter | Get-NetIPInterface -AddressFamily $IPVersion
If ($NetIPInterface.Dhcp -eq "Disabled")
{
    # Clear the existing routes, including the default gateway, or it may linger
    If (($NetIPInterface | Get-NetIPConfiguration).Ipv4DefaultGateway)
    {
        $NetIPInterface | Remove-NetRoute -Confirm:$false
    }
    # Enable DHCP so the IP address is obtained automatically
    $NetIPInterface | Set-NetIPInterface -Dhcp Enabled
    # Make sure the DNS servers are also obtained automatically
    $NetIPInterface | Set-DnsClientServerAddress -ResetServerAddresses
}
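If you prefer a static LAN address for HOST-MGNT rather than DHCP, a sketch along these lines should do. The address, prefix length, gateway and DNS servers below are hypothetical placeholders. Replace them with values from your own management LAN.

```powershell
# Hypothetical values for the management LAN (VLAN 10): replace with your own
$Alias      = "vEthernet (HOST-MGNT)"
$IPAddress  = "10.10.10.91"
$Prefix     = 24
$Gateway    = "10.10.10.1"
$DnsServers = "10.10.10.11","10.10.10.12"

# Turn off DHCP and remove the IPv4 address inherited from the physical NIC
Set-NetIPInterface -InterfaceAlias $Alias -AddressFamily IPv4 -Dhcp Disabled
Get-NetIPAddress -InterfaceAlias $Alias -AddressFamily IPv4 -ErrorAction SilentlyContinue |
    Remove-NetIPAddress -Confirm:$false

# Remove any lingering default route on this interface
Get-NetRoute -InterfaceAlias $Alias -DestinationPrefix "0.0.0.0/0" -ErrorAction SilentlyContinue |
    Remove-NetRoute -Confirm:$false

# Set the static address, default gateway and DNS servers
New-NetIPAddress -InterfaceAlias $Alias -IPAddress $IPAddress -PrefixLength $Prefix -DefaultGateway $Gateway
Set-DnsClientServerAddress -InterfaceAlias $Alias -ServerAddresses $DnsServers
```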

To finalize the HOST-MGNT vNIC configuration, we enable priority tagging on it. If we don’t, no traffic other than SMB Direct will keep its priority tag at all!

# Enable priority tagging on the host vNIC, otherwise only SMB Direct traffic keeps its priority tag
Set-VMNetworkAdapter -ManagementOS -Name "HOST-MGNT" -IeeePriorityTag On

#Let's check our work
Get-VMNetworkAdapter -ManagementOS -Name "HOST-MGNT" | fl Name,IeeePriorityTag

Before we go any further, we’ll remove the VLAN tag from the rNICs. We don’t want it interfering with egress traffic being tagged by them, or with ingress traffic being filtered because it doesn’t match the VLAN ID on the rNICs.

Set-NetAdapterAdvancedProperty -Name "NODE-A-S4P1-SW12P05-SMB1" -RegistryKeyword VlanID -RegistryValue "0"
Set-NetAdapterAdvancedProperty -Name "NODE-A-S4P2-SW13P05-SMB2" -RegistryKeyword VlanID -RegistryValue "0"
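To double-check the physical side, you can read back the VlanID setting on both rNICs. While you are at it, you can also check the *JumboPacket setting this article relies on later. This only reads the advanced properties. Keyword names can differ between drivers, so treat them as an assumption for NICs other than the ConnectX-3 used here.

```powershell
# Read back the VLAN and jumbo frame settings on the physical rNICs in the SET
Get-NetAdapterAdvancedProperty -Name "NODE-A-S4P1-SW12P05-SMB1","NODE-A-S4P2-SW13P05-SMB2" `
    -RegistryKeyword "VlanID","*JumboPacket" |
    Format-Table Name, RegistryKeyword, RegistryValue, DisplayValue -AutoSize
```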

From here on we’ll focus on the RDMA capable vNICs we’ll create and use for SMB traffic.

We create 2 vNICs on the management OS for SMB Direct traffic.

#Now add 2 host vNICs for the SMB Direct traffic
#SMB Multichannel will take care of bandwidth aggregation and redundancy
Add-VMNetworkAdapter -SwitchName RDMA-SET-vSwitch -Name SMB-1 -ManagementOS
Add-VMNetworkAdapter -SwitchName RDMA-SET-vSwitch -Name SMB-2 -ManagementOS
#Take a peek at what we have now
Get-VMNetworkAdapter -ManagementOS

Now these vNICs need an IP address. Thanks to Windows Server 2016 SMB Multichannel they could even share a subnet, but here each one gets its own.

New-NetIPAddress -InterfaceAlias "vEthernet (SMB-1)" -IPAddress 10.10.180.91 -PrefixLength 24 -Type Unicast
New-NetIPAddress -InterfaceAlias "vEthernet (SMB-2)" -IPAddress 10.10.190.91 -PrefixLength 24 -Type Unicast

#For good measure: in my lab and for this use case I don't need those vNICs registered in DNS
Get-NetAdapter -Name "vEthernet (SMB*)" | Set-DnsClient -RegisterThisConnectionsAddress:$false

We then also need to put the vNICs in the correct VLAN. Remember that DCB/PFC priority tagging needs a tagged VLAN to carry the priority. Right now, we can see that these are untagged.

Get-VMNetworkAdapterVLAN -ManagementOS -VMNetworkAdapterName SMB*

So we tag them with VLAN ID 110

Set-VMNetworkAdapterVLAN -ManagementOS -VMNetworkAdapterName SMB-1 -Access -vlanid 110
Set-VMNetworkAdapterVLAN -ManagementOS -VMNetworkAdapterName SMB-2 -Access -vlanid 110

Get-VMNetworkAdapterVLAN -ManagementOS -VMNetworkAdapterName SMB*

We enable jumbo frames on the vNICs. Remember that the physical NICs in the SET have jumbo frames enabled as well.

Get-NetAdapter -Name "vEthernet (SMB-1)" | Set-NetAdapterAdvancedProperty -RegistryKeyword "*JumboPacket" -RegistryValue 9014
Get-NetAdapter -Name "vEthernet (SMB-2)" | Set-NetAdapterAdvancedProperty -RegistryKeyword "*JumboPacket" -RegistryValue 9014
#We can check this by running
Get-NetAdapter -Name "vEthernet (SMB-1)" | Get-NetAdapterAdvancedProperty -RegistryKeyword "*JumboPacket"
Get-NetAdapter -Name "vEthernet (SMB-2)" | Get-NetAdapterAdvancedProperty -RegistryKeyword "*JumboPacket"
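A quick way to confirm that jumbo frames work end to end is a ping with the Don’t Fragment flag set. A *JumboPacket value of 9014 includes the 14-byte Ethernet header, which leaves an MTU of 9000 bytes. Subtracting the 20-byte IPv4 header and the 8-byte ICMP header gives a maximum ping payload of 8972 bytes. The peer address 10.10.180.92 below is a hypothetical SMB-1 address on the other node.

```powershell
# 9014 (*JumboPacket) - 14 (Ethernet header) = 9000 MTU
# 9000 - 20 (IPv4 header) - 8 (ICMP header) = 8972 bytes of ICMP payload
$payload = 9014 - 14 - 20 - 8

# -f sets Don't Fragment, -l sets the payload size, -S picks the source address (SMB-1)
# 10.10.180.92 is a hypothetical address of the peer node's SMB-1 vNIC
ping.exe -f -l $payload -S 10.10.180.91 10.10.180.92
```

If jumbo frames are missing anywhere on the path, the ping typically fails with “Packet needs to be fragmented but DF set.” or times out.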

Normally, any QoS values on traffic that originates from vNICs get reset to zero. There is one exception to this, and that’s SMB Direct traffic. SMB Direct traffic gets tagged with its QoS priority, and that is not reset to 0 because it bypasses the vSwitch completely. But suppose we set DCB PFC and/or ETS priorities on other types of traffic that pass over these vNICs. In that case we must enable priority tagging on these vNICs as well, or those priorities will be stripped away.

Set-VMNetworkAdapter -VMNetworkAdaptername SMB-1 -ManagementOS -IeeePriorityTag On
Set-VMNetworkAdapter -VMNetworkAdaptername SMB-2 -ManagementOS -IeeePriorityTag On

Get-VMNetworkAdapter -ManagementOS -Name "SMB*" | fl Name,SwitchName,IeeePriorityTag,Status

The association of the vNICs to the pNICs is random. It can also change when vNICs are created or torn down, for example after disabling NICs or restarting the OS. We can map a vNIC to a particular pNIC. This prevents suboptimal use of the available pNICs and provides a well-known, predictable path for the traffic. We do this with the PowerShell commands below.

#Set the mappings
Set-VMNetworkAdapterTeamMapping -VMNetworkAdapterName SMB-1 -PhysicalNetAdapterName "NODE-A-S4P1-SW12P05-SMB1" -ManagementOS
Set-VMNetworkAdapterTeamMapping -VMNetworkAdapterName SMB-2 -PhysicalNetAdapterName "NODE-A-S4P2-SW13P05-SMB2" -ManagementOS
#Check the mappings
Get-VMNetworkAdapterTeamMapping -managementOS

Last but not least, we must enable RDMA on our two vNICs, or SMB Direct will not kick in at all.

#Enable RDMA on it
Enable-NetAdapterRDMA "vEthernet (SMB-1)", "vEthernet (SMB-2)"
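To verify that SMB Direct really uses the vNICs, check two things: that RDMA is enabled on them, and that SMB sees them as RDMA capable. The last command only shows connections while SMB traffic, such as a live migration, is flowing.

```powershell
# RDMA should show Enabled = True on both host vNICs
Get-NetAdapterRdma -Name "vEthernet (SMB-1)","vEthernet (SMB-2)"

# SMB should list the vNICs with RdmaCapable = True
Get-SmbClientNetworkInterface | Where-Object { $_.FriendlyName -like "vEthernet (SMB-*" }

# During SMB traffic: RDMA capability on both ends of each multichannel connection
Get-SmbMultichannelConnection |
    Format-Table ServerName, ClientIpAddress, ServerIpAddress, ClientRdmaCapable, ServerRdmaCapable -AutoSize
```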

Right now, we have it all configured correctly on one node of our 2-node cluster. The SMB network looks like this now:

The cluster now looks like below.

We can live migrate VMs over SMB Direct in this mixed scenario, where one node has physical RDMA NICs and the other node has SET with vNICs for RDMA.

When looking at this in report mode, we clearly see Node-A sending SMB Direct traffic (tagged with priority 4, green) over its RDMA-enabled SET vNICs to Node-B, which still has a completely physical rNIC setup (blue).

As you can see in the screenshots above, we now have RDMA / SMB Direct working with SET / RDMA vNICs on one node (Node-A) and pure physical RDMA NICs on the other (Node-B). This gives us bandwidth aggregation and redundancy. To complete the exercise, we configure SET on the other node as well. But it’s clear that SET and RDMA also work in a mixed environment.

We’ll discuss some details of the vNIC configuration in future articles, such as the why and how of Set-VMNetworkAdapterTeamMapping and the use of -IeeePriorityTag. But for now, this is it. Go try it out! It’s the basis for anything you’ll do with SDNv2 in W2K16 and beyond.

An error occurred connecting to the cluster

Posted on September 8, 2017 by workinghardinit


This morning I woke up to a bunch of failed backup notifications from our trusted Veeam Backup & Replication v9.5 Update 2 solution. After 3:30 AM the backups of one particular cluster started failing.

I went to have a look, but I could not connect to the 3-node cluster.

I logged on to the cluster nodes themselves and did a quick verification of network connectivity, DNS, etc. That was all fine. WMI services were running on all nodes, but on nodes 2 and 3 they were not functional.

Clearly we had a WMI issue. And sure enough, Hyper-V Manager did not work on those 2 nodes, but it did on the one properly functioning node.

We tested some PowerShell WMI queries (Get-WmiObject MSCluster_ResourceGroup -ComputerName NodeToTest -Namespace "ROOT\MSCluster") against the cluster, and this confirmed that WMI was toast on those two nodes.
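To test all nodes in one go, the same query can be wrapped in a loop. The node names below are hypothetical. A node with broken WMI will typically throw an error or hang instead of returning the cluster groups, because Get-WmiObject has no timeout of its own. Adding -Authentication PacketPrivacy is an assumption on my part, since remote queries against the MSCluster namespace may require it.

```powershell
# Hypothetical node names: replace with your own
$Nodes = "NODE-1","NODE-2","NODE-3"

foreach ($Node in $Nodes) {
    try {
        $groups = Get-WmiObject -Class MSCluster_ResourceGroup -Namespace "ROOT\MSCluster" `
            -ComputerName $Node -Authentication PacketPrivacy -ErrorAction Stop
        Write-Output ("{0}: WMI OK, {1} cluster groups returned" -f $Node, @($groups).Count)
    }
    catch {
        Write-Warning ("{0}: WMI query failed: {1}" -f $Node, $_.Exception.Message)
    }
}
```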

Fixing the issue

The good news was that all the VMs were up and running. A few had RHS.exe issues, but pure Hyper-V wise they were still alive. That explains why no support calls had come in. So if we could fix this without causing downtime, that would be great. To try this, we decided to restart the WMI service.

On problematic node 2 this worked. It restarted dependent services as well, such as Hyper-V Virtual Machine Management, User Access Logging Service, IP Helper, the Veeam Installer Service and the Veeam Hyper-V Integration Service. We got connectivity back via Hyper-V Manager. The Failover Cluster Manager GUI remained an issue, but now it only complained about node 3.

We wanted to avoid rebooting node 3 so the VMs would not suffer any downtime. So what we did there was stop the dependent services that we could stop. vmms.exe was stuck in shutdown, so we killed that process manually with Stop-Process -Name "vmms" -Force.
That allowed the WMI service to be restarted. We then started the dependent services manually, and we got back connectivity to Hyper-V Manager on node 3.
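The steps we took on node 3 could be sketched like this. The sketch runs locally on the node and assumes, as in our case, that vmms is the service that hangs. It relies on the fact that running VMs live in their own vmwp.exe worker processes, so killing vmms.exe does not stop them. Treat it as a hedged outline, not a tested recovery script.

```powershell
# Remember which WMI-dependent services are running now, so we can start them again later
$Running = Get-Service -Name Winmgmt -DependentServices |
    Where-Object { $_.Status -eq 'Running' }

# Stop the dependent services that will stop; skip vmms, which hung in shutdown
$Running | Where-Object { $_.Name -ne 'vmms' } | Stop-Service -Force

# Kill the Virtual Machine Management Service process (running VMs keep running in vmwp.exe)
Stop-Process -Name vmms -Force

# Now WMI can be restarted
Restart-Service -Name Winmgmt -Force

# Start the dependent services again (vmms included)
$Running | Start-Service
```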

The Failover Cluster Manager GUI could also connect to the cluster again. We checked the cluster for other issues. When that was done and everything was found OK, we live migrated the VMs node per node and rebooted every node one by one. We did this to have cleanly started nodes and to see if any troublesome events were logged during startup. Normal operations were resumed.
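The node-by-node live migration and reboot can be scripted with the FailoverClusters module. The sketch below assumes it runs from a management machine, not from one of the nodes being rebooted. The cluster and node names are hypothetical. Draining a node live migrates its VMs to the other nodes.

```powershell
Import-Module FailoverClusters

# Hypothetical cluster and node names: replace with your own
$Cluster = "CLUSTER-1"
$Nodes   = "NODE-1","NODE-2","NODE-3"

foreach ($Node in $Nodes) {
    # Pause the node and drain its roles; VMs are live migrated to the other nodes
    Suspend-ClusterNode -Cluster $Cluster -Name $Node -Drain -Wait -ErrorAction Stop

    # Reboot and wait until PowerShell remoting answers again (30 minute timeout)
    Restart-Computer -ComputerName $Node -Force -Wait -For PowerShell -Timeout 1800

    # Resume the node and move its roles back
    Resume-ClusterNode -Cluster $Cluster -Name $Node -Failback Immediate

    # Look for critical/error events logged during startup before moving on
    Get-WinEvent -ComputerName $Node -ErrorAction SilentlyContinue -FilterHashtable @{
        LogName = 'System'; Level = 1,2; StartTime = (Get-Date).AddHours(-1)
    } | Select-Object -First 20 -Property TimeCreated, ProviderName, Id, Message
}
```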

Do note that there is a blog post on TechNet about a similar issue, but with a different error message. That one was caused by a missing cluswmi.mof file, due to an ill-advised run of mofcomp.exe *.mof. This was not the case here. A reboot of the misbehaving nodes would have done the trick as well (as blogged here: Trouble Connecting to Cluster Nodes? Check WMI!), but we avoided as much downtime as possible by going the route we did.

Windows Server 2016 RDMA and the Hyper-V vSwitch – Part I

Posted on September 7, 2017 by workinghardinit

Introduction

With Windows Server 2012 R2, using both RDMA and the Hyper-V vSwitch on the same host required separate physical network adapters (pNICs). There are 2 reasons for this.

First, a vSwitch is generally created on a native Windows NIC team. Such a NIC team does not expose RDMA capabilities.
Second, in Windows Server 2012 R2 you cannot expose RDMA capabilities via a vSwitch, even when you are using a non-teamed RDMA capable NIC.

As a result, the need for RDMA required more NICs on the Hyper-V hosts, and/or a fully converged setup had some serious drawbacks. Since servers have been quite capable and our VMs serve ever more intensive workloads, this was not dramatic. Leveraging 2*10Gbps for a vSwitch and 2*10Gbps for redundant RDMA / SMB Direct traffic has long been one of my favorite designs. It leaves room for other traffic, such as backups, and it allows for high VM density. But with 40Gbps NICs that is overkill and a tad expensive in many scenarios, even when connecting to a SOFS share for Hyper-V storage. 4*40Gbps on a Hyper-V host is not something I have ever seen in real life.

Windows Server 2016 can expose RDMA capabilities via a vSwitch even without SET

What many people seem to have missed is that reason 2 is gone in Windows Server 2016 Hyper-V. Reason 1 still holds true, but that has been solved by Switch Embedded Teaming (SET). Because reason 2 is gone, you actually do not need SET to leverage RDMA with a vSwitch in Windows Server 2016 Hyper-V. You can do this as follows:

#Create a vSwitch on a single RDMA capable pNIC
New-VMSwitch -Name RDMACapable-vSwitch -NetAdapterName "NODE-A-S4P1-SW12P05-SMB1"

#Now add a host vNIC for the SMB Direct Traffic
Add-VMNetworkAdapter -SwitchName RDMACapable-vSwitch -Name SMB1 -ManagementOS

#Enable RDMA on it
Enable-NetAdapterRDMA "vEthernet (SMB1)"

#Grab that vNIC on the management OS and set the VLAN - PFC requires tagged VLANs
$NicSMB1 = Get-VMNetworkAdapter -Name SMB1 -ManagementOS
Set-VMNetworkAdapterVLAN -VMNetworkAdapter $NicSMB1 -Access -VlanID 110

Below is what this looks like. We have one vNIC on the management OS leveraging RDMA/SMB Direct, consuming all 10Gbps of the NIC we connected to the vSwitch. This is a nice lab demo, but you can see this perhaps isn’t the best idea in real life.

Other things to note

Do realize this still requires the pNIC to be RDMA capable. As of today, this is not some sort of soft RoCE or other software RDMA magic. The pNIC also has to have RDMA enabled. Otherwise the vNIC won’t be able to leverage RDMA and will fall back to SMB (Multichannel only) instead of SMB Direct. Likewise, RDMA has to be enabled on the vNIC as well. So don’t forget, RDMA must be enabled on both the pNIC and the vNIC for this to work.
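Here is a quick check that RDMA is enabled on both layers, using the adapter names from the example above. It also shows that the vNIC carries the tagged VLAN.

```powershell
# RDMA must be enabled on both the physical NIC and the host vNIC
Get-NetAdapterRdma -Name "NODE-A-S4P1-SW12P05-SMB1","vEthernet (SMB1)" |
    Format-Table Name, Enabled -AutoSize

# Enable it where needed (harmless if it is already enabled)
Enable-NetAdapterRdma -Name "NODE-A-S4P1-SW12P05-SMB1","vEthernet (SMB1)"

# Confirm the vNIC carries the tagged VLAN that PFC needs
Get-VMNetworkAdapterVlan -ManagementOS -VMNetworkAdapterName SMB1
```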

DCB’s PFC/ETS requires a tagged VLAN to carry the priority, so don’t forget to tag the vNIC. There is actually no need to tag the pNIC, as long as the switch port has the tagged VLAN set (most likely as a trunk or in general mode). If you don’t tag consistently across the entire network stack, you’ll have network issues anyway, and RDMA performance will be bad if it works at all.

Finally, don’t forget that this example does not use VMM/Network Controller. That is why it uses Set-VMNetworkAdapterVLAN and not Set-VMNetworkAdapterIsolation.

In real life, we need better and more than a single NIC vSwitch

The caveat here is that, while you have a converged setup, you have no redundancy for the vSwitch (there is no team). This also means that you are limited to a single NIC in regards to throughput for that vSwitch. Depending on the needs of the solution, that might be perfectly fine. If it’s not (in most real-world scenarios you’ll need redundancy), you have to use SET in a converged scenario. That’s what we’ll take a look at in part 2. Then there is the question of QoS, as you don’t want SMB Direct traffic to consume too much bandwidth at will. That’s yet another issue to discuss and address.

Working Hard In IT

My view on IT from the trenches
