WEBINAR: Troubleshooting Microsoft Hyper-V – 4 Tales from the Trenches

I’ve teamed up with Altaro as a guest in one of their Hyper-V webinars to discuss troubleshooting Microsoft Hyper-V. I’ll be joining my fellow Microsoft Cloud and Datacenter Management MVP Andy Syrewicze for this in a virtual cross-Atlantic team effort.


We’ll give you some pointers on good practices and where to start when things go south. We’ll also add some real-world examples to the mix to spice things up. As you can imagine, these issues are often a lot more fun after the fact, once they have been solved. If you work in IT long enough you know that one day (or night) trouble will come knocking, and the stress that comes with it can be gut wrenching.

The good news is that it isn’t the end of the world. In a well-designed and well-managed environment you can minimize downtime, and you do not have to suffer through unrecoverable losses as long as you’re well prepared. We hope this webinar will help attendees prevent those problems in their environment. If not, it will at least provide some insight into how to prepare for and handle them.

The webinar is on February 25th, 2016 at 4pm CET / 10am EST!

The time has been chosen to make it feasible for as many people across all time zones as possible to attend. So go on and register, it’s free, and both Andy and I are looking forward to sharing our troubleshooting tales from the trenches.


Confusing Mellanox Windows PerfMon Counters

Introduction

So you start out doing SMB Direct. Maybe you’re doing RoCE; if so, there’s a good chance you’ll be using the excellent Mellanox cards. You studied hard, read a lot and put some real effort into setting it up. The SMB Direct / DCB configuration is the way you think it should be and things are working as expected.

Curious as you are, you want to find out whether you can see Priority Flow Control at work. Well, the easiest way to do so is by using the Windows Performance Monitor counters that Mellanox provides.

Confusing Mellanox Windows PerfMon Counters

So you take your first look at the Mellanox Adapter QoS PerfMon counters for the ConnectX series for SMB Direct (RDMA) traffic. When you want to see what’s happening in regard to the pause frames that have been sent and received, and what pause duration was requested from the receiving hop (or received from the sending hop), you can get confused. The naming is a bit counterintuitive.


The Rcv Pause duration is not the duration requested by the pause frames the host received, but the duration requested by the pause frames that host sent. Likewise, the Sent Pause duration is not the duration requested by the pause frames the host sent, but by the pause frames that host received.


So you might end up wondering why your host sends pause frames, only to see the Rcv Pause duration go up. Now you know why.
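If you want to keep an eye on these counters without opening the PerfMon GUI, you can query them from PowerShell as well. Consider the snippet below a sketch: the exact counter set and counter names depend on your adapter and WinOF driver version, so list them first and adjust the paths accordingly.

#List the Mellanox QoS counter sets and counters available on this host (names vary per adapter/WinOF version)
Get-Counter -ListSet "*Mellanox*" | Select-Object -ExpandProperty Paths

#Sample the pause counters; the paths below are examples, use the ones returned above
$PauseCounters = "\Mellanox Adapter QoS Counters(*)\Rcv Pause Duration",
                 "\Mellanox Adapter QoS Counters(*)\Sent Pause Duration"
Get-Counter -Counter $PauseCounters -SampleInterval 2 -MaxSamples 5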

Now, there were plans to fix this in WinOF 4.95. The original release notes made mention of this, which made me quite happy, as most people are confused enough when it comes to RDMA/RoCE/DCB configurations as it is.

A screenshot of the change in the original Mellanox WinOF VPI Release Notes revision 4.95


Unfortunately, this did not happen. It was removed in a newer version of these release notes. My guess is that it could have been a breaking change of some sort if a lot of tooling or automation expects these counter names.

I still remember puzzling over counters that didn’t make sense to me, and the tedious empirical testing it took to figure out that the wording was a bit “less than optimal”.

But look, once you know this you just need to keep it in mind. For now, we’ll have to live with some confusing Mellanox Windows PerfMon counter names. At least I hope I have saved you the confusion and time I went through when first starting out with these Mellanox counters. Other than that, I can only say you should not be discouraged, as they have been and still are a great tool for checking RoCE DCB/PFC configs.

Creating a bootable VHD or VHDX from an existing one

Creating a bootable VHD or VHDX from an existing one is a great capability to have. There are a couple of reasons why one might need or want to do this. In Windows Server 2012 (R2) this is even part of normal live migration operations. Storage live migration, for example, is nothing but the live streaming of the data of your running virtual hard disk into a new VHD/VHDX. You have multiple options when it comes to creating a bootable VHD/VHDX from an existing one, and they all serve their specific purposes, which might or might not overlap.

This is great stuff for doing migrations, reorganizing storage, defragmenting your internal dynamic VHDX structure, etc. But you’re not limited to those options. When you want to convert from VHD to VHDX you’ll leverage Convert-VHD. You can also create a new VHDX with an old one as the source with New-VHD. Great for all kinds of operations, including offline migration, updates, testing on exact copies of the original disk, etc. You might think it’s better to just copy the disk, but for a conversion that will not work, and it won’t deal with internal fragmentation, which can be important for performance testing when you’re migrating to new storage, a new cluster and Hyper-V version and such.

Recently people asked me if this would work with their OS disk, the virtual disk they boot from. Yes, that will work. Both New-VHD and Convert-VHD will create a fully bootable new virtual disk if the source virtual disk was bootable to begin with. No problem; they have to, if you think about it. Using Convert-VHD to move from VHD to VHDX, and even change the cluster sizes of the disk, would be no good if the VM doesn’t boot anymore. Likewise with New-VHD.
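If you want to try this on an OS disk yourself, a minimal sketch looks like the one below. The VM name and paths are just examples; Convert-VHD is an offline operation, so shut the VM down first and point it at the new disk afterwards.

#Shut down the VM first; Convert-VHD works on offline virtual disks (VM name and paths are examples)
Stop-VM -Name DemoVM

#Convert the boot disk from VHD to a dynamically expanding VHDX
Convert-VHD -Path "D:\VMs\DemoVM\DemoVM-OS.vhd" -DestinationPath "D:\VMs\DemoVM\DemoVM-OS.vhdx" -VHDType Dynamic

#Attach the new VHDX to the VM (generation 1, IDE controller 0, location 0) and boot it to verify
Set-VMHardDiskDrive -VMName DemoVM -ControllerType IDE -ControllerNumber 0 -ControllerLocation 0 -Path "D:\VMs\DemoVM\DemoVM-OS.vhdx"
Start-VM -Name DemoVM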

The only scenario that needs some real tender loving care is when you convert a VM from generation 1 to generation 2. The script provided to do that by John Howard (MSFT) uses fully supported technologies. The script itself is not a supported product, but you’re not doing anything unsupported with it.

So, to all of you needing to convert, defragment or move VMs to new virtual hard disks: do a few tests to verify your assumptions and go forward. Step into that bright new future you’ve been missing out on for the past 3 years.

Trunking With Hyper-V Networking

When doing lab work or real-life implementations you’ll need to go beyond the basic 101 stuff to build solutions every now and then. This is especially true when using virtual network appliances. Networking means you’ll be dealing with Link Aggregation Groups, trunking, MLAG, routing, LACP … in short, the tools of the trade when doing networking. In my experience I use trunking in Hyper-V mostly to mimic real-world scenarios where trunking is used (firewalls, routers, load balancers). These tend to be limited in usable ports in real life. So even before you run out of physical ports on your Hyper-V host to work with, we leverage trunking to mimic the real-life environment. This leads us to trunking with Hyper-V networking.

I for one have used this on 10Gbps ports on both physical and virtual load balancers in the uplink to the switches. As you can imagine, when doing redundant (teamed) cabling with HA load balancers you’re consuming 10Gbps ports, and not all VLANs warrant a dedicated 10Gbps uplink, even if you had them.

Trunking & VLANs are the way we deal with this in the network hardware world, and we can do the same in Hyper-V. In the Hyper-V Manager GUI you will not find a way to define a trunk on a vNIC attached to a vSwitch, but this can be done via PowerShell. So please do not reject Hyper-V as not being up to the job. It is. Let me show you how you can do trunking with Hyper-V networking.

Generally, on a clean install I dump the default vNIC. DO NOT do this blindly on an already deployed appliance virtual machine.

#Delete the default network adapter
Remove-VMNetworkAdapter -VMName VLM200-1 -Name "Network Adapter"

I then add the number of Ethernet ports I need on my KEMP Technologies Virtual LoadMaster.

#Create the VLM200 ports (4, like its physical counterpart)
For ($Count=0; $Count -le 3; $Count++)
{
    Add-VMNetworkAdapter -VMName VLM200-1 -Name "Eth$Count"
}

A peek at our handiwork via Get-VMNetworkAdapter -VMName VLM200-1 shows our 4 ports.


As you can see, I like to give my network adapters distinctive names. In combination with the switch name it enables me to identify the NICs better. Combine that with a good naming policy inside the VM if possible. In Windows Server 2016 you can hot add and remove vNICs, and the new “Device Naming” functionality (see Hot add/remove of network adapters and enabling device naming in Windows Server Hyper-V) only makes the experience better in relation to uptime and automation.
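As a quick sketch of that Windows Server 2016 capability: you can hot add a vNIC to a running VM and turn on device naming so the adapter name is exposed inside the guest. The switch name below is an assumption for the example; check the parameters against your Hyper-V module version.

#Windows Server 2016 and later: hot add a vNIC with device naming enabled (switch name is an example)
Add-VMNetworkAdapter -VMName VLM200-1 -SwitchName "vSwitch-LAN" -Name "Eth4" -DeviceNaming On

#Or enable device naming on the adapters that are already there
Get-VMNetworkAdapter -VMName VLM200-1 | Set-VMNetworkAdapter -DeviceNaming On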

Now let’s say we use Eth0 for management and Eth1 for the HA heartbeat. That leaves Eth2 and Eth3 for the workloads. We could even aggregate those (redundancy, heartbeat). In this demo we’ll configure Eth3 as a trunk with a list of allowed VLANs. We keep the native VLAN ID at 0, as it is by default. Only in specific situations, where you have changed this in the network, should this be changed.

#Trunk Eth3 and add the required VLAN IDs
Set-VMNetworkAdapterVlan -VMName VLM200-1 -VMNetworkAdapterName "Eth3" -Trunk -AllowedVlanIdList "10,20,30" -NativeVlanId 0

Which delivers us what we need to get our network appliance going.
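You can verify the result from PowerShell as well; a quick check like the one below shows the VLAN mode and VLAN list per vNIC.

#Check the VLAN configuration of all vNICs on the VM; the default output lists the mode and VLAN list per adapter
Get-VMNetworkAdapterVlan -VMName VLM200-1

#Or zoom in on Eth3 only
Get-VMNetworkAdapterVlan -VMName VLM200-1 -VMNetworkAdapterName "Eth3" | Format-List *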


In your virtual appliance you can now create VLANs on Eth3. How this shows up depends on the appliance; in this example it’s a Kemp Virtual LoadMaster, where we mimic a 4-port LoadMaster. We’re not doing trunking because we’ve hit the maximum supported number of NICs we can add to a virtual machine; we do it to mimic the real-world setup, where ports are scarce.


A word of warning: you will not see this configuration in the settings via the GUI, and manipulating the VLAN settings in the GUI will overwrite them without a warning. So be careful with the configuration of your virtual network appliance(s). As an example, I’ll touch the VLAN setting of Eth3 in the GUI and give it VLAN 500.


When we now take another look at the VLAN settings of the appliance, the damage is clear.


That vNIC is now in Access mode with VLAN 500. Ouch, that will seriously ruin your day in production! Be careful!

On top of this, some appliances do not respond well to such misconfigurations on the switch side (both physical and virtual switches). This leads not only to service interruption, but can also leave you unable to manage the appliance, requiring a reboot, etc.
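The fix, fortunately, is the same one-liner we used earlier: reapply the trunk configuration from PowerShell (same example VLAN list as above) and double check it with Get-VMNetworkAdapterVlan.

#Reapply the trunk configuration after the GUI has reset the vNIC to access mode
Set-VMNetworkAdapterVlan -VMName VLM200-1 -VMNetworkAdapterName "Eth3" -Trunk -AllowedVlanIdList "10,20,30" -NativeVlanId 0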

Anyway, so yes, you can do trunking with Hyper-V networking on a vNIC, but this normally only makes sense if you have an appliance running that knows what to do with a trunk, such as a virtual firewall, router or load balancer.