Find All Virtual Machines With A Duplicate Static MAC Address On A Hyper-V Cluster With PowerShell

During some troubleshooting recently I needed to find all virtual machines with a duplicate static MAC address on a Hyper-V cluster with PowerShell. I didn't feel like doing this via the GUI for obvious reasons. I needed this because, while trying to find out why a VM lost connectivity on one of its two NICs, I discovered it had a static MAC address. As no one had a good reason for this VM to have a static MAC address, I stopped the VM, switched that NIC to a dynamic MAC address and rebooted. All was well afterwards.

But I still needed to find out what potentially caused the issue; my guess was a duplicate MAC address (what else?). The biggest candidates for having a duplicate MAC were another VM or VMs. So here's some PowerShell that will list all clustered VMs that have a static MAC address.

Get-ClusterGroup | Where-Object {$_.GroupType -eq 'VirtualMachine'} `
| Get-VM | Get-VMNetworkAdapter | Where-Object {$_.DynamicMacAddressEnabled -eq $False}

Let's expand the code a bit and search for duplicate MAC addresses.

$AllNicsWithStaticMAC = Get-ClusterGroup | Where-Object {$_.GroupType -eq 'VirtualMachine'} `
| Get-VM | Get-VMNetworkAdapter | Where-Object {$_.DynamicMacAddressEnabled -eq $False}


$AllNicsWithStaticMAC.GetEnumerator() | Group-Object MacAddress | Where-Object {$_.Count -gt 1} | Format-Table * -AutoSize

The result is as follows:

[Screenshot: Group-Object output showing one static MAC address with a count of 3]

So in our lab simulation we have found a static MAC address that occurs 3 times!

If you have 200 VMs running on that cluster, you might not want to look over the list manually. Not that I hope you have 200 VMs with the same MAC address, but just to find the servers that share a MAC address fast, we adapt the above PowerShell a bit.

$AllNicsWithStaticMAC = Get-ClusterGroup | Where-Object {$_.GroupType -eq 'VirtualMachine'} `
| Get-VM | Get-VMNetworkAdapter | Where-Object {$_.DynamicMacAddressEnabled -eq $False}


$AllNicsWithStaticMAC.GetEnumerator() | Group-Object MacAddress | Where-Object {$_.Count -gt 1} | Format-Table * -AutoSize


if ($AllNicsWithStaticMAC -ne $null)
{
    (($AllNicsWithStaticMAC).GetEnumerator() | Group-Object MacAddress `
    | Where-Object {$_.Count -gt 1}).Group | Format-Table MacAddress, Name, VMName -GroupBy MacAddress -AutoSize
}
else
{
    "No static MAC addresses were found on your cluster"
}

This results in a nice list of the duplicate MAC address, on what network adapter it sits and on what virtual machine. It groups by (duplicate) MAC address and shows the network adapter name and VM name.

[Screenshot: duplicate MAC addresses listed per network adapter and virtual machine, grouped by MAC address]

The lab demo is a bit fabricated, as I'm not creating duplicate MAC addresses on my lab clusters just for this blog.

I hope this helps some of you when you need to find all virtual machines with a duplicate static MAC address on a Hyper-V cluster with PowerShell. You can adapt the code to only look for duplicate dynamic MAC addresses, or for both static and dynamic MAC addresses. You get the gist. Thank you for reading.
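As a starting point, here is a minimal sketch of that adaptation, assuming the same cluster context as above: drop the DynamicMacAddressEnabled filter to catch duplicates across both static and dynamic MAC addresses (or flip it to $True for dynamic ones only).

# Sketch: find duplicate MAC addresses across ALL clustered VM NICs,
# static and dynamic alike, by simply dropping the filter.
$AllNics = Get-ClusterGroup | Where-Object {$_.GroupType -eq 'VirtualMachine'} `
| Get-VM | Get-VMNetworkAdapter

($AllNics | Group-Object MacAddress | Where-Object {$_.Count -gt 1}).Group `
| Format-Table MacAddress, Name, VMName -GroupBy MacAddress -AutoSize

Keep in mind that duplicate dynamic MAC addresses usually point at overlapping dynamic MAC address pools on different hosts, so that variant can be a handy health check in its own right.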

The Mysterious Case of Infrequent Network Connectivity Issues on 2 Hyper-V VMs Out of 40 Guests

In The Mysterious Case of Infrequent Network Connectivity Issues on 2 Hyper-V VMs Out of 40 Guests I share a troubleshooting experience with you. I was asked if I could possibly take a look at a weird, but very infrequent, network issue with 2 VMs (W2K12R2) on a cluster (W2K12R2) running over 40 guests. Sure! These 2 virtual machines worked well 98% of the time. About 2% of the time they just fell off the network, sometimes both vNICs, sometimes both VMs. Asked what they meant by that, they said unreachable. But they couldn't find anything wrong, as all other VMs ran fine with the same configuration on the same hosts. They told me there was nothing in the event logs of either the host or the guests to explain any of this. A reboot or 2, or even a live migration, sometimes fixed the issue, and the monthly patch cycle normally kept the connectivity problems from piling up. Pretty weird! Usually bad firmware, drivers or bad offload feature support can cause such issues, but that would not target just 2 out of 40 VMs that all have the same settings.

It was only these 2 VMs, no matter what host they were running on in the cluster. As the vNICs shared the same 2 vSwitches (teamed) with all other VMs that never had issues, I was pretty sure the configuration of the switches, NICs, teams and vSwitches was OK. This was verified for due diligence and it checked out on all hosts as expected. All firmware, drivers and offloads were configured correctly.

I also checked the VLAN settings of the vNICs themselves for those two VMs and compared them to a couple of VMs that had no issues whatsoever; they were identical.

At first everything seemed fine and I was stumped. The event logs, both in the VMs and on the hosts, were squeaky clean. After that exercise I started running some PowerShell cmdlets to take a look at the configuration of the VMs on the hosts. You see, the GUI does not expose all possible configurations and I wanted to look at every configuration option. That's when I found the following:

[Screenshot: Get-VMNetworkAdapterVlan output showing the two vNICs in Access mode with VlanList 0]
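For reference, the check that surfaces this is a one-liner with the in-box Hyper-V cmdlet; a minimal sketch to run on each host:

# Sketch: dump the VLAN mode and VLAN list for every vNIC on this host.
Get-VM | Get-VMNetworkAdapterVlan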

The vNICs of the 2 offending VMs were in Access mode while the VlanList had a single value of 0 (basically meaning untagged; VLAN 0 is reserved for priority tagging and its use is not 100% standard across switch vendors). This just didn't compute. In the GUI we did not see this; there, things looked normal.

[Screenshot: the vNIC VLAN configuration in the GUI, looking normal]

You cannot even set this in the GUI; it won't allow you to.

[Screenshots: the GUI refusing to accept VLAN ID 0 in Access mode]

But when run as a PowerShell command, it lets you make this configuration. So maybe that's what happened:

Set-VMNetworkAdapterVlan -VMName DNS01 -Access -VlanId 0

No one knew, nor can I tell you. But I tested to verify that this does run and makes that configuration without any issue. Weird. Anyway, I resolved the issue by running the following command:

Set-VMNetworkAdapterVlan -VMName DNS01 -Untagged
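To confirm the change took, you can read the setting right back with the Get counterpart of the same cmdlet:

# Verify the vNIC now shows Untagged mode.
Get-VMNetworkAdapterVlan -VMName DNS01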

[Screenshot: the vNIC VLAN mode now showing Untagged]

The rare connectivity issue disappeared and all was well in 100% of the cases. That's how The Mysterious Case of Infrequent Network Connectivity Issues on 2 Hyper-V VMs Out of 40 Guests came to a happy end.

Troubleshooting Intermittent Virtual Machine Network Connectivity

I was asked to take a look at an issue with virtual machines losing network connectivity. The problems were described as follows:

Sometimes some VMs had connectivity, sometimes they didn't. It was not tied to specific virtual machines. Sometimes the problem was not there, then it showed up again. It was not an issue of a wrong subnet mask or gateway.

They suspected firmware or driver issues. Maybe it was a Windows NIC teaming bug, or problems with DVMQ or NIC offload settings. There are a lot of potential reasons; just Google "intermittent VM connectivity issues Hyper-V" and you'll get a truckload of options.
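If you would rather rule those suspects out methodically than guess, the in-box NetAdapter cmdlets give you a quick inventory to compare across hosts; a minimal sketch:

# Sketch: inventory VMQ state and the offload-related advanced properties
# of the physical NICs, to compare between the hosts in the cluster.
Get-NetAdapterVmq
Get-NetAdapterAdvancedProperty | Where-Object {$_.DisplayName -like '*Offload*'}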

So a round of wishful firmware and driver upgrading started, followed by a round of wishful disabling of network features. That's one way to do it. But why not sit back and look at the issue?

Based on what they said, I looked at the environment and asked if it was tied to a specific host, as only VMs on one of the hosts had the issue. Could it be after a live migration or a VM restart? They didn't really know, but it could. So we started looking at the hosts. All teams for the vSwitch were correctly configured on all hosts. No tagged VLAN on the member NICs. No extra team interfaces that would violate the rule that there can be only one if the team is used by a Hyper-V switch. They used the switch independent teaming mode with the load balancing mode set to Dynamic, all members active. Perfect. A quick way to verify all of that from PowerShell is sketched below.
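A minimal sketch with the in-box NIC teaming cmdlets, run on each host:

# Sketch: check teaming mode, load balancing mode, team members and
# team interfaces. For a vSwitch team you expect exactly one team interface.
Get-NetLbfoTeam | Format-Table Name, TeamingMode, LoadBalancingAlgorithm -AutoSize
Get-NetLbfoTeamMember
Get-NetLbfoTeamNic | Format-Table Name, Team, VlanID -AutoSize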

I asked if they sometimes used tagged VLANs on the VMs. They said yes, which gave me a clue that they had trunking or general mode configured on the ports. So I looked at the switches to see what the port configuration was like. Guess what? All ports on both switches were correctly configured, bar the ports of the vSwitch team members on one Hyper-V host: the one with the problematic VMs. The two ports were in general mode, but the port on the top switch had PVID* 100 and the one on the bottom switch had PVID 200. That was the issue. If the VM "landed" on the team member with PVID 200 it had no network connectivity.

[Diagram: Hyper-V vSwitch team with the wrong native VLAN on one member port]


* PVID (switchport general pvid 200) is the default VLAN of the port. In Cisco speak that would translate into the "native VLAN", as in switchport trunk native vlan 200.

Yes, NIC firmware and drivers have issues. There are bugs or problems with advanced features once in a while. But you really do need to check that the configuration is correct and that the design or setup makes sense. Do yourself a favor by not assuming anything. Trust, but verify.

E2EVC 2015 Berlin SMB Direct Slide Deck

I attended and presented at E2EVC 2015 in Berlin from June 12th to June 14th. The networking was a blast. No "marchitecture" bullshit or vendor fairy tales whatsoever, and lots of very open discussions on the realities we're seeing and facing in virtualization and cloud. Most account managers and esoteric presales people would die a painful (but fast) death in this environment.


One session was with my Hyper-V Amigo buddy Carsten Rachfahl and was a pure demo extravaganza, so no slides. My own session was "SMB Direct – The Secret Decoder Ring" and was an attempt to position this technology by looking at the why and where, followed by the how, by whom and when.


I hope a lot of people came away with at least a better understanding of SMB Direct, RDMA and DCB. The second aim was to take away the fear many people have of this tech by showcasing it in short demos. Time constraints were a challenge, so it was not a 200-level session.

Please download the presentation here if interested.

Enjoy. If you have any concerns or questions, ask, and I’ll try to answer.