Checkpoint references a non-existent virtual switch

Introduction

This blog post is one of the two additions or improvements I need to make to the sample code in a previous blog post. In Shared nothing live migration with a virtual switch change and VLAN ID configuration, I published a sample script. The script works well, but there are two areas for improvement. This is the first one: I should check for existing checkpoints before migrating a virtual machine. When a checkpoint references a non-existent virtual switch, it prevents live migration.

Why check for existing checkpoints before migrating a virtual machine?

If you read the previous blog, you know the challenge was to shared nothing live migrate virtual machines to a host with different virtual switch names and VLAN IDs. We did so by adding dummy virtual switches to the target host. This made shared nothing live migration possible. On arrival of the virtual machine on the target host, we immediately connect the virtual machine to the final virtual switch and set the correct VLAN IDs. That works very well. You drop 1 or at most 2 pings, which is as good as it gets.

However, this can lead to a problem with live migration between cluster nodes. The temporary dummy virtual switches are removed once the shared nothing live migration is done, as they were only needed for that. But any checkpoint of a virtual machine that was taken before the move still refers to the old virtual switch names. When we now try to live migrate that virtual machine, it fails because the virtual switches the checkpoint references do not exist on the target host.

Checkpoint references a non-existent virtual switch
The event log is very clear about the issue.
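
To find out in advance which checkpoints would block a live migration, you can compare the switch names stored in each checkpoint with the virtual switches that actually exist on the target host. The snippet below is a minimal sketch and not part of the original script. The VM name 'VM01' is a placeholder, and it assumes the Hyper-V PowerShell module is available and that you have the required permissions on both hosts.

    #Hypothetical names, adjust these to your environment
    $SourceNode = 'NODE-A'
    $TargetNode = 'ZULU'
    $VMName = 'VM01'

    #The names of the virtual switches that exist on the target host
    $TargetSwitchNames = (Get-VMSwitch -ComputerName $TargetNode).Name

    #Walk through the checkpoints of the VM and report every network adapter
    #that points to a virtual switch the target host does not have
    ForEach ($Checkpoint in (Get-VMCheckpoint -ComputerName $SourceNode -VMName $VMName)) {
        Get-VMNetworkAdapter -VMSnapshot $Checkpoint |
            Where-Object { $_.SwitchName -and ($TargetSwitchNames -notcontains $_.SwitchName) } |
            Select-Object @{Name = 'Checkpoint'; Expression = { $Checkpoint.Name } }, Name, SwitchName
    }

Adapters that are not connected to any switch are skipped, as they have no switch name to compare. If the snippet returns nothing, none of the checkpoints reference a switch that is missing on that host.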

Preventing live migration issues

So the first addition to the script is a check for existing checkpoints. The script can offer to remove the checkpoints, skip the virtual machine, or continue. When you continue, you should realize that you will need to remove the checkpoints at the target if the virtual switches they reference no longer exist on all cluster nodes.
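
Such an interactive choice could look like the sketch below. This is only an illustration of the remove, skip or continue options, not the code used in the demo script. The VM name is a placeholder, and the default choice (skip) is my own assumption.

    #Hypothetical names, adjust these to your environment
    $SourceNode = 'NODE-A'
    $VMName = 'VM01'

    $Checkpoints = Get-VMCheckpoint -ComputerName $SourceNode -VMName $VMName

    if ($Null -ne $Checkpoints) {
        #Build the three options, the ampersand marks the shortcut key
        $Choices = [System.Management.Automation.Host.ChoiceDescription[]]@('&Remove', '&Skip', '&Continue')
        #The last argument is the index of the default choice (1 = Skip)
        $Decision = $Host.UI.PromptForChoice("Checkpoints found on $VMName", 'What do you want to do?', $Choices, 1)

        switch ($Decision) {
            0 { $Checkpoints | Remove-VMCheckpoint }
            1 { Write-Warning "Skipping $VMName, it will not be migrated" }
            2 { Write-Warning "Continuing with $VMName, remember to remove the checkpoints at the target" }
        }
    }

Inside a ForEach loop over all virtual machines, the skip branch would use continue to move on to the next virtual machine.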

Asking for a decision makes the script interactive, which is not optimal. Granted, most of us will babysit such a script anyway. Still, adding a warning and logging it is probably the better choice. The most important thing is to be aware of this. In the improved demo script below, I check for checkpoints, notify you of their presence, and then, after a successful shared nothing migration, delete them (see the checkpoint-related code lines).

    #The source Hyper-V host
    $SourceNode = 'NODE-A'
    #The LUN where you want to storage migrate your VMs away from
    $SourceRootPath = "C:\ClusterStorage\Volume1*"

    #The target Hyper-V host
    $TargetNode = 'ZULU'
    #The storage path where you want to storage migrate your VMs to
    $TargetRootPath = "C:\ClusterStorage\Volume1"

    $OldVirtualSwitch01 = 'vSwitch-VLAN500'
    $OldVirtualSwitch02 = 'vSwitch-VLAN600'
    $NewVirtualSwitch = 'ConvergedVirtualSwitch'
    $VlanId01 = 500
    $VlanId02 = 600

    #Grab all the VMs we find that have virtual disks on the source CSV - WARNING for W2K12 you'll need to loop through all cluster nodes.
    $AllVMsOnRootPath = Get-VM -ComputerName $SourceNode | where-object { $_.HardDrives.Path -like $SourceRootPath }

    #We loop through all VMs we find on our SourceRootPath
    ForEach ($VM in $AllVMsOnRootPath){
        #We generate the final VM destination path
        $TargetVMPath = $TargetRootPath + "\" + ($VM.Name).ToUpper()
        #Grab the VM name
        $VMName = $VM.Name
        $VM.VMid
        $VMName

        #If the VM is still clustered, remove it from the cluster as the shared nothing live migration will otherwise fail.
        if ($VM.isclustered -eq $True) {
            write-Host -ForegroundColor Magenta $VM.Name "is clustered and is being removed from cluster"
            Remove-ClusterGroup -VMId $VM.VMid -Force -RemoveResources
            Do { Start-Sleep -seconds 1 } While ($VM.isclustered -eq $True)
            write-Host -ForegroundColor Yellow $VM.Name "has been removed from cluster"
        }
        #If the VM has checkpoints, notify the user of the script as this will cause issues after switching to the new virtual
        #switch on the target node. Live migration will fail between cluster nodes if the checkpoints reference 1 or more
        #non-existent virtual switches. These must be removed prior to or after completing the shared nothing migration.
        #The script does this after the migration automatically, not before, as I want the VM to be untouched if the shared nothing
        #migration fails.

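        #Query the host the VM currently runs on, not the machine the script is started from
        $checkpoints = get-vmcheckpoint -ComputerName $VM.ComputerName -VMName $VM.Name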

        if ($Null -ne $checkpoints)
        {
            write-host -foregroundcolor yellow "This VM has checkpoints"
            write-host -foregroundcolor yellow "This VM will be migrated to the new host"
            write-host -foregroundcolor yellow "Only after a successful migration will ALL the checkpoints be removed"
        }
    
        #Do the actual storage migration of the VM, $TargetVMPath creates the default subfolder structure
        #for the virtual machine config, snapshots, smartpaging & virtual hard disk files.
        Move-VM -Name $VMName -ComputerName $VM.ComputerName -IncludeStorage -DestinationStoragePath $TargetVMPath -DestinationHost $TargetNode
    
        $MovedVM = Get-VM -ComputerName $TargetNode -Name $VMName

        #Connect every adapter that is still on the first temporary virtual switch to the final virtual switch and set its VLAN ID
        $OldvSwitch01 = Get-VMNetworkAdapter -ComputerName $TargetNode -VMName $MovedVM.VMName | where-object SwitchName -eq $OldVirtualSwitch01
        if ($Null -ne $OldvSwitch01) {
            foreach ($VMNetworkAdapter in $OldvSwitch01) {
                write-host 'Moving to correct vSwitch'
                Connect-VMNetworkAdapter -VMNetworkAdapter $VMNetworkAdapter -SwitchName $NewVirtualSwitch
                write-host "Setting VLAN $VlanId01"
                Set-VMNetworkAdapterVlan -VMNetworkAdapter $VMNetworkAdapter -Access -VLANid $VlanId01
            }
        }
        #Do the same for every adapter that is still on the second temporary virtual switch
        $OldvSwitch02 = Get-VMNetworkAdapter -ComputerName $TargetNode -VMName $MovedVM.VMName | where-object SwitchName -eq $OldVirtualSwitch02
        if ($Null -ne $OldvSwitch02) {
            foreach ($VMNetworkAdapter in $OldvSwitch02) {
                write-host 'Moving to correct vSwitch'
                Connect-VMNetworkAdapter -VMNetworkAdapter $VMNetworkAdapter -SwitchName $NewVirtualSwitch
                write-host "Setting VLAN $VlanId02"
                Set-VMNetworkAdapterVlan -VMNetworkAdapter $VMNetworkAdapter -Access -VLANid $VlanId02
            }
        }

        #If the VM has checkpoints, this is when we remove them.
        $checkpoints = get-vmcheckpoint -ComputerName $TargetNode -VMName $MovedVM.VMName

        if ($Null -ne $checkpoints)
        {
            write-host -foregroundcolor yellow "This VM has checkpoints and they will ALL be removed"
            $CheckPoints | Remove-VMCheckpoint 
        }
    }
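
The script above does not explicitly verify that Move-VM succeeded, and it only writes its warnings to the console. The sketch below shows one way to guard the checkpoint removal and log what happened. Treat it as an illustration under assumptions: the VM name, the target path and the log file location are placeholders, and the folder for the log file must already exist.

    #Hypothetical names, adjust these to your environment
    $SourceNode = 'NODE-A'
    $TargetNode = 'ZULU'
    $VMName = 'VM01'
    $TargetVMPath = 'C:\ClusterStorage\Volume1\VM01'
    $LogFile = 'C:\Temp\SharedNothingMigration.log'

    $Moved = $False
    try {
        #-ErrorAction Stop turns a failed move into a terminating error, so we end up in the catch block
        Move-VM -Name $VMName -ComputerName $SourceNode -IncludeStorage -DestinationStoragePath $TargetVMPath -DestinationHost $TargetNode -ErrorAction Stop
        $Moved = $True
    }
    catch {
        $Message = "$(Get-Date -Format s) Migration of $VMName failed, checkpoints left untouched: $($_.Exception.Message)"
        Write-Warning $Message
        Add-Content -Path $LogFile -Value $Message
    }

    #Only touch the checkpoints when the VM has arrived on the target host
    if ($Moved) {
        $Checkpoints = Get-VMCheckpoint -ComputerName $TargetNode -VMName $VMName
        if ($Null -ne $Checkpoints) {
            $Message = "$(Get-Date -Format s) $VMName moved to $TargetNode, removing $(@($Checkpoints).Count) checkpoint(s)"
            Write-Warning $Message
            Add-Content -Path $LogFile -Value $Message
            $Checkpoints | Remove-VMCheckpoint
        }
    }

The switch and VLAN changes from the script would go inside the same if block, before the checkpoints are removed.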

Conclusion

The issue, in this case, was fallout from our creative shared nothing live migration. But there are other ways to get into this situation. The takeaway is that we should be aware of the live migration issue when a checkpoint references a non-existent virtual switch. I hope this helps someone out there. The next blog will be about another hiccup you might come across when performing a shared nothing live migration. Thanks for reading.
