Shared nothing live migration with a virtual switch change and VLAN ID configuration

Introduction

I was working on a hardware refresh, consolidation and upgrade to Windows Server 2019 project. This mainly boils down to cluster operating system rolling upgrades from Windows Server 2016 to Windows Server 2019, with new servers replacing the old ones. Pretty straightforward. So what does this have to do with shared nothing live migration with a virtual switch change and VLAN ID configuration?

Due to the consolidation aspect, we also had to move virtual machines from some older clusters to the new clusters. The old cluster nodes have multiple virtual switches. These connect to different VLANs. Some of the virtual machines have only one virtual network adapter that connects to one of the virtual switches. Many of the virtual machines are multihomed. The number of virtual NICs per virtual machine ranged from 1 to 3. This left us with the challenge of doing a shared nothing live migration with a virtual switch change and VLAN ID configuration. All this without downtime.

Meeting the challenge

In the new cluster, there is only one converged virtual switch. This virtual switch attaches to trunked network ports with all the required VLANs. As we have only one virtual switch on the new Hyper-V cluster nodes, the name differs from those on the old Hyper-V cluster nodes. This prevents live migration. Fixing this is our challenge.

First of all, Compare-VM is your friend for finding blocking incompatibilities between the source and the target nodes. You can read about that in many places. Here, we focus on our challenge.
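
As a minimal sketch, assuming a virtual machine named 'DemoVM01' (a placeholder) and the target host 'ZULU' from the sample script below, such a compatibility check could look like this:

    #Generate a compatibility report for moving the VM to the target host
    $Report = Compare-VM -Name 'DemoVM01' -DestinationHost 'ZULU'
    #List any blocking incompatibilities, such as a virtual switch that does not exist on the target
    $Report.Incompatibilities | Format-Table MessageId, Message -AutoSize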

Making Shared nothing migration work

The first step is to make sure shared nothing migration works. We can achieve this in several ways.

Option 1

We can disconnect the virtual machine network adapters from their virtual switch. While this allows you to migrate the virtual machines, this leads to connectivity loss. This is not acceptable.

Option 2

We can preemptively connect the virtual machine network adapters to a virtual switch with the same name as the one on the target and enable the VLAN ID. Consequently, this means you have to create such a virtual switch and need NICs to do so. But unless you configure and connect those to the network just like on the new Hyper-V hosts, this also leads to connectivity loss. That was not possible in this case. So this option is again unacceptable.

Option 3

What I did was create dummy virtual switches on the target hosts. For this purpose, I used some spare LOM NICs. I did not configure them otherwise. As a matter of fact, I did not even connect them. Just the fact that they exist with the same names as on the old Hyper-V hosts is sufficient to make shared nothing migration possible. Actually, this is a great point to remind ourselves that we don't even need spare NICs. Dummy private virtual switches that are not even attached to a NIC will also do.
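
As a sketch, assuming the old switch names used in the sample script below, creating such dummy private virtual switches on a target host could look like this:

    #Create dummy private virtual switches with the same names as on the old Hyper-V hosts.
    #Private switches are not attached to any physical NIC, so no spare NICs are needed.
    New-VMSwitch -Name 'vSwitch-VLAN500' -SwitchType Private
    New-VMSwitch -Name 'vSwitch-VLAN600' -SwitchType Private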

After we have finished the migrations, we just delete the dummy virtual switches. That's all there is to do if you used private ones. If you used spare NICs, just disable them again. Now all is as it was and should be on the new cluster nodes.
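
The cleanup, again assuming the dummy switch names above, is just as simple:

    #Remove the dummy virtual switches once all migrations have finished
    Remove-VMSwitch -Name 'vSwitch-VLAN500' -Force
    Remove-VMSwitch -Name 'vSwitch-VLAN600' -Force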

Turning shared nothing migration into shared nothing live migration

Remember, we need zero downtime. Keep in mind that as long as the shared nothing live migration is running, all is well. We have connectivity to the original virtual machines on the old cluster nodes. As soon as the shared nothing live migration finishes, we do 2 things. First of all, we connect the virtual network adapters of the virtual machines to the new converged virtual switch. Also, we enable the VLAN ID. To achieve this, we script it out in PowerShell. As a result, it is so fast we only drop 1 or 2 pings. Just like a standard live migration.

Below you can find a conceptual script you can adapt for your own purposes. For real migrations, add logging and error handling. Please note that to leverage shared nothing migration you need to be aware of the security requirements. Credential Security Support Provider (CredSSP) is the default option. If you want or must use Kerberos, you must configure constrained delegation in Active Directory.
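
As a minimal sketch, assuming you stay with the CredSSP default, the relevant live migration settings on both the source and target hosts can be checked and set like this:

    #Make sure live migrations are enabled on the host
    Enable-VMMigration
    #CredSSP is the default; use Kerberos only if you have configured constrained delegation in Active Directory
    Set-VMHost -VirtualMachineMigrationAuthenticationType CredSSP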

I chose to use CredSSP as we would decommission the old hosts soon afterward anyway. It also means we did not need any Active Directory work done. This can be handy if that is not evident in the environment you are in. We started the script on every source Hyper-V host, migrating a bunch of VMs to a new Hyper-V host. This worked very well for us. I hope this helps.

Sample Script

    #The source Hyper-V host
    $SourceNode = 'NODE-A'
    #The LUN where you want to storage migrate your VMs away from
    $SourceRootPath = "C:\ClusterStorage\Volume1*"

    #The target Hyper-V host
    $TargetNode = 'ZULU'
    #The storage path where you want to storage migrate your VMs to
    $TargetRootPath = "C:\ClusterStorage\Volume1"

    $OldVirtualSwitch01 = 'vSwitch-VLAN500'
    $OldVirtualSwitch02 = 'vSwitch-VLAN600'
    $NewVirtualSwitch = 'ConvergedVirtualSwitch'
    $VlanId01 = 500
    $VlanId02 = 600

    #Grab all the VMs we find that have virtual disks on the source CSV - WARNING: for W2K12 you'll need to loop through all cluster nodes.
    $AllVMsOnRootPath = Get-VM -ComputerName $SourceNode | where-object { $_.HardDrives.Path -like $SourceRootPath }

    #We loop through all the VMs we find on our SourceRootPath
    ForEach ($VM in $AllVMsOnRootPath) {
        #We generate the final VM destination path
        $TargetVMPath = $TargetRootPath + "\" + ($VM.Name).ToUpper()
        #Grab the VM name and output the VM id and name for tracking
        $VMName = $VM.Name
        $VM.VMid
        $VMName


        if ($VM.isclustered -eq $True) {
            write-Host -ForegroundColor Magenta $VM.Name "is clustered and is being removed from cluster"
            Remove-ClusterGroup -VMId $VM.VMid -Force -RemoveResources
            Do { Start-Sleep -seconds 1 } While ($VM.isclustered -eq $True)
            write-Host -ForegroundColor Yellow $VM.Name "has been removed from cluster"
        }
    
        #Do the actual shared nothing live migration of the VM (including storage). $TargetVMPath creates the default subfolder structure
        #for the virtual machine config, snapshots, smart paging & virtual hard disk files.
        $MovedVM = Move-VM -Name $VMName -ComputerName $VM.ComputerName -IncludeStorage -DestinationStoragePath $TargetVMPath -DestinationHost $TargetNode -Passthru

        #Grab the virtual network adapters that were connected to the first old virtual switch
        $OldvSwitch01Adapters = Get-VMNetworkAdapter -ComputerName $TargetNode -VMName $MovedVM.VMName | Where-Object SwitchName -eq $OldVirtualSwitch01

        if ($Null -ne $OldvSwitch01Adapters) {
            foreach ($VMNetworkAdapter in $OldvSwitch01Adapters) {
                Write-Host 'Moving to correct vSwitch'
                Connect-VMNetworkAdapter -VMNetworkAdapter $VMNetworkAdapter -SwitchName $NewVirtualSwitch
                Write-Host "Setting VLAN $VlanId01"
                Set-VMNetworkAdapterVlan -VMNetworkAdapter $VMNetworkAdapter -Access -VlanId $VlanId01
            }
        }

        #Grab the virtual network adapters that were connected to the second old virtual switch
        $OldvSwitch02Adapters = Get-VMNetworkAdapter -ComputerName $TargetNode -VMName $MovedVM.VMName | Where-Object SwitchName -eq $OldVirtualSwitch02

        if ($Null -ne $OldvSwitch02Adapters) {
            foreach ($VMNetworkAdapter in $OldvSwitch02Adapters) {
                Write-Host 'Moving to correct vSwitch'
                Connect-VMNetworkAdapter -VMNetworkAdapter $VMNetworkAdapter -SwitchName $NewVirtualSwitch
                Write-Host "Setting VLAN $VlanId02"
                Set-VMNetworkAdapterVlan -VMNetworkAdapter $VMNetworkAdapter -Access -VlanId $VlanId02
            }
        }
    }



Optimize the Veeam preferred networks backup initialization speed

When Veeam preferred networks cause slow backup initialization speeds

When using preferred networks in Veeam, you choose to use a network other than the default host network for backups and restores. In this post, we'll discuss how to optimize the Veeam preferred networks backup initialization speed because we aim for optimal performance. TL;DR: you need to provide the Veeam Backup & Replication server with connectivity to the preferred networks. This is a common mistake I run into every now and then. Ultimately it makes people think Veeam is slow. No, it is just a configuration mistake.

Why use a preferred network?

Backups can fill up a 1Gbps pipe very fast. Many people still use 1Gbps networking as the default connectivity to the hosts. Even when they leverage 10Gbps or better, it is often in a converged network setup. This means that only part of the bandwidth goes to host connectivity. Few have 10Gbps for “just” host connectivity. This means it makes sense to select a different, higher bandwidth network for backup and restore traffic.

Hence, for high volume, high-performance backups and restores it is smart to look for a bigger pipe to leverage. Some environments have dedicated backup networks at 10Gbps or better. But we find far more high bandwidth networks that serve other purposes. In Hyper-V environments, you'll have those for SMB networking such as CSV, Live Migration variants and storage replication. Hyper-Converged Infrastructure deployments use these networks for storage as well. With S2D you'll find more and more 25/50/100Gbps. All of these can be leveraged as a preferred backup network in Veeam.

Setting up a preferred network

Setting up a preferred network is easy. First of all, you figure out which network to use. You then add those to the preferred networks as follows:

In the file menu, select “Network Traffic Rules”.

Click “Add” and specify the source IP range as well as the target IP range. You can opt to encrypt the traffic and/or set a bandwidth limit.

We have two SMB storage networks available, so we enter both.

There is no need to have the preferred network registered in DNS. It will work fine without.

I hope it is clear that the source (the Hyper-V hosts), the target (the backup repository or the extents in a Scale-Out Backup Repository) and any off-host proxies need connectivity to the preferred network(s). If you leverage WAN accelerators, gateway servers or log shipping servers, then these also need access. Last but not least, you should also make sure that the Veeam backup server (VBR) has access to the preferred networks. This is one that a lot of people seem to forget. Maybe because it is most often a VM, if it is not a shared role on the repository server or such, and things do work without it.
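
As a quick sanity check, here is a sketch assuming one of your repository extents answers at 10.10.110.2 on the preferred network (the address from the log excerpt further down; substitute your own). Run it from the VBR server:

    #Run this on the VBR server; it should reach the repository or SOBR extents over the preferred network
    Test-NetConnection -ComputerName 10.10.110.2
    #Optionally test a specific TCP port, for example the one from the log excerpt below
    Test-NetConnection -ComputerName 10.10.110.2 -Port 2509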

When the VBR server has no access to the preferred networks, things still work, but the initialization of the backup and restore jobs is a lot slower. Let's test this.

Slow Initialization of backup and restore jobs

As a result of the VBR server not having connectivity to the preferred networks, you will probably notice the following:

  • First of all, we notice a slowdown in the overall initialization of the backup and restore job.
  • This manifests itself in a slow start of the actual VM backup/restore and in a reduced number of simultaneous backups/restores of VMs within a job.

Without the VBR server having connectivity to the preferred networks

23:54 to complete the backup job (no connectivity to the preferred network)

With the VBR server having connectivity to the preferred networks. Notice how smooth and continuous the throughput is.

07:55 to complete the backup job (with connectivity to the preferred network) => 3 times as fast.

When you look into the Veeam backup logs for this job, you will find, at various stages, attempts by the VBR server to connect to the preferred networks. If it can't, it has to wait until the connection attempt times out. You see entries like:

A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond 10.10.110.2:2509 (System.Net.Sockets.SocketException)

Just a small part of all the socket timeouts you will find for every single VM in the job. Here VBR is trying to connect to one of the extents in the SOBR.

This happens for every file in the backups (config files and disks) and for every extent in the Scale-Out Backup Repository (per VM backup chain). This slows down the entire backup job tremendously.

Conclusion

I always make sure that the VBR servers in my environments have preferred network connectivity. Consequently, initialization is faster for both backups and restores. Test it out for yourself! It is the first thing I check when people complain about really slow backups. Do they have preferred networks set up? Check whether the VBR server has connectivity to them!

Attending the Veeam Vanguard Summit 2019

I am dedicating the week of the 14th-18th of October to attending the Veeam Vanguard Summit 2019. As a Veeam Vanguard, I am quite lucky to be invited to this event. I will meet fellow technologists, architects, and Vanguards from all over the world. This is a unique opportunity to learn with and from each other.

Veeam Vanguard Summit 2019

From Veeam there will be Product Strategists, Technical Analysts, Technologists, Research & Development team members as well as social media and marketing experts.

There is quite an impressive assembly of intellect, expertise, and knowledge. It is great to be in such company as there is much to learn and understand in the business of data protection and management. I sometimes feel out of my depth and comfort zone at this event. That is normal. For one, Veeam solutions and efforts span the complete breadth and depth of the IT industry. On top of that, I am there to learn, and then it is actually great that I am not the smartest one in the room while discussing a given subject.

Anyway, soon I will travel to Prague and make the most of my time there. As always, I am looking forward to seeing many buddies and friends. And I get to meet the new Veeam team members and the new Veeam Vanguards. This week will be filled with technology and the business of data management discussions from breakfast to dinner. I love it. Thank you Veeam for inviting me over, I appreciate the opportunity!

Licensed Replay Manager Node Reports being unlicensed

I was doing a hardware refresh on a bunch of Hyper-V clusters. This meant deploying many new DELL PowerEdge R740 servers. In this scenario, we leverage SC Series SC7020 AFA arrays. These come with Replay Manager software, which we use as the hardware VSS provider. On one of the replaced nodes, we ran into an annoying issue: the licensed Replay Manager node reports being unlicensed in the node's application event log. The application consistent replays do work on that node. But we always get the following error in the application event log:

Product is not licensed. Use Replay Manager Explorer ‘Configure Server’ or  PowerShell command ‘Add-RMLicenseInfo’ to activate product license.

In Replay Manager Explorer, we just see that everything is fine and licensed. Via the GUI or via PowerShell, we could not find a way to “re-license” an already installed server node.

What we tried but did not help

This is not a great situation to be in, therefore we need to fix it. First of all, we removed the problematic node from Replay Manager Explorer and tried to re-add it. That did not help us relicense it. Uninstalling the service on the problematic node also did not work. Doing both didn't fix it either. We needed another approach.

The fix

The trick to fixing the licensed Replay Manager node reporting as unlicensed is as follows. Stop the “Dell Storage Replay Manager Service” service.

Delete (or rename if you want to be careful) the Compellent folder under C:\ProgramData

Restart the “Dell Storage Replay Manager Service” service. As a result, you will see the folder and the files inside being regenerated. Wait until the temp files (ReplayManager.db-shm and ReplayManager.db-wal) of this process are gone.
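
As a minimal sketch, assuming the default C:\ProgramData\Compellent location mentioned above and renaming the folder rather than deleting it to be careful, these steps look like this in PowerShell:

    #Stop the Replay Manager service on the problematic node
    Get-Service -DisplayName 'Dell Storage Replay Manager Service' | Stop-Service
    #Rename the Compellent folder so the service regenerates it (delete the old copy once all is well)
    Rename-Item -Path 'C:\ProgramData\Compellent' -NewName 'Compellent.old'
    #Start the service again so it regenerates the folder and its files
    Get-Service -DisplayName 'Dell Storage Replay Manager Service' | Start-Service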

Open up Replay Manager Explorer, or relaunch it for good measure if it is still open. Connect to the problematic node. Navigate to “Configure Server”. On the license tab, it reports that it is unlicensed. Now enter the license code and request confirmation (Activate via Internet) or activate via phone.

The node is now licensed again.

The node is licensed again. The system needs to be configured.

The image above shows the node is licensed again. You now need to configure the system again because that info is lost. For that, enter the username and password for your SC Arrays and add the correct node.

We now test creating a replay! Most importantly, we check the node's application event log. The error “Product is not licensed. Use Replay Manager Explorer 'Configure Server' or PowerShell command 'Add-RMLicenseInfo' to activate product license.” is gone!

We only see the 3 informational entries (prepared, committed, successful) associated with a successful and completed replay.

Above all, I hope this helps others who run into this.