Make Veeam Instant Recovery use a preferred network

Introduction

In this post, I will be discussing an issue we ran into when leveraging the instant recovery capability of Veeam Backup & Replication (VBR). The issue became apparent after we set up preferred networks in VBR. The backup jobs and the standard restores both leveraged the preferred network as expected, but instant recovery did not: while the mount phase leverages the preferred network, the restore phase does not and uses the default host management network instead. To make Veeam instant recovery use a preferred network we had to do some investigation and tweaking. That is what this blog post is all about.

Overview

We have a Hyper-V cluster with shared storage (FC) that acts as our source. We back up to a Scale-Out Backup Repository (SOBR) that consists of several extents (standard repositories). Next to the management network, all of the target and source nodes have connectivity to one or more 10/25Gbps networks. These are leveraged for CSV, live migration, storage replication, etc., but also for the backup traffic via the Veeam Backup & Replication preferred network settings.

We have 2 preferred networks. This is for redundancy but also because there are different networks in use in the environment.

The IPs for the preferred networks are NOT registered in DNS. Note that the Veeam Backup & Replication server also has connectivity to the preferred networks. The reason for this is described in Optimize the Veeam preferred networks backup initialization speed.

As you might have guessed from the blog post title “Make Veeam Instant Recovery use a preferred network” this all worked pretty much as expected for the backups themselves and for standard restores. But when it came to instant recovery, we noticed that while the mount phase leveraged the preferred network, the actual restore phase did not.

Instant VM recovery overview.

To read up on instant recovery, go to Instant VM Recovery. But in this blog post, it is time to dive into the log files and figure out what is going on.

To solve the issue we dive into the VBR logs, but also into the logs on both the repository/extent and the Hyper-V server we restore the VM to. The logs confirmed what we had already noticed. For backups and normal restores, it correctly decides to use the preferred network. With instant recovery, for some reason, the restore phase selects the default host management network, which is 1Gbps.

Investigating the logs

Reading logs can seem an intimidating, tedious task. The trick is to search for relevant entries, and that is something you learn by doing. Combine that with an understanding of the problem and some common sense and you can quickly find what you need to look for. Then it is key to figure out why this could be happening. Sometimes that doesn’t work out. In that case, you contact Veeam support. That’s what I did, as I knew well what the issue was and could see it reflected in the logs, but I did not know how to handle this one.

We will look at the logs on the VBR Server, the repository where the backup files of the VM live, and the Hyper-V node where we restore the VM to investigate this issue.

The VBR log

Let’s look at the restore log of the virtual machine for which we perform instant recovery on the VBR server. We notice the following.

The actual restore phase of Instant Recovery leveraging the 1Gbps default host management network

The repository logs

These are the logs of our repository or extent where the restore reads the backup data from. There are actually 2 logs. One is the mount log and the other is the restore log.

We first dive into the Agent.IR.DidierTest08.Mount.Backup-Side.log of our test VM instant recovery. Here we can see connections, over the preferred network, to the Hyper-V server node where we instant recover the test VM. Note that it is the Hyper-V server node that acts as the client!

Agent.IR.DidierTest08.Mount.Backup-Side.log

Let’s now parse the Agent.IR.DidierTest08.Restore.Backup-Side.log of our test VM instant recovery. No matter how hard we look, we cannot find any connection attempt, let alone a connection, to a preferred network (10.10.110.0/24). We do see the restore work over the default management network (10.18.0.0/16). Also note that here it is the repository node that connects to the Hyper-V node (10.18.230.5); it acts as the client now.

Agent.IR.DidierTest08.Restore.Backup-Side.log
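Scanning these agent logs by hand gets tedious. Below is a minimal, hypothetical Python sketch (not part of Veeam) that pulls IP addresses out of a log file and labels each hit as preferred or management network. The subnets are the ones from this lab and the log path is just an example, so adjust both to your environment.

```python
import re
from ipaddress import ip_address, ip_network
from pathlib import Path

# Subnets used in this lab; adjust to your environment.
PREFERRED = ip_network("10.10.110.0/24")
MANAGEMENT = ip_network("10.18.0.0/16")
IP_RE = re.compile(r"\b(\d{1,3}(?:\.\d{1,3}){3})\b")

# Example path; Veeam agent logs typically live under C:\ProgramData\Veeam\Backup.
log = Path(r"C:\ProgramData\Veeam\Backup\Agent.IR.DidierTest08.Restore.Backup-Side.log")

for line in log.read_text(errors="ignore").splitlines():
    for raw in IP_RE.findall(line):
        try:
            addr = ip_address(raw)
        except ValueError:
            continue  # version numbers and such can look like IPs
        if addr in PREFERRED:
            print(f"preferred   {raw}  {line.strip()}")
        elif addr in MANAGEMENT:
            print(f"management  {raw}  {line.strip()}")
```

Running this against the restore-phase log quickly shows whether any traffic touched the preferred subnet at all.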

The restore target log

This is the Hyper-V server to which we restore the virtual machine. There are multiple logs, but we are most interested in the mount log and the restore log.

We first dive into the Agent.IR.DidierTest08.Mount.HyperV-Side.log of our test VM instant recovery. The mount log shows what we already know. It also shows that it is the Hyper-V server that initiates the connection. This does leverage the preferred network (10.10.110.0/24).

Agent.IR.DidierTest08.Mount.HyperV-Side.log

It also shows the mount phase does leverage the preferred network (10.10.110.0/24).

Agent.IR.DidierTest08.Mount.HyperV-Side.log

But when we look at the restore phase log, Agent.IR.DidierTest08.Restore.HyperV-Side.log, we again see that the default host management network is used instead of the preferred network.

Agent.IR.DidierTest08.Restore.HyperV-Side.log

Again, we see that during the restore phase the Hyper-V server node acts as the server while the repository is the client (10.18.217.5).

Summary of our findings

Based on our observations on the servers (networks used) and on the logs, we conclude the following. During an instant recovery, the VM is mounted on the Hyper-V host (where the checkpoint is taken). During the mount phase, the Hyper-V host acts as the client, while the repository acts as the server. This leverages the preferred network. During the restore phase, however, the repository acts as the client and connects to the Hyper-V host, which acts as the server. This does not leverage the preferred network.

This indicates that the solution might lie in reversing the client/server direction for the restore phase of Instant Recovery. But how? Well, there is a setting in Veeam where we can do just this.

Make Veeam Instant Recovery use a preferred network

I have to thank the Veeam support engineer who worked on this with me. He investigated the logs that I sent him as well, but with more insight than I have. Those were clean logs showing just reproductions of the issue, in combination with a Camtasia video of it all. That way I showed him what was happening and what I saw, while he also had the matching logs for what he was looking at.

Sure enough, he came back with a fix, or a workaround if you like. To make sure instant recovery leverages the preferred network, we needed to do the following. On each extent, in its properties, under credentials, go to the network settings and check “Run server on this side” under “Preferred TCP connection role”.


The “normal” use cases are, for example, when the repository FQDN resolves to several IP addresses while the Hyper-V host FQDN resolves to only one. This was not the case in our setup; the preferred networks are not registered in DNS. But leveraging the capability to set “Run server on this side” solves our issue as well.
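If you want to check which scenario you are in, a quick, hypothetical Python one-off like this shows how many addresses a name actually resolves to (the FQDNs below are made up):

```python
import socket

# Hypothetical names; in the "normal" use case for "Run server on this side"
# the repository FQDN resolves to several addresses, the Hyper-V host to one.
for fqdn in ("repo01.lab.local", "hyperv01.lab.local"):
    try:
        ips = sorted({info[4][0] for info in socket.getaddrinfo(fqdn, None, socket.AF_INET)})
    except socket.gaierror:
        ips = []
    print(f"{fqdn}: {ips or 'does not resolve'}")
```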

Parse the logs with “Run server on this side” enabled on the repository/extents

When we start a clean test and rerun an Instant Recovery of our test VM we now see that the restore phase does leverage the preferred network. The “Run server on this side” setting is also reflected in the restore phase logs on both the repository and the backup server.

Agent.IR.DidierTest08.Restore.Backup-Side.log. The Hyper-V server is the client (10.10.110.211) and connects to the server, which is the repository.
Agent.IR.DidierTest08.Restore.HyperV-Side.log. The server is now indeed the repository (10.10.110.14) and the client is the Hyper-V server node.

In the VBR log itself, we notice that “Run server on this side” has indeed been enabled.

Host ‘REPOSITORYSERVER’ should be server, reversing connection

IR.DidierTest08.Mount.log

In the Agent.IR.DidierTest08.Restore.Backup-Side.log on the repository server, we also see this setting reflected.

Agent.IR.DidierTest08.Restore.Backup-Side.log

Based on the documentation about “Run server on this side” in https://helpcenter.veeam.com/docs/backup/hyperv/hv_server_credentials.html?ver=95u4 you would assume this is only needed in scenarios where NAT is in play. But that doesn’t cover all use cases. Enabling this checkbox on a server means it does not initiate the connection but waits for the incoming connection from its partner. In our case that also causes the preferred network to be picked up. Apparently, all that is needed is to make sure the Hyper-V host to which we restore acts as the client and initiates the connection to the server, our repository or extents in the SOBR.
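To picture why reversing the connection role helps, here is a minimal, hypothetical Python sketch (not Veeam code): the side that initiates the TCP session chooses which of the peer’s addresses gets dialed, and therefore which network carries the data. The addresses and port below are illustrative only.

```python
import socket

# Candidate addresses of the repository: preferred network first, management as fallback.
REPO_ADDRESSES = ["10.10.110.14", "10.18.217.5"]

def dial_preferring(addresses, port=2500, timeout=5):
    """Connect to the first reachable address, trying the preferred network first."""
    for ip in addresses:
        try:
            return socket.create_connection((ip, port), timeout=timeout)
        except OSError:
            continue
    raise ConnectionError("no address reachable")

# With "Run server on this side" set on the repository/extents, the Hyper-V host
# runs this kind of client logic and dials the repository's preferred-network
# address, so the restore data flows over 10.10.110.0/24 instead of 10.18.0.0/16.
```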

Conclusion

We achieved a successful result. Our instant recoveries now also leverage the preferred network. In this use case, this is really important as multiple concurrent instant recoveries are part of the recovery plan. That’s why we have performant storage solutions for our backup and source in combination with high bandwidth on a capable network. In the end, it all worked out well with a minor tweak to make Veeam Instant Recovery use a preferred network. This was however unexpected. I hope that Veeam dives into this issue and sees if the logic can be improved in future updates to make this tweak unnecessary. If I ever hear any feedback on this I will let you know.

Optimize the Veeam preferred networks backup initialization speed

When Veeam preferred networks cause slow backup initialization speeds

When using preferred networks in Veeam you choose to use a network other than the default host network for backups and restores. In this post, we’ll discuss how to optimize the Veeam preferred networks backup initialization speed, because we aim for optimal performance. TL;DR: you need to provide connectivity to the preferred networks for the Veeam Backup & Replication server. It is a common mistake I run into every now and then. Ultimately it makes people think Veeam is slow. No, it is just a configuration mistake.

Why use a preferred network?

Backups can fill up a 1Gbps pipe very fast. Many people still use 1Gbps networking as default connectivity to the hosts. Even when they leverage 10Gbps or better it is often in a converged network setup. This means that only part of the bandwidth goes to host connectivity. Few have 10Gbps for “just” host connectivity. This means it makes sense to select a different higher bandwidth network for backup and restore traffic.

Hence, for high-volume, high-performance backups and restores it is smart to look for a bigger pipe to leverage. Some environments have dedicated backup networks at 10Gbps or better. But we find far more high bandwidth networks for other purposes. In Hyper-V environments, you’ll have those for SMB networking such as CSV, Live Migration variants and storage replication. Hyper-Converged Infrastructure deployments use these networks for storage as well. With S2D you’ll find more and more 25/50/100Gbps. All of these can be leveraged as a preferred backup network in Veeam.

Setting up a preferred network

Setting up a preferred network is easy. First of all, you figure out which network to use. You then add those to the preferred networks as follows:

In the main menu, select “Network Traffic Rules”.


Click “Add” and specify the source IP as well as the target IP range. You can opt to encrypt the traffic and/or set a bandwidth limit.

We have two SMB storage networks available, we enter both.

There is no need to have the preferred network registered in DNS. It will work fine without.

I hope it is clear that the source (Hyper-V hosts), the target (backup repository or the extents in a Scale-Out Backup Repository) and any off-host proxies need connectivity to the preferred network(s). If you leverage WAN accelerators, gateway servers or log shipping servers, then these also need access. Last but not least, you should also make sure that the Veeam Backup & Replication (VBR) server has access to the preferred networks. This is one that a lot of people seem to forget. Maybe because it is most often a VM, if it is not a shared role on the repository server or such, and things do work without it.
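As a quick sanity check, a small, hypothetical Python script like the one below can verify, from the VBR server, that every component has an address inside a preferred subnet and answers on a data-mover port. All names, addresses and the port are examples, not your real inventory.

```python
import socket
from ipaddress import ip_address, ip_network

# Example preferred subnets and component inventory; replace with your own.
PREFERRED_NETS = [ip_network("10.10.110.0/24"), ip_network("10.10.120.0/24")]
COMPONENTS = {
    "SOBR extent 1": "10.10.110.14",
    "SOBR extent 2": "10.10.120.14",
    "Hyper-V host 1": "10.10.110.211",
    "Hyper-V host 2": "10.10.120.211",
}

def in_preferred(ip: str) -> bool:
    """True if the address sits inside one of the preferred subnets."""
    return any(ip_address(ip) in net for net in PREFERRED_NETS)

def reachable(ip: str, port: int = 2500, timeout: float = 3.0) -> bool:
    """Basic TCP probe; 2500 is only an illustrative data-mover port."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        return False

for name, ip in COMPONENTS.items():
    ok = in_preferred(ip) and reachable(ip)
    print(f"{name:<15} {ip:<15} {'OK' if ok else 'CHECK'}")
```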

When the VBR server has no access to the preferred networks, things still work, but initialization of the backup and restore jobs is a lot slower. Let’s test this.

Slow Initialization of backup and restore jobs

As a result of using preferred networks without giving the VBR server connectivity to them, you will probably notice the following:

  • First of all, we notice a slowdown in the overall initialization of the backup and restore job.
  • This manifests itself in a slow start of the actual VM backup/restore and a reduction in the number of simultaneous VM backups/restores within a job.

Without the VBR server having connectivity to the preferred networks

23:54 to complete the backup job (no connectivity to the preferred network)


With the VBR server having connectivity to the preferred networks. Notice how smooth and continuous the throughput is.

07:55 to complete the backup job (with connectivity to the preferred network) => 3 times as fast.

When you look into the Veeam backup logs for this job, you will find, at various stages, attempts by the VBR server to connect to the preferred networks. If it can’t, it has to wait until the attempt times out. You see entries like:

A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond 10.10.110.2:2509 (System.Net.Sockets.SocketException)

Just a small part of all the socket timeouts you will find for every single VM in the job. Here VBR is trying to connect to one of the extents in the SOBR.

This happens for every file in the backups (config files and disks) and for every extent in the Scale-Out Backup Repository (per VM backup chain). This slows down the entire backup job tremendously.
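To get a feel for how bad it is, a small, hypothetical Python helper like this can count those timed-out connection attempts per target IP in a VBR job log. The path and the message fragment it searches for are assumptions, so adjust them to what you see in your own logs.

```python
import re
from collections import Counter
from pathlib import Path

# Example job log path; VBR logs usually sit under C:\ProgramData\Veeam\Backup\<JobName>.
LOG = Path(r"C:\ProgramData\Veeam\Backup\MyBackupJob\Job.MyBackupJob.Backup.log")
PATTERN = re.compile(r"connection attempt failed.*?(\d{1,3}(?:\.\d{1,3}){3}):\d+", re.IGNORECASE)

timeouts = Counter(
    match.group(1)
    for line in LOG.read_text(errors="ignore").splitlines()
    if (match := PATTERN.search(line))
)

for ip, count in timeouts.most_common():
    print(f"{ip:<15} {count} timed-out connection attempts")
```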

Conclusion

I always make sure that the VBR servers in my environments have preferred network connectivity. Consequently, initialization is faster for both backups and restores. Test it out for yourself! It is the first thing I check when people complain about really slow backups. Do they have preferred networks set up? If so, check whether the VBR server has connectivity to them!

Veeam Backup & Replication Preferred Subnet & SMB Multichannel

Introduction

In a previous blog post, Veeam Backup & Replication leverages SMB Multichannel, we showed that Veeam Backup & Replication leverages SMB Multichannel when possible.

But what about Veeam Backup & Replication Preferred Subnet & SMB Multichannel, does that work? We mentioned that we wanted to answer the question of what happens if we configure a preferred backup network in Veeam Backup & Replication. Would this affect the operation of SMB Multichannel at all? By that I mean, would enabling a preferred network in Veeam prevent Multichannel from using more than one NIC?

In this blog post we dive into that question and some scenarios. We actually need to be able to deal with multiple scenarios. When you have equally capable NICs that are on different subnets, you might want to make sure only one is used. Likewise, you might want both to be used, whether or not they are on the same subnet, even if you set a preferred subnet in Veeam. The good news is that the nature of SMB Multichannel and the way Veeam preferred networks work allow for the flexibility to achieve this. But it might not work like you would expect, unless you understand SMB Multichannel.

Veeam Backup & Replication Preferred Subnet & SMB Multichannel

For this blog post we adapt our lab networking a bit so that our non-management 10Gbps rNICs are on different subnets. We have subnet 10.10.110.0/24 for one set of NICs and 10.10.120.0/24 for the second set of NICs. This is shown in the figure below.


These networks can live in a separate VLAN or not; that doesn’t really matter. It does matter that you have a tagged VLAN (or VLANs) if you want to use RDMA, because you need the tag to carry the priority.

We now need to configure our preferred network in Veeam Backup & Replication. We go to the main menu and select Network Traffic Rules.


In the Global Network Traffic Rules window, click Networks.


In the Preferred Networks window, select the Prefer the following networks for backup and replication traffic check box.


Click Add. We use CIDR notation to fill out our preferred network (or you can use the network mask) and click OK.
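As a side note, CIDR and a network mask are just two ways of writing the same thing; a tiny Python check with the standard ipaddress module (using this lab’s subnet) illustrates it:

```python
from ipaddress import ip_address, ip_network

# 10.10.110.0/24 and 10.10.110.0/255.255.255.0 describe the same preferred network.
net = ip_network("10.10.110.0/24")
assert net == ip_network("10.10.110.0/255.255.255.0")

# Quick membership test for a NIC address against the preferred network.
print(ip_address("10.10.110.211") in net)  # True
print(ip_address("10.18.230.5") in net)    # False
```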

To prove the point that how SMB Multichannel works isn’t affected by what you fill out here, we add only one of our two subnets. SMB will see where it can leverage SMB Multichannel and it will kick in. Veeam isn’t blocking any of its logic.

So now we kick off a backup of our Hyper-V host to our SMB share target backup repository. We can see Multichannel work just fine.


Below is a screenshot taken on the backup target of the backup running over SMB Multichannel, leveraging 2 subnets, while having set only one of those as the preferred network in Veeam Backup & Replication.


Look at my backup fly … and this is only one host that’s being backed up (4 VMs actually). Have I told you how much I love flash storage? And why I’m so interested in getting ReFS hybrid volumes with SSD/SATA disks to work as a backup target? I bet you do!

Looking good and it’s easy, right? Well not so fast!

Veeam does not control SMB Multichannel

Before you think you’re golden here and in control via Veeam, let’s do another demo. In the preferred networks, we enter a subnet that is available to both the source and the target server but that sits on an LBFO (teamed) NIC with two 1Gbps members (RSS is enabled).


Now let’s see what happens when we kick off a backup.


Well, SMB Multichannel just goes through its rules and decides to take the two best, equally capable NICs. These are still our two 10Gbps rNICs. Whatever you put in the preferred network is ignored.

This is neither good nor bad, but you need to be aware of it in order to arrange for backups to leverage the network path(s) you had in mind and avoid surprises. The way to do that is the same as you plan and design for all SMB Multichannel traffic.

As stated in the previous blog post, you can control which NICs SMB Multichannel will use by designing around the NIC capabilities, by disabling or enabling some of those capabilities if needed, or by disabling SMB Multichannel on a NIC. This isn’t always possible or can lead to issues for other workloads, so the easiest way to go is using SMB Multichannel constraints. Do note, however, that you need to take into consideration which other workloads on your server leverage SMB Multichannel when you go that route, to avoid possible issues.
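If you go the constraints route, a rough, hypothetical Python wrapper around the Windows SmbMultichannelConstraint cmdlets could look like this. The server name and interface aliases are examples; run it elevated on the SMB client, i.e. the Hyper-V host.

```python
import subprocess

def constrain_smb(server: str, interfaces: list[str]) -> None:
    """Limit SMB Multichannel traffic to the given server to specific interfaces."""
    aliases = ",".join(f"'{alias}'" for alias in interfaces)
    cmd = f"New-SmbMultichannelConstraint -ServerName '{server}' -InterfaceAlias {aliases}"
    subprocess.run(["powershell", "-NoProfile", "-Command", cmd], check=True)

def show_constraints() -> None:
    """List the constraints currently in place on this client."""
    subprocess.run(
        ["powershell", "-NoProfile", "-Command", "Get-SmbMultichannelConstraint"],
        check=True,
    )

if __name__ == "__main__":
    # Example: only let the two 10Gbps rNICs carry SMB traffic to the backup target.
    constrain_smb("BACKUPTARGET01", ["SMB1", "SMB2"])
    show_constraints()
```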

As an example, I disabled Multichannel on my hosts. Awful idea, but it’s to prove a point. And still, with our 10.10.0.0/16 subnet set as the preferred subnet, I ran a backup again.


As you can see, the 2*1Gbps LBFO NIC is doing all the lifting on both hosts. As the team is switch independent and not in LACP load balancing mode, we’re limited to 1Gbps.

So how do we control the NICs used with SMB multichannel?

Well, SMB Multichannel rules apply. You use your physical design, the capabilities of the NICs and SMB constraints. In reality you’re better off using your design and, if needed, SMB Multichannel constraints to limit SMB to the NICs you want it to use. Do note that disabling SMB Multichannel (client and/or server side) is a global setting for the host. Consider this, as it affects all NICs on the host, not just the ones you have in mind for your backups. In most cases these NICs will be the same. Messing around with disabling Multichannel or NIC capabilities (RSS, RDMA) isn’t a great solution. But it’s good to know the options and the behavior.

Some things to note

Realize you don’t even have to set both subnets in the preferred subnets if they are different. SMB kicks off over one, sees it can leverage both and just does so. The only thing you manipulated here, SMB Multichannel wise, is which subnet is used first.

If both of our rNICs had been on the same subnet, you would not even have manipulated this.

Another thing worth pointing out is that this doesn’t require your Veeam Backup & Replication VM to have an IP address in any of the SMB Multichannel subnets. So as long as the source Hyper-V hosts and the backup target are connected, you’re good to go.

Last but not least, and already mentioned in the previous blog post, this also leverages RDMA capabilities when available to help you get the best throughput and lowest latency and to leave those CPU cycles for other needs. Scalability, baby! Now, I realize that you might think the CPU offload benefit is not a huge deal on your Hyper-V host, but consider the backup target being hammered by several simultaneous backups. And also consider that some people’s virtual machines look like the screenshot below in regards to CPU usage, in ever more need of more vCPUs and CPU time slices.


And that’s what the Hyper-V host looks like during a backup without SMB Direct (with idle VMs mind you).


All I’m saying here is: don’t dismiss RDMA too fast; everything you can leverage to help out, and that is available for free in the box, is worth considering.

Note: I have gotten the feedback that Veeam doesn’t support SMB Direct and that this was confirmed by Veeam Support. Well, Veeam Backup & Replication leverages SMB 3, but that’s an OS feature. Veeam Backup & Replication will work with SMB Multichannel, Direct, Signing, Transparent Failover … It’s out of the Veeam Backup & Replication scope of responsibilities, as we have seen here. Feel free to leverage SMB Direct, whether that is using iWARP, RoCE or InfiniBand. This information was confirmed by Veeam and bears the “Anton Gostev seal of approval”. So if SMB Direct causes issues, you have a configuration problem with that feature; it’s not Veeam not being able to support it, it doesn’t know or care, actually.

Conclusion

The elegance and simplicity of the Veeam Backup & Replication GUI are deceiving. Veeam is extremely powerful and is surprisingly flexible in how you can leverage and configure it. I hope both my previous blog post and this one have given you some food for thought and ideas. There’s more Veeam goodness to come in the coming months when time allows. Many years ago, when SMB 3 was introduced, I demonstrated the high availability capabilities this offered for Veeam backups. I’ll be writing about that in another blog post.