File-Level Restore in a hardened environment

Introduction

A File-Level Restore in a hardened environment can trip you up. In Veeam Backup & Replication the ability to do file-level restores from virtual machine backups is a handy option to have. In secured environments with an isolated and protected Veeam backup fabric, this is not always a straightforward or fast process. Let’s look at this and make some suggestions for a solution or workarounds.

Figure: The Veeam Backup browser for File-Level Restores

Veeam File-Level Restore in a hardened environment

Nowadays, it is a common best practice to keep the Veeam Backup & Replication environment isolated and independent from the fabrics it protects. It is even more than a best practice: ransomware has educated many of us, fast and hard, on the need for this.

In such an environment, every server in the Veeam backup fabric is isolated, with its own credentials, and is not joined to or in any way dependent on the environment it is protecting. Dedicated accounts run the actual backup jobs, and these are not used for normal administrative or operational tasks.

The majority of Veeam functionality works just fine in such an environment, but some things need to be addressed. One of them is a file-level restore (FLR). In a secured and isolated environment, an FLR job cannot leverage admin shares to gain access to the virtual machine to which files are being restored. To work around this, Veeam can leverage PowerShell Direct, by which it can securely restore the files into the virtual machine anyway.
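As an illustration of the mechanism, not the exact commands Veeam runs internally, a file copy over PowerShell Direct from the Hyper-V host looks roughly like this; the VM name, credentials, and paths are made-up examples:

$cred = Get-Credential -Message "Guest OS credentials"          # local account inside the VM
$session = New-PSSession -VMName "SQLVM01" -Credential $cred    # PowerShell Direct; runs on the Hyper-V host
# The copy travels over the VMBus, no network share into the guest needed.
Copy-Item -Path "C:\Temp\Restored.bak" -Destination "D:\Restores\" -ToSession $session
Remove-PSSession $session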

But leveraging PowerShell Direct comes at the cost of speed. 7 MB/s is okay for a small number of smaller files; it becomes frustratingly slow when restoring many or larger files. Minutes turn into hours. In some cases, this just won't do. In our case, the DBA who needed to do the database restore was not happy, and the developers waiting for the restored database were not overly pleased either.

So what other options do we have?

Set up a secured file share in the protected environment

Below, we discuss the other options you have for a File-Level Restore in a hardened environment with Veeam Backup & Replication. But first, we prepare a landing zone in the secured environment where we can copy restored files to. For this, we set up a restore share in the protected environment that only the few admins responsible for backups and restores can write to. No one else can enumerate, read, or write to that share. When a file gets restored there, they can copy it over to where the people who need it can access it. This helps protect restored files and limits access to just the people who need them. The accounts used to access that share from the isolated backup environment do not get admin permissions on the system where that share resides, or anywhere else.
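A minimal sketch of locking such a share down with PowerShell, assuming a hypothetical "restore-admins" group and share path:

New-Item -Path "D:\RestoreDrop" -ItemType Directory             # landing zone for restored files
New-SmbShare -Name "RestoreDrop" -Path "D:\RestoreDrop" -FullAccess "PROTECTED\restore-admins"
Revoke-SmbShareAccess -Name "RestoreDrop" -AccountName "Everyone" -Force   # drop the default read permission
# NTFS side: disable inheritance, grant only the restore admins.
icacls "D:\RestoreDrop" /inheritance:r /grant "PROTECTED\restore-admins:(OI)(CI)F"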

Providing a restore share in the protected environment is safe: you only allow certain admins to have access to it, no one else. Since it is the less trusted environment (the one we protect) that allows access from the more secure environment (the backup environment), this does not expose the backup environment to extra danger. At no point does this allow any access to the backup environment from the protected environment.

Use the FLR wizard to copy the files

You can use the FLR wizard to copy the file to a local folder in the secured Veeam backup environment.

Figure: FLR, copy the file to restore to a local folder

From there you can copy it to the restore share in the protected environment. This is very fast; on our network we get 230 MB/s.

But what if you don’t have the space for that locally? Not everyone has a large local restore disk available. Would it not be handy to copy the files to the restore share in the protected environment directly? Well yes, you can try that in the wizard, and it will ask for the credentials to access the file share.

But the restore will fail with an “Access is denied” error.


You could work around it by using the account VBR itself uses to make the backups. But that won’t do. For one, those credentials should be guarded and not used by anything or anyone but VBR. Secondly, it doesn’t work anyway.

So are you out of luck? The good news is no, you are not, so read on!

Copy the files directly from the VeeamFLR mountpoint

If you don’t have the space to restore the file locally in the secure backup environment, there is another trick. When you look at the restore log, you will see which server is the mount server. That is the node where you will find the C:\VeeamFLR folder, which is where the content of the virtual machine’s VHDX volumes gets mounted. This means that on that node you can navigate to the folders or files you want to restore.


Select the ones you want to recover, just copy them, and navigate to the restore share in the protected environment. You only need to provide the credentials of the dedicated account with write access to that share; those are the only rights you need on it. The copy is again fast at 230 MB/s. That is a lot better than 7 MB/s.
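On the mount server that copy boils down to something like this; the server, share, and folder names are examples:

$cred = Get-Credential -Message "Dedicated restore share account"   # only has write access to the share
New-PSDrive -Name "R" -PSProvider FileSystem -Root "\\fs01.protected.local\RestoreDrop" -Credential $cred
# Copy straight out of the VeeamFLR mount point; the VM and volume folder names are placeholders.
Copy-Item -Path "C:\VeeamFLR\SQLVM01\Volume1\SQLBackups\*" -Destination "R:\" -Recurse
Remove-PSDrive -Name "R"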

There is an issue with this, however. In this scenario, you need access to the server where the volumes of the virtual machine get mounted. Normally that would be the repository where the backup files reside and, guess what, you lock those down as much as possible and don’t log on to them unless needed. You could specify another mount server, but then it needs to be a server you have access to, and the data gets copied over the network, which is slower. With 10Gbps or more that’s not an issue, but with 1Gbps it can take a while for a lot of large files. Can we fix this? Yes, with Mount to Console!

Mount to Console to the rescue

But hey, this would not be Veeam if they did not have yet another option up their sleeve! You can use the Mount to Console button on the ribbon of the Veeam Backup browser to create an extra mount point on your VBR console server. From there you can copy the files to the restore file share in the protected environment a lot faster as well. All you need for this is a local user (non-administrator account) with the Veeam Restore Operator role.

Figure: Mount to Console to the rescue
Figure: Et voila, you can browse the files even if you don’t have space on your console server

See https://helpcenter.veeam.com/docs/backup/hyperv/guest_restore_save_hv.html?ver=100 for more information on this.

The correct answer is Veeam Enterprise Manager

Veeam Enterprise Manager addresses the challenges we worked around here, but you’ll need Enterprise (Plus) licensing to get full-featured use of it. It is nice, as it takes care of the above scenario, and you can assign restore rights to people without compromising your backup environment’s operational security. So yes, use Enterprise Manager when possible. If not, see the above workarounds for other options.

Conclusion

Having some insight into how Veeam Backup & Replication works can come in handy at times. It also helps if you can think a bit outside the box and act on those insights to come up with alternatives or workarounds. That is exactly what we did here, with great results. The DBA could do his restores faster, and the devs got the version of the database they needed. Do note that the correct answer lies in Veeam Enterprise Manager, but lacking that, you now have some options. I hope these insights into a File-Level Restore in a hardened environment help you out someday.

Troubleshooting 100% stalled Veeam backup jobs

Introduction

Recently I got to diagnose a really interesting Veeam Backup & Replication symptom. Imagine a backup environment that runs smoothly all week long. Then, suddenly, running backup jobs stall. New jobs that start do not make an ounce of progress. It is as if the state of every job is frozen in time. Let’s investigate and dive into troubleshooting 100% stalled Veeam backup jobs.

That morning, the backup jobs had not made an ounce of progress since the night before, and they never would. You can leave this for days; sometimes a job in between does seem to work properly, but most often not, so the job queue builds up.


When looking at the stalled jobs, nothing in the Veeam GUI indicates an error. Looking at the Windows event logs, we see no warning, error, or critical messages. All seems fine. As this Veeam environment uses ReFS on Storage Spaces, we are a bit wary. While the bugs that caused slowdowns have been fixed, we are still alert to potential issues. The difference with the known (fixed) ReFS issues is that this is no slowdown. No sir, the Veeam backup jobs have literally frozen in time, but everything else seems to be functional.

Another symptom of this issue is that the synthetic full backups complete perfectly well, but they finish with an error message nonetheless, due to a timeout. This has no effect on the synthetic backup result (they are usable), but it is disconcerting to see an error for this.

On top of that, data copies onto the ReFS volumes work just fine and at an excellent speed. Via Performance Monitor, we can see that the rotation of full regions from the mirror to the parity tier is also working well once the mirror tier reaches a specified capacity level.

Time to dive into the Veeam logs I would say.

Veeam backup job log

So the next stop is the Veeam logs themselves. While those can seem a little intimidating, they are very useful to scroll through. And sure enough, we find the following in the backup log of one of the stalled jobs.

This goes on all through the night, for hours on end.

Virtual machine task logs 1

When we look at the task log of a virtual machine that is still at 0%, we see the same reflected there. Note that nothing happens between 22:46 and 05:30; that’s when I disabled and re-enabled the vNICs of the preferred networks in the VBR virtual machine and it all sprang back to life.

Notice the total standstill from 22:46 to 05:30 …

So it is clear we have a network issue of some sort. We checked the repository servers and the Hyper-V cluster but there everything is just fine. So where is it?

Virtual machine task logs 2

We dive into the task log of one of the virtual machines that is being backed up and that is hanging at 88%. There we see one reconnect after another to the repository IP (over the preferred network as defined in VBR). That also happens all night long, until we reset the VBR virtual machine’s preferred network vNICs. In the log snippet below, notice the following:

Error    A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond (System.Net.Sockets.SocketException)

From the logs we deduce that the network error appears to be on the VBR virtual machine itself. This is confirmed by the fact that bouncing the vNICs of the preferred network (10.10.110.x is the preferred network subnet) on the VBR virtual machine kicks the jobs back into action. So what is the issue? We start checking the network configurations and settings: the switch ports, pNICs, vNICs, vSwitches, etc. As it seems to work for days or a week before the issue shows up, we suspect a jumbo frame issue, so we start there.
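For reference, the vNIC bounce that woke the jobs up amounts to something like this inside the VBR virtual machine; the adapter name is a placeholder:

Disable-NetAdapter -Name "Preferred-110" -Confirm:$false   # vNIC of the preferred network
Start-Sleep -Seconds 5
Enable-NetAdapter -Name "Preferred-110"
# Restart-NetAdapter -Name "Preferred-110" does the same in one go.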

The solution

While checking the configuration, we make sure jumbo frames are enabled on the vNIC and on the pNICs of the vSwitch’s NIC team. That’s when we notice the jumbo frame settings are missing from those pNICs. So we set those again.
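Checking and re-applying jumbo frames can be done with PowerShell; the adapter name is an example and the exact value (9014 is common) is vendor dependent:

Get-NetAdapterAdvancedProperty -Name "*" -RegistryKeyword "*JumboPacket"   # list the jumbo frame setting per NIC
Set-NetAdapterAdvancedProperty -Name "pNIC1" -RegistryKeyword "*JumboPacket" -RegistryValue 9014   # re-apply on a team member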

From the VBR virtual machine we run some ping tests. The default works fine.
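The tests themselves are plain pings; 8972 bytes is the payload that just fits a 9000-byte jumbo frame once you add the 28 bytes of IP and ICMP headers (the address is an example):

ping 10.10.110.2               # default payload, works fine
ping 10.10.110.2 -f -l 8972    # jumbo payload with the 'do not fragment' flag set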


When we test with jumbo frames, however, we notice something. The ping tests do not complain that the jumbo frames are too large; with the “do not fragment” option set, that would give “Packet needs to be fragmented but DF set.” Instead, it just says “Request timed out.” This indicates an issue right here: jumbo frames are set, but they do not work.

Figure: the requests time out without complaints about the jumbo frames, so we have another issue here than just the jumbo frame settings

As the requests time out and the ping test does not complain about the jumbo frames, we have another issue here than just the jumbo frame settings. It smells of a firmware and/or driver issue. So we dive a bit further. That’s when I notice that the driver for the relevant pNICs (Broadcom) is the inbox Windows driver. That’s no good. The inbox drivers only exist so you can go out and fetch the vendor’s driver and firmware when needed, as a courtesy so to speak. We copy the vendor drivers and firmware to the hosts that require an update; in this case, the nodes where the VBR virtual machine can run. The firmware update requires a reboot. When the host is up and the VBR virtual machine is running, I test again.

Bingo, now a ping test succeeds.

Figure: success

What happened?

So did we really forget to update the drivers? Did we walk out of the office to go into lockdown for the corona crisis and forget about it? In the end, it turned out they did run the updates for the physical hosts, but for some reason the Broadcom firmware and drivers did not get updated properly. That failed update also seems to have removed the jumbo frame settings from the pNICs used for the virtual switch. After fixing both of these, we have not seen the issue return.

Remarks

The preferred networks do not absolutely have to be present on the VBR server itself. Defined, yes; present, no. But it speeds up backup job initialization a lot when they are present on the VBR server, and Veeam’s documentation also tells you to do so.

Why jumbo frames? Ah well, the networks we use as preferred networks are end-to-end jumbo frame enabled, so we maintain this onto the VBR server as well. We might get away with not setting jumbo frames on the VBR server, but we want to be consistent.

Conclusion

It pays to make sure you have all settings correctly configured and are running the latest known good firmware and drivers. That should have been the case here, and it all worked well for quite a while before the backup jobs stalled. The issue can lie in the details, and sometimes things are not what you assume they are. Always verify, and verify again.

I hope this helps someone out there if they are ever troubleshooting 100% stalled Veeam backup jobs. If you need help, reach out in the comments. There are a lot of very experienced and respected people around in my network that can help. Maybe even I can lend a hand and learn something along the way.

Veeam Backup & Replication Preferred Subnet & SMB Multichannel

Introduction

In a previous blog post, Veeam Backup & Replication leverages SMB Multichannel, we showed that Veeam Backup & Replication leverages SMB Multichannel when possible.

But what about Veeam Backup & Replication preferred subnets and SMB Multichannel, does that work? We mentioned that we wanted to answer the question of what happens if we configure a preferred backup network in Veeam Backup & Replication. Would this affect the operation of SMB Multichannel at all? By that I mean, would enabling a preferred network in Veeam prevent Multichannel from using more than one NIC?

In this blog post we dive into that question and some scenarios. We actually need to be able to deal with multiple scenarios. When you have equally capable NICs on different subnets, you might want to make sure only one is used. Likewise, you might want both to be used, whether they are on the same subnet or not, even if you set a preferred subnet in Veeam. The good news is that the nature of SMB Multichannel and the way Veeam preferred networks work allow for the flexibility to achieve this. But it might not work like you would expect, unless you understand SMB Multichannel.

Veeam Backup & Replication Preferred Subnet & SMB Multichannel

For this blog post we adapt our lab networking a bit so that our non-management 10Gbps rNICs are on different subnets. We have subnet 10.10.110.0/24 for one set of NICs and 10.10.120.0/24 for the second set of NICs. This is shown in the figure below.

Figure: lab network with the 10.10.110.0/24 and 10.10.120.0/24 subnets

These networks can live in a separate VLAN or not; that doesn’t really matter. It does matter that you use tagged VLANs if you want to use RDMA, because the priority needs to be set in the VLAN tag.

We now need to configure our preferred network in Veeam Backup & Replication. We go to the main menu and select Network Traffic Rules.


In the Global Network Traffic Rules window, click Networks.


In the Preferred Networks window, select the Prefer the following networks for backup and replication traffic check box.


Click Add. Use CIDR notation to fill out your preferred network, or use the network mask, and click OK.

To prove a point about how SMB Multichannel isn’t affected by what you fill out here, we add only one of our two subnets. SMB will see where it can leverage SMB Multichannel and it will kick in. Veeam isn’t blocking any of its logic.

So now we kick off a backup of our Hyper-V host to our SMB share target backup repository. We can see Multichannel work just fine.
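You can verify this from the SMB client side as well; Get-SmbMultichannelConnection lists the channels that are actually in use:

Get-SmbMultichannelConnection                       # active channels, one line per client/server NIC pair
Get-SmbMultichannelConnection -IncludeNotSelected   # also show paths SMB evaluated but did not pick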


Below is a screenshot on the backup target of the backup running over SMB Multichannel, leveraging two subnets, while only one of those is set as the preferred network in Veeam Backup & Replication.

Figure: SMB Multichannel on the backup target, leveraging both subnets

Look at my backup fly … and this is only one host being backed up (4 VMs actually). Have I told you how much I love flash storage? And why I’m so interested in getting ReFS hybrid volumes with SSD/SATA disks to work as a backup target? I bet you know!

Looking good, and it’s easy, right? Well, not so fast!

Veeam does not control SMB Multichannel

Before you think you’re golden here and in control via Veeam, let’s do another demo. In the preferred networks, we enter a subnet available to both the source and the target server, but one that lives on an LBFO (teamed) NIC with two 1Gbps members (RSS is enabled).


Now let’s see what happens when we kick off a backup.


Well, SMB Multichannel just goes through its rules and decides to take the two best, equally capable NICs. These are still our two 10Gbps rNICs. Whatever you put in the preferred network is ignored.

This is neither good nor bad, but you need to be aware of it in order to arrange for backups to leverage the network path(s) you had in mind and avoid surprises. The way to do that is the same as you plan and design for all SMB Multichannel traffic.

As stated in the previous blog post, you can control which NICs SMB Multichannel will use by designing around the NIC capabilities, by disabling or enabling some of those capabilities, or by disabling SMB Multichannel on a NIC. This isn’t always possible or can lead to issues for other workloads, so the easiest way to go is using SMB Multichannel constraints, as shown below. Do note, however, that you need to take into consideration which other workloads on your server leverage SMB Multichannel when you go that route, to avoid possible issues.
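A constraint lives on the SMB client and pins traffic to a given server onto specific interfaces; the server and interface names below are examples:

New-SmbMultichannelConstraint -ServerName "backuptarget" -InterfaceAlias "rNIC1", "rNIC2"   # only use these NICs for this server
Get-SmbMultichannelConstraint                                                               # review what is in place
Remove-SmbMultichannelConstraint -ServerName "backuptarget" -InterfaceAlias "rNIC1" -Confirm:$false   # undo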

As an example, I disabled Multichannel on my hosts. An awful idea, but it’s to prove a point. And still, with our 10.10.0.0/16 subnet set as the preferred subnet, I ran a backup again.
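For completeness, disabling SMB Multichannel is a host-wide setting, which is exactly why it is an awful idea outside of a lab:

Set-SmbClientConfiguration -EnableMultiChannel $false -Force   # affects ALL SMB client traffic on the host
Set-SmbServerConfiguration -EnableMultiChannel $false -Force   # same on the server side
# Set both back to $true to restore normal behavior.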

Figure: with Multichannel disabled, the backup runs over the 2*1Gbps LBFO NIC

As you can see, the 2*1Gbps LBFO NIC is doing all the lifting on both hosts. As it’s a switch-independent team and not in LACP load balancing mode, we’re limited to 1Gbps.

So how do we control the NICs used with SMB multichannel?

Well, the SMB Multichannel rules apply. You use your physical design, the capabilities of the NICs, and SMB constraints. In reality, you’re better off using your design and, if needed, SMB Multichannel constraints to limit SMB to the NICs you want it to use. Do note that disabling SMB Multichannel (client and/or server side) is global for the host. Consider this, as it affects all NICs on the host, not just the ones you have in mind for your backups. In most cases these will be the same. Messing around with disabling Multichannel or NIC capabilities (RSS, RDMA) isn’t a great solution, but it’s good to know the options and the behavior.

Some things to note

Realize you don’t even have to set both subnets in the preferred subnets if they are different. SMB kicks off over one, sees it can leverage both, and just does so. The only thing you have manipulated here, SMB Multichannel wise, is which subnet is used first.

If both of our rNICs had been on the same subnet, you would not even have manipulated that.

Another thing worth pointing out is that this doesn’t require your Veeam Backup & Replication VM to have an IP address in any of the SMB Multichannel subnets. As long as the source Hyper-V hosts and the backup target are connected, you’re good to go.

Last but not least, and already mentioned in the previous blog post, this also leverages RDMA capabilities when available, to help you get the best throughput and lowest latency and leave those CPU cycles for other needs. Scalability, baby! Now, I realize you might think the CPU offload benefit is not a huge deal on your Hyper-V host, but consider the backup target being hammered by several simultaneous backups. And also consider that some people’s virtual machines look like the figure below in regards to CPU usage, in ever more need of more vCPUs and CPU time slices.

Figure: a virtual machine in ever more need of vCPUs and CPU time slices

And that’s what the Hyper-V host looks like during a backup without SMB Direct (with idle VMs, mind you).

Figure: Hyper-V host CPU usage during a backup without SMB Direct

All I’m saying here is: don’t dismiss RDMA too fast. Everything you can leverage to help out, and that is available for free in the box, is worth considering.

Note: I have gotten feedback that Veeam doesn’t support SMB Direct and that this was confirmed by Veeam support. Well, Veeam Backup & Replication leverages SMB 3, but that’s an OS feature. Veeam Backup & Replication will work with SMB Multichannel, Direct, Signing, Transparent Failover … it’s out of the Veeam Backup & Replication scope of responsibilities, as we have seen here. Feel free to leverage SMB Direct, whether that is using iWARP, RoCE, or InfiniBand. This information was confirmed by Veeam and bears the “Anton Gostev seal of approval”. So if SMB Direct causes issues, you have a configuration problem with that feature; it’s not Veeam not being able to support it, as it doesn’t know or care, actually.
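Verifying that the OS, not Veeam, sees and uses your RDMA-capable NICs only takes a couple of cmdlets:

Get-NetAdapterRdma                 # RDMA capability and state per NIC
Get-SmbClientNetworkInterface      # which interfaces SMB considers RSS/RDMA capable
Get-SmbServerNetworkInterface      # same view on the server side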

Conclusion

The elegance and simplicity of the Veeam Backup & Replication GUI are deceiving. Veeam is extremely powerful and surprisingly flexible in how you can leverage and configure it. I hope both my previous blog post and this one have given you some food for thought and ideas. There’s more Veeam goodness to come in the coming months when time allows. Many years ago, when SMB 3 was introduced, I demonstrated the high availability capabilities this offered for Veeam backups. I’ll be writing about that in another blog post.