Make Veeam Instant Recovery use a preferred network

Posted on April 24, 2020 by workinghardinit

Introduction

In this post, I will be discussing an issue we ran into when leveraging the instant recovery capability of Veeam Backup & Replication (VBR). The issue became apparent when we set up the preferred networks in VBR. The backup jobs and the standard restores both leveraged the preferred network as expected. We ran into an issue with instant recovery. While the mount phase leverages the preferred network this is not the case during the restore phase. That uses the default host management network. To make Veeam instant recovery use a preferred network we had to do some investigation and tweaking. This is what this blog post is all about.

Overview

We have a Hyper-V cluster, shared storage (FC), that acts as our source. We back up to a Scale Out Backup Repository that exists of several extend or standard repository. Next to the management network, all of the target and source nodes have connectivity to one or more 10/25Gbps networks. This is leveraged for CSV, live migrations, storage replication, etc. but also for the backup traffic via the Veeam Backup & Replication preferred network settings.

The IPs for the preferred networks are NOT registered in DNS. Note that the Veeam Backup & Replication server also has connectivity to the preferred networks. The reason for this is described in Optimize the Veeam preferred networks backup initialization speed.

As you might have guessed from the blog post title “Make Veeam Instant Recovery use a preferred network” this all worked pretty much as expected for the backup themselves and standard restores. But when it came to Instant Recovery we noticed that while the mount phase leveraged the preferred network, the actual restore phase did not.

To read up on instant recovery go to Instant VM Recovery. But in this blog post, it is time to dive into the log files to figure out what is going on?

To solve the issue we dive into the VBR logs, but also into the logs on both the repository/extent and the Hyper-V server where we restore the VM to. The logs confirmed what we already noticed. For backups and normal restores, it correctly decides to use the preferred network. With instant recovery for some reason, in the restore phase, it selects the default host network which is 1Gbps.

Investigating the logs

Reading logs can seem an intimidating tedious task. The trick is to search for relevant entries and that is something you learn by doing. Combine that with an understanding of the problem and some common sense and you can quickly find what you need to look for. Than it is key to figure out why this could be happening. Sometimes that doesn’t work out. In that case, you contact Veeam support. That’s what I did as I knew well what the issue was and I could see this reflected in the logs. But I did not know how to handle this one.

We will look at the logs on the VBR Server, the repository where the backup files of the VM live, and the Hyper-V node where we restore the VM to investigate this issue.

The VBR log

Let’s look at the restore log of the virtual machine for which we perform instant recovery on the VBR server. We notice the following.

The actual restore phase of Instant Recovery leveraging the 1Gbps default host management network

The repository logs

These are the logs of our repository or extent where the restore reads the backup data from. There are actually 2 logs. One is the mount log and the other is the restore log.

We first dive into the Agent.IR.DidierTest08.Mount.Backup-Side.log of our test VM instant recovery. Here we can see connections to our Hyper-V server node where instant recover the test VM over the preferred network. Note that is is the Hyper-V server node that acts as the client!

Let ‘s now parse the Agent.IR.DidierTest08.Restore.Backup-.Side.log of our test VM instant recovery. No matter how hard we look we cannot find any connection attempt, let alone a connection to a preferred network (10.10.110.0/24). We do see the restore work over the default management network (10.18.0.0/16). Also note here that it is the repository node that connects to the Hyper-V node (10.18.230.5), it acts as a client now.

The restore target log

This is the Hyper-V server to where we restore the virtual machine. There are multiple logs but we are most interested in the mount log and the restore log.

We first dive into the Agent.IR.DidierTest08.Mount.HyperV-Side.log of our test VM instant recovery.The mount log shows what we already know. It also shows that it is the Hyper-V server that initiates the connection. This does leverage the preferred network (10.10.110.0/24).

It also shows the mount phase does leverage the preferred network (10.10.110.0/24).

But when we look at the restore phase log Agent.IR.DidierTest08.Restore.HyperV-Side.log we again see that the default host management network is used instead of the preferred network.

Again, we see that the Hyper-V server node during the restore phase acts as the server while the repository is the client (10.18.217.5).

Summary of our findings

Based on our observations on the servers (networks used) and investigating the logs we conclude the following. During an instant recovery, the VM is mounted on the Hyper-V host (where the checkpoint is taken). During the mount phase, the Hyper-V host acts as the client, while the repository acts as the client. This leverages the preferred network. Now, during the restore phase, the repository acts as the client and connects to the Hyper-V host that acts as the server. This does not leverage the preferred network.

This indicates that the solution might lie in reversing the client/server direction for the restore phase of Instant Recovery. But how? Well, there is a setting in Veeam where we can do just this.

Make Veeam Instant Recovery use a preferred network

I have to thank the Veeam support engineer that worked on this with me. He investigated the logs that I sent him as well but with more insight than I have. Those were clean logs just showing reproductions of the issue in combination with a Camtasia Video of it all. That way I showed him what was happening and what I saw while he also had the matching logs to what he was looking at.

Sure enough, he came back with a fix or workaround if you like. To make sure instant recovery leverages the preferred network we needed to do the following. On each extent, in properties, under credentials go to network settings and check “Run server on this side” under “Preferred TCP connection role”.

The “normal” use cases are for example when most the repository FQDN resolves into several IP addresses and Hyper-V FQDN is resolved into 1 only. This was not the case in our setup, the preferred networks are not registered in DNS. But here leveraging the capability to set “Run server on this side” solves our issue as well.

Parse the logs with “Run server on this side” enabled on the repository/extents

When we start a clean test and rerun an Instant Recovery of our test VM we now see that the restore phase does leverage the preferred network. The “Run server on this side” setting is also reflected in the restore phase logs on both the repository and the backup server.

Agent.IR.DidierTest08.Restore.Backup-Side.log. The Hyper-V server is the client (10.10.110.211) connects to the server, which is the repository.

Agent.IR.DidierTest08.Restore.HyperV-Side.log. The server is now indeed the repository (10.10.110.14) and the client is the Hyper-V server node.

In the VBR log itself, we notice the “Run server on this side” has indeed been enabled.

Host ‘REPOSITORYSERVER’ should be server, reversing connection
IR.DidierTest08.Mount.log

In the Agent.IR.DidierTest08.Restore.Backup-Side.log on the repository server, we also see this setting reflected.

Based on the documentation about “Run server on this side” in https://helpcenter.veeam.com/docs/backup/hyperv/hv_server_credentials.html?ver=95u4 you would assume this is only needed in scenarios where NAT is in play. But this doesn’t cover all use cases. Enabling this checkbox on a server means it does not initiate the connection but waits for the incoming connection from its partner. In our case that also causes the preferred network to be picked up. Apparently, all that is needed is to make sure the Hyper-V hosts to where we restore act as the client and initializes the connection to the server, our repository or extents in SOBR.

Conclusion

We achieved a successful result. Our instant recoveries now also leverage the preferred network. In this use case, this is really important as multiple concurrent instant recoveries are part of the recovery plan. That’s why we have performant storage solutions for our backup and source in combination with high bandwidth on a capable network. In the end, it all worked out well with a minor tweak to make Veeam Instant Recovery use a preferred network. This was however unexpected. I hope that Veeam dives into this issue and sees if the logic can be improved in future updates to make this tweak unnecessary. If I ever hear any feedback on this I will let you know.

Protecting your Veeam Backup and Replication Server is critical

Posted on April 15, 2020 by workinghardinit

Introduction

In this blog, we will demonstrate one of the things that can go wrong when someone gets a hold of your Veeam Backup & Replication server administrative credentials. They can do more than “just” delete all your backups, replicas, etc. When they can logon to the Veeam Backup & Replication Server itself they can also grab all the credentials form the Veeam configuration database. Those credentials normally have privileges that you do not want to fall into the wrong hands. These are quite literally the keys to the kingdom. Hence, protecting your Veeam Backup & Replication Server is critical.

Protecting your Veeam Backup & Replication Server is critical

Security is not about one feature, technology or action. It takes a more holistic approach. It requires physical security to start with. You also need to adhere to the principles of least privilege rigorously. All this while locking down access, reducing the attack surface, leveraging segmentation, etc.

A key element lies in prevention. You must avoid the harvesting of those credentials. For this reason, you absolutely must practice privileged credential hygiene. Today you also want to leverage multi-factor authentication in order to protect access even better. All this, and more, prevents unauthorized access in the first place. Even when one measure fails. Read Veeam Backup & Replication 9.5 Update 3 — Infrastructure Hardening for more details on this.

Protecting your Veeam Backup & Replication Server is critical — Add MFA to portect your credentials being abused when compromised

Veeam Backup & Replication itself requires credentials to do its work of protecting data and workloads. Access to servers, proxies, repositories, interaction with virtual machines, etc. cannot happen without such credentials. Veeam encrypts the passwords of these users via strong encryption. They use the Microsoft CryptoAPI (FIPS certified) with the machine-specific encryption key for this.

As a side note, you might have seen the big fuss around the critical vulnerability in January 2020 regarding CryptoAPI. This is a reminder of why you need to keep your systems patched.

It ensures decryption of those passwords on another host than the one were encrypting them happened, fails. This means that even if someone steals the configuration database, or in some shape, way or form gets a hold of the encrypted password in the database they cannot be decrypted. This is an industry-standard and quite safe. What you need to know is that when someone gains access to your machine with local administrative rights, all bets are off.

What can happen?

The moment an attacker logs on to the Veeam Backup & Replication server with administrative rights, it is game over.

They will be able to grant themselves access to SQL Server and query it for the credentials. With that information, all they need to do is load and use a Veeam DLL to decrypt them. When this runs on the server where you encrypted them, this will succeed. If anyone would get hold of the encrypted password and tries to decrypt them on another host this will fail as that host has the wrong machine-specific key.

Let me emphasize once more that this is not a insecure implementation by Veeam. When you store encrypted passwords for a service, that service must be able to decrypt them. Otherwise, they can never use them. You cannot get the passwords via the GUI or the Veeam PowerShell commands. But via code, this is quite possible.

Sample code to proof that protecting your Veeam Backup & Replication Server is critical

I assembled a little PowerShell script that grabs the data from the Veeam configuration database. For this purpose, I filter out the passwords that have an empty string. We loop through the ones that remain and decrypt the passwords. In the end, I decided not to post the script as it might help people with bad intentions. I know it won’t stop bad actors cold in their tracks and maybe I will update this post later. But for now, did not include it.

$VeeamSQLServerInstance = "VBR\VEEAMSQL2016"

#Load the Veeam.Backup.Common.dll. This implements the encryption and decryption of passwords via the
Add-Type -path "C:\Program Files\Veeam\Backup and Replication\Backup\Veeam.Backup.Common.dll"

#Easy PowerShell to grab the contents of the Veeam database table [VeeamBackup].[dbo].[Credentials]
$AllVeeamCredentials = Invoke-Sqlcmd -ServerInstance $VeeamSQLServerInstance -Database "VeeamBackup" -Query "SELECT * FROM Credentials"

#Filter the credentials that don't have an empty string for a password.
$VeeamCredentialsWithPassword =  $AllVeeamCredentials | Where-Object {$_.Password -ne ""}
#endregion

<#
Loop through all credentials and decrypt the password. Note that this only works on the server where
the password was encrypted in the first place, i.e. the Veeam Backup & Replication server.
This is because Veeam leverages the Microsoft CryptoAPI to encrypt passwords using the
machine-specific encryption key. This is secure (FIPS compliant) but if people get root,
it is always game over
#>

ForEach ($Credential in $VeeamCredentialsWithPassword | sort user_name)
{
write-host -ForegroundColor Green "Found user:" $Credential.user_name "with description:" $Credential.user_name
$encryptedpassword = $Credential.Password
write-host -ForegroundColor Yellow  "The encrypted password is:" $encryptedpassword 
$decryptedpassword = [Veeam.Backup.Common.ProtectedStorage]::getlocalString($encryptedpassword)

Add-Type -AssemblyName System.Security
$encrypted = [Convert]::FromBase64String($encryptedpassword)
$decrypted = [System.Security.Cryptography.ProtectedData]::Unprotect($encrypted, $null, 'LocalMachine')
[System.Text.Encoding]::UTF8.GetString($decrypted)

write-host -ForegroundColor Red "The password is:" $decryptedpassword 
}

In the screenshot below, you can see the results. This is a demo lab with demo credentials, so there’s no need to worry about sharing this with you. Remember that you can only decrypt the password on the Veeam Backup & Replication server where it was encrypted.

To prove a point, we will retrieve the encrypted passwords and attempt to decrypt them on another VBR server, providing us with the opportunity to do so. This fails with an Exception calling “GetLocalString” with “1” argument(s): “Key not valid for use in specified state. Error.
“

As you can see, even if you get a hold of the encrypted strings, they cannot be decrypted on another machine. You must do this on the machine that encrypted them.

Conclusion

While to some this might be a shock when they first learn of this, it is not a gaping security hole. It just shows you that security is more than encryption. It takes multiple measures on multiple levels to protect assets. I repeat, protecting your Veeam Backup & Replication Server is critical. For many people, this is indeed an eye-opener. The lesson is that you must protect your assets adequately. Do not rely on a single feature to ward off all threats on its own. That is asking for the impossible.

I do hope that all Veeam software itself will also support MFA in the future. That would also help protect access via the Veeam Backup & Replication console.

Veeam File Share backups and knowledge worker data

Posted on February 6, 2020 by workinghardinit

Introduction

Today I focus on Veeam File Share backups and knowledge worker data testing. In Veeam NAS and File Share Backups did my 1st testing with the RTM bits of Veeam Backup & Replication V10 File Share backup options. Those tests were focused on a pain point I encounter often in environments with lots of large files: being able to back them up at all! Some examples are medical imaging, insurance, and GIS, remote imaging (satellite images, Aerial photography, LIDAR, Mobile mapping, …).

The amount of data created has skyrocketed driven by not only need but the advances in technology. These technologies deliver ever-better quality images, are more and more affordable, and are ever more applicable in an expanding variety of business cases. This means such data is an important use case. Anyway for those use cases, things are looking good.

But what about Veeam File Share backups and knowledge worker data? Those millions of files in hundreds of thousands of folders. Well, in this blog post I share some results of testing with that data.

Veeam File Share backups and knowledge worker data

For this test we use a 2 TB volume with 1.87TB of knowledge worker data. This resides on a 2 TB LUN, formatted with NTFS and a unit allocation size of 4K.

The data consists of 2,637,652 files in almost 196,420 folders. The content is real-life accumulated data over many years. It contains a wide variety of file types such as office, text, zip, image, .pdf, .dbf, movie, etc. files of various sizes. This data was not generated artificially. All servers are Windows Server 2019. The backup repository was formatted with ReFS (64K allocation unit size).

Backup test

We back it up with the file server object from an all-flash source to an all-flash target. There is a dedicated 10Gbps backup network in the lab. As we did not have a separate spare lab node we configured the cache on local SSD on the repository. I set the backup I/O control for faster backup. We wanted to see what we could get out of this setup.

Below are the results.

Veeam File Share backups and knowledge worker data — 45:28 minutes to backup 1.87 TB of knowledge worker data. I like it.

If you look at the back-up image above you see that the source was the bottleneck. As we are going for maximum speed we are hammering the CPU cores quite a bit. The screenshot below makes this crystal clear.

We have plenty of CPU cores in the lab on our backup source and we put them to work.

The CPU core load on the backup target is far less.

This begs the question if using the file share option would not be a better choice. We can then leverage SMB Direct. This could help save CPU cycles. With SMB Multichannel we can leverage two 10Gbps NICs. So We will repeat this test with the same data. Once with a file share on a stand-alone file server and once with a high available general-purpose file share with continuous availability turned on. This will allow us to compare the File Server versus File Share approach. Continuous availability has an impact on performance and I would also like to see how profound that is with backups. But all that will be for a future blog post.

Restore test

The ability to restore data fast is paramount. It is even mission-critical in certain scenarios. Medical images needed for consultations and (surgical) procedures for example.

So we also put this to the test. Note that we chose to restore all data to a new LUN. This is to mimic the catastrophical loss of the orginal LUN and a recovery to a new one.

The restore takes longer than the backup for the same amount of data, the restore speed is typically slower for large amounts of smaller files.

Below you will find a screenshot from the task manager on both the repository as well as the file server during the restore.

The repository server from where we are restoring the data

The file server where the backup is being restored completely on a new LUN. Note the peak throughput of 6.4 Gbps.

Mind you, this varies a lot and when it hits small files the throughput slows down while the cores load rises

The file server during the restore is having to work the hardest when it has to deal with the least efficient files.

Conclusion

For now, with variable data and lots of small files, it looks that restores take 2.5 to 3 times as long as backups with office worker data. We’ll do more testing with different data. With large image files, the difference is a lot less from our early testing. For now, this gives you a first look at our results with Veeam File Share backups and knowledge worker data As always, test for your self and test multiple scenarios. Your mileage will vary and you have to do your own due diligence. These lab tests are the beginning of mine, just to get a feel for what I can achieve. If you want to learn more about Veeam Backup & Replication go here.Thank you for reading.

Hyper-V Amigos Showcast Episode 20 and 21

Posted on December 28, 2019 by workinghardinit

Introduction

This is just a quick blog post to let you know the Hyper-V Amigos have released 2 webcasts recently. These are Hyper-V Amigos Showcast Episode 20 and 21. You will find a link to the videos and a description of the content below.

Hyper-V Amigos Showcast – Episode 20

In episode 20 of the Hyper-V Amigo ShowCast, we continue our journey in the different ways in which we can use storage spaces in backup targets. In our previous “Hyper-V Amigos ShowCast (Episode 19)– Windows Server 2019 as Veeam Backup Target Part I” we looked at stand-alone or member servers with Storage Spaces. With both direct-attached storage and SMB files shares as backup targets. We also played with Multi Resilient Volumes.

For this webcast, we have one 2 node S2D cluster set up for the Hyper-V workload (Azure Stack HCI). On a second 2 node S2D cluster, we host 2 SOFS file shares. Each on their own CSV LUN. SOFS on S2D is supported for backups and archival workloads. And as it is SMB3 and we have RDMA capable NICs we can leverage RDMA (RoCE, Mellanox ConnectX-5) to benefit from CPU offloading and superb throughput at ultra-low latency.

Hyper-V Amigos Show Cast Episode 20

Some extra information

The General Purpose File Server (GPFS role) is not supported on S2D for now. You can use GPFS with shared storage and in combination with continuous availability. This performs well as a high available backup target as well. The benefit here is that this is cost-effective (Windows Server Standard licenses will do) and you get to use the shared storage of your choice. But in this show cast, we focus on the S2D scenario and we didn’t build a non-supported scenario.

You would normally expect to notice the performance impact of continuous availability when you compare the speeds with the previous episode where we used a non-high available file share (no continuous availability possible). But we have better storage in the lab for this test, the source system is usually the bottleneck and as such our results were pretty awesome.

The lab has 4 Tarox server nodes with a mix of Intel Optane DC Memory (Persistent Memory or Storage Class Memory), Intel NVMe and Intel SSD disks. For the networking, we leverage Mellanox ConnectX-5 100Gbps NICs and SN2100 100Gbps switches. Hence we both had a grin on our face just prepping this lab.

As a side note, the performance impact of continuous availability and write-through is expected. I have written about it before here. The reason why you might contemplate to use it. Next to a requirement for high availability, is due to the small but realistic data corruption risk you have with not continuously available SMB shares. The reason is that they do not provide write-through for guaranteed data persistence.

We also demonstrate the “Instant Recovery” capability of Veeam to make workloads available fast and point out the benefits.

Hyper-V Amigos Showcast – Episode 21

In episode 21 we are diving into leveraging the Veeam Agent for Windows integrated with Veeam Backup & Replication (v10 RC1) to protect our physical S2D nodes. For shops that don’t have an automated cluster node build processes set up or rely on external help to come in and do it this can be a huge time saver.

We walk through the entire process and end up doing a bare metal recovery of one of the S2D nodes. The steps include:

Setting up an Active Directory protection group for our S2D cluster.
Creating a backup job for a Windows Server, where we select failover cluster as type (Which has only the “Managed by Backup Server” as the mode).
We run a backup
After that, we create the Veeam Agent Recovery Media (the most finicky part)
Finally, we restore one of the S2D hosts completely using the bare metal recovery option

Some more information

Now we had some issues in the lab one of them suffering to a BSOD on the laptop used to make the recording and being a bit too impatient when booting from the ISO over a BMC virtual CD/DVD. Hence we had to glue some parts together and fast forward through the boring bits. We do appreciate that watching a system bot for 10 minutes doesn’t make for good infotainment. Other than that, it went fine and we were able to demonstrate the process from the beginning to the end.

As is the case with any process you should test and experiment to make sure you are familiar with the process. That makes it all a little easier and hurt a little less when the day comes you have to do it for real.

We hope the show cast helps you look into some of the capabilities and options you have with Veeam in regards to protecting any workloads. Long gone are the days that Veeam was only about protecting virtual Machines. Veeam is about protecting data where ever it lives. In VMs, physical servers, workstations, PCs, laptop, on-prem, in the cloud and Office 365. On top of that, you can restore it where ever you want to avoid lock-in and costly migration projects and tools. Check it out.

Conclusion

We will be doing more web casts on Veeam Backup & Replication v10 in 2020 as it will be generally available in Q1 as far I can guess.

But with Hyper-V Amigos Showcast Episode 20 and 21, that’s it for 2019. Enjoy the holidays during this festive season. The Hyper-V Amigos wish you a Merry X-Mas and a very happy New Year in 2020!

Working Hard In IT

My view on IT from the trenches

Category Archives: Backups

Make Veeam Instant Recovery use a preferred network

Introduction

Overview

Investigating the logs

The VBR log

The repository logs

The restore target log

Summary of our findings

Make Veeam Instant Recovery use a preferred network

Parse the logs with “Run server on this side” enabled on the repository/extents

Conclusion

Protecting your Veeam Backup and Replication Server is critical

Introduction

Protecting your Veeam Backup & Replication Server is critical

What can happen?

Sample code to proof that protecting your Veeam Backup & Replication Server is critical

Conclusion

Veeam File Share backups and knowledge worker data

Introduction

Veeam File Share backups and knowledge worker data

Backup test

Restore test

Conclusion

Hyper-V Amigos Showcast Episode 20 and 21

Introduction

Hyper-V Amigos Showcast – Episode 20

Some extra information

Hyper-V Amigos Showcast – Episode 21

Some more information

Conclusion