Veeam Hardened Repositories on DELL R740XD2 Servers

Recently I got the opportunity to build Veeam Hardened Repositories on DELL R740XD2 Servers. Those repositories were needed to deploy a Veeam Scale-Out Backup Repository (SOBR). Yes, Linux systems leveraging XFS for Fast Clone and immutability.

Technologies used

Veeam Backup & Replication

First of all, you need Veeam Backup & Replication (VBR) v11a or later. I run VBR on Windows Server 2022 at the time of writing.

Linux operating system

Next to that, I use Ubuntu 20.04 LTS for the Veeam hardened repositories. While the release of 22.04 LTS is imminent at the time of writing, OEM hardware support is a requirement, so I stick with 20.04 for now. The file system is XFS with Fast Cloning enabled.
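
For reference, preparing such a volume for Fast Clone boils down to formatting it as XFS with reflink and CRC enabled. A minimal sketch, where the device name and mount point are placeholders rather than the actual production layout:

# Format the repository volume so Veeam can leverage Fast Clone (block cloning)
# /dev/sdb and /backups/repo01 are example names only
sudo mkfs.xfs -b size=4096 -m reflink=1,crc=1 -L VeeamRepo01 /dev/sdb
sudo mkdir -p /backups/repo01
sudo mount /dev/sdb /backups/repo01
# Verify reflink is active on the mounted file system
xfs_info /backups/repo01 | grep reflink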

Servers with Direct Attached Storage (DAS)

I leverage RAID 60 on Dell EMC R740XD2 servers. The servers boot from mirrored BOSS SSDs and have a PERC H740P RAID controller with 8GB of cache and 26 3.5″ HDDs attached. We need to optimize for both cost and capacity, which is why we use 3.5″ drives. With disk sizes ranging from 8TB to 16TB, this gives us real-world usable storage from 145TB to 290TB with two global hot spares.

DELL R740XD2 (Image courtesy of DELL)

Now the RAID 60 is one big virtual disk group containing all disks bar the 2 assigned as global hot spares. This makes sure we engage all disks to help with IOPS, latency and throughput. The 8GB of controller cache helps smooth things out. Depending on the scale of your deployment you can create one Veeam SOBR extent per server or carve multiple virtual disks out of the available storage.

You also have the option to leverage LVM on Linux if that suits your needs, but in that case I use it only for volume management; data protection comes from the physical RAID controller.
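
If you do go the LVM route, a minimal sketch of that volume management layer could look like the lines below. The device and volume names are just examples, with the RAID 60 virtual disk being whatever the PERC presents to the OS:

# Put the RAID 60 virtual disk under LVM control (device name is an example)
sudo pvcreate /dev/sdb
sudo vgcreate vg_backup /dev/sdb
# Carve out one logical volume per SOBR extent, here simply using all free space
sudo lvcreate -n lv_repo01 -l 100%FREE vg_backup
# Format it with reflink enabled for Fast Clone, as above
sudo mkfs.xfs -b size=4096 -m reflink=1,crc=1 /dev/vg_backup/lv_repo01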

There is no right or wrong way here. Select what suits you best, but the golden rule is to keep it simple.

Networking

Networking is 1Gbps for DRAC/Host connectivity and dual 10Gbps or 25Gbps for backup traffic. The switches are Dell EMC PowerSwitch S52XXF-ON series. Awesome kit!
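
Purely as an illustration, and not necessarily how this particular environment is wired, the dual 25Gbps ports on an Ubuntu repository could be aggregated with an LACP bond via netplan. Interface names and addresses below are made up:

# /etc/netplan/01-backup-bond.yaml - example only, adjust to your NIC names and IP plan
network:
  version: 2
  renderer: networkd
  ethernets:
    ens1f0: {}
    ens1f1: {}
  bonds:
    bond0:
      interfaces: [ens1f0, ens1f1]
      parameters:
        mode: 802.3ad
        lacp-rate: fast
        mii-monitor-interval: 100
      addresses: [10.10.10.11/24]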

The DELL S5224F-ON ToR switches provide 25Gbps for the backup traffic (Image courtesy of DELL)

Extra Security

Finally, I provide some extra security. I use Duo as an MFA provider as they have an excellent pam_duo module. We often use our smartphones for MFA, but there are plenty of use cases for using security keys. For those, I like the FEITIAN biometric models such as the K27 (USB-A) and the K26 (USB-C).
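
For those who want to try the same, the pam_duo side of things boils down to a small configuration file and one PAM line. Treat this as a minimal sketch with placeholder keys rather than the exact configuration I run:

# /etc/duo/pam_duo.conf - integration key, secret key and API host come from the Duo admin console
[duo]
ikey = YOUR_INTEGRATION_KEY
skey = YOUR_SECRET_KEY
host = api-XXXXXXXX.duosecurity.com
pushinfo = yes
failmode = safe

# /etc/pam.d/sshd - add Duo as an additional factor after primary authentication
auth required pam_duo.so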

BioPass FIDO2 Biometric Fingerprint USB-A SecurityKey – K27

The Journey of building Veeam Hardened Repositories on DELL R740XD2 Servers

Maybe you are interested in how I set up these Veeam Hardened Repositories on DELL R740XD2 Servers? If so, you might be in luck. I hope to blog about this journey, both for my own reference and to share the experience. This will take several blog posts, and in those I will highlight different parts of the solution. If you want to learn more about the Veeam hardened repository, I recommend the blog series I did last year:

Veeam Hardening Linux Repository – Part 1 | StarWind Blog (starwindsoftware.com)
Veeam Hardening Linux Repository – Part 2 | StarWind Blog (starwindsoftware.com)
Veeam Hardening Linux Repository – Part 3 | StarWind Blog (starwindsoftware.com)

As I realize not all of you will get your hands on such hardware, I have a PowerShell script that creates Hyper-V virtual machines to use in the lab and practice with. Those VMs emulate the DELL hardware setup.
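
The script itself will come with that blog series, but at its core it uses nothing more exotic than the standard Hyper-V cmdlets. A rough sketch with made-up lab names and sizes, not the actual script:

# Create a lab VM that mimics a repository server: an OS disk plus a stack of data disks
$VMName = "VHR-LAB-01"
New-VM -Name $VMName -Generation 2 -MemoryStartupBytes 8GB -SwitchName "LabSwitch" `
    -NewVHDPath "D:\HyperV\$VMName\os.vhdx" -NewVHDSizeBytes 60GB
Set-VMProcessor -VMName $VMName -Count 4
# A handful of data disks stand in for the 24-disk RAID 60 set of the physical servers
1..6 | ForEach-Object {
    $Path = "D:\HyperV\$VMName\data$_.vhdx"
    New-VHD -Path $Path -SizeBytes 100GB -Dynamic | Out-Null
    Add-VMHardDiskDrive -VMName $VMName -Path $Path
}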

Cosmetic Issue on DELL PowerEdge servers with iDRAC firmware 2.52.52.52

UPDATE August 17th 2018: iDRAC firmware 2.60.60.60 has been released and I can confirm it fixes the cosmetic issue mentioned in this post. Judging from the readme content it seems to be a significant release, also for security, so test it and deploy when ready.

Just a heads-up to people who might notice the following on their DELL PowerEdge servers after updating to iDRAC firmware 2.52.52.52. I have seen it on DELL generation 12 and 13 servers (R720/R730) myself. After the installation, I noticed 4 Mass Storage Function devices under “Other devices”. That is before any reboot.

While not needed for the iDRAC firmware upgrade, I did try a reboot. The devices are still there afterwards, but we’re down to 2 now.

The device instance path and other properties show the following for both of these devices:

USB\VID_0624&PID_0251\20120731-1
USB\VID_0624&PID_0252\20120731-2

and the USB vendor and product IDs point to iDRAC remote virtual devices. Dell support confirmed this is a benign cosmetic bug with no performance or stability issues. It should be resolved in the next firmware upgrade for the iDRAC.
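
If you want to check your own hosts for these entries, one quick way on Windows Server 2012 R2 or later is to query them by that vendor ID with PowerShell (0624 being the vendor ID these iDRAC virtual devices show up with here):

# List the iDRAC remote virtual USB devices by their vendor ID
Get-PnpDevice | Where-Object { $_.InstanceId -like 'USB\VID_0624*' } |
    Select-Object Status, Class, FriendlyName, InstanceId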

Testing Compellent Replay Manager 7.8

So today I found the Replay Manager 7.8 bits to download.

As I was awaiting this eagerly (see Off Host Backup Jobs with Veeam and Replay Manager 7.8), I naturally started off my day by testing Compellent Replay Manager 7.8. I deployed it on a 2-node DELL PowerEdge cluster with FC access to a secondary DELL Compellent running SC 6.7.30 (you need to be on 6.7).

The first thing I noticed is the new icon.

That test cluster is running Windows Server 2016 Datacenter edition and is fully patched. The functionality is much the same as it was. There is one difference: if you manually launch the backup set of a local volume for a CSV and that CSV is not owned by the node on which you launch it, the backup is blocked.

This did not use to be the case. With scheduled backup sets this is not an issue, as Replay Manager detects the owner of the CSV and uses that node.

Just remember that when running a backup manually you need to launch it from the CSV owner node in Replay Manager and all is fine.
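
A quick way to check, or change, which node owns a CSV before you kick off a manual backup is with the failover clustering cmdlets. The volume and node names below are examples:

# Show which node currently owns each Cluster Shared Volume
Get-ClusterSharedVolume | Select-Object Name, OwnerNode, State
# Or move the CSV to the node you are working from (example names)
Move-ClusterSharedVolume -Name "Cluster Disk 1" -Node "NODE01"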

Other than that, testing has been smooth, and naturally we’ll be leveraging RM 7.8 with transportable snapshots with Veeam B&R 9.5 as well.

Things to note

Replay Manager 7.8 is not backward compatible with 7.7.1 or lower so you have to have the same version on your Replay Manager management server as on the hosts you want to protect. You also have to be running SC 6.7 or higher.

Wish list

I’d love to see Replay Manager become more intelligent and handle VM mobility better. The fact that VMs are tied to the node on which the backup set was created is really not compatible with the mobility of VMs (maintenance, dynamic optimization, CSV balancing, …). A little time and effort here would go a long way.

Second, Live Volumes has gotten a lot better, but we still need to choose between Replay Manager snapshots and Live Volumes. In an ideal world that would not be the case and Replay Manager would have the ability to handle this dynamically. A big ask perhaps, but it would be swell.

I just keep giving this feedback as I’m convinced this is a great SAN for Hyper-V environments and they could beat anyone by making a few more improvements.

Dell Compellent SCOS 6.7 ODX Bug Heads Up

UPDATE 3: Bad and disappointing news. After update 2 we’ve seen DELL change the CSTA (CoPilot Services Technical Alert) on the customer website to “will be fixed in a future version”. Now, according to the latest comment on this blog post, that would be in Q1 2017. Basically this is unacceptable, and it’s a shame to see a SAN that was one of the best when it comes to Hyper-V support in Windows Server 2012 / 2012 R2 decline in this way. If 7.x is required for Windows Server 2016 support, this is pretty bad, as it means early adopters are stuck or we’ll have to find and recommend another solution. This is not a good day for Dell storage.

UPDATE 2: As you can read in the comments below people are still having issues. Do NOT just update without checking everything.

UPDATE: This issue has been resolved in Storage Center 6.7.10 and 7.X.

If you have 6.7.x below 6.7.10 it’s time to think about moving to 6.7.10!

No vendor is exempt from errors, issues, mistakes and trouble with advanced features, and unfortunately Dell Compellent has issues with Windows Server 2012 (R2) ODX in the current release of SCOS 6.7. Bar a performance issue in a 6.4 version, they had a very good track record in regards to ODX, UNMAP, … so far. But no matter how good you are, bad things can happen.

I’ve had two people who were bitten by it contact me. The issue is described below.

In SCOS 6.7 an issue has been determined when the ODX driver in Windows Server 2012 requests an Extended Copy between a source volume which is unknown to the Storage Center and a volume which is presented from the Storage Center. When this occurs the Storage Center does not respond with the correct ODX failure code. This results in the Windows Server 2012 not correctly recognizing that the source volume is unknown to the Storage Center. Without the failure code Windows will continually retry the same request which will fail. Due to the large number of failed requests, MPIO will mark the path as down. Performing ODX operations between Storage Center volumes will work and is not exposed to this issue.

You might think this is not a problem as you might only use Compellent storage, but think again. Local disks on the hosts where data is stored temporarily, and external storage you use to transport data in and out of your datacenter or to copy backups to, are all use cases we can encounter. When ODX is enabled, which it is by default on Windows Server 2012 (R2), the file system will try to use it and, when that fails, fall back to normal (non-ODX) operations. All of this is transparent to the users. But now MPIO will mark the Compellent path as down. Ouch. I will not risk that. Any IO between a non-Compellent LUN and a Compellent LUN might cause this to happen.

The only workaround for now is to disable ODX on all your hosts. To me that’s unacceptable and I will not be upgrading to 6.7 for now. We rely on ODX to gain performance benefits at both the physical and virtual layer. We even have our SMB 3 capable clients in the branch offices leverage ODX to avoid costly data copies to our clustered Transparent Failover File Servers.
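
For completeness: on a Windows Server 2012 (R2) host, disabling ODX comes down to setting the FilterSupportedFeaturesMode registry value to 1. A minimal sketch, should you decide the workaround is acceptable in your environment:

# Check whether ODX is currently enabled (0 or absent = enabled, 1 = disabled)
Get-ItemProperty "HKLM:\SYSTEM\CurrentControlSet\Control\FileSystem" -Name FilterSupportedFeaturesMode -ErrorAction SilentlyContinue
# Disable ODX as a workaround - and remember to revert this once a fixed SCOS release is in place
Set-ItemProperty "HKLM:\SYSTEM\CurrentControlSet\Control\FileSystem" -Name FilterSupportedFeaturesMode -Value 1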

When a version arrives that fixes the issue, I’ll be testing even more elaborately than before. We’ve come to pay attention to performance issues and data corruption with many vendors, models and releases, but this MPIO issue is a new one for me.