Replay Manager 7.8 and cluster OS rolling upgrade Tips

Compellent Replay manager 7.8  Windows Server 2016 Clusters in mixed mode or at cluster functional lever 8

Consider this a a quick publish about tips for when you combine Replay Manager 7.8, Compellent and Windows Server 2016. Many of you will be doing cluster operating system rolling upgrade of your Windows Server 2012 R2 clusters to Windows Server 2016. If you have done your homework and made sure your hardware is supported you can still run into a surprise. As long as your in mixed mode (Wi2K12R2 mixed with W2K16 nodes) or have not updated the cluster functional level to 9 (Windows Server 2016) you will have a few issues.

In Replay Manager 7.8  itself you’ll notice that the nodes of your cluster only see the CSV LUNs under local volumes that they are the owner of currently. Normally you’ll see all of the CSV LUNs of the (Hyper-V) cluster on all of the nodes of that cluster. So that’s not the expected behavior. This leads to failed  restore points when you run a snapshot from a host that is not the owner of the CSV etc.

image

On top of that when you try to run a backup job it will fail. The reason given is:

The requested volumes is not supported because it is not managed by the provider, is a dynamic volume, or it has some other incompatibility with the current operation.

The fix? Just update your upgrade cluster to cluster functional level  (level 9)

It’s as easy as that. The moment you upgrade your cluster functional level to 9 you will see all the CSV on the cluster on every node of that cluster you connect to. At that moment the replays will also work. That’s OK, you want to move swiftly trough the rolling upgrade and once you’re comfortable all drivers and firmware are working fine. You do not want to be in a the lower cluster version too long, but upgrade to benefit from the new capabilities in Windows Server 2016 Failover clustering. You do need to know this when you start your upgrades

image

Close your backups apps, restart the Replay manager service on the cluster nodes, refresh / reconnect to the backup apps, and voila. You’ll see the image you are use to in Replay Manager 7.8 (green text / arrows) and the backup jobs will work as well as any other backup product using the Compellent Replay Manager 7.8 hardware VSS provider.image

I hope this helps some of you out there. So yes Replay Manager 7.8 supports Windows Server 2016 Clusters with CSV LUNs but if you upgraded your cluster via cluster operating system rolling upgrade you need to have upgraded your cluster functional level! Until then, Replay Manager 7.8 isn’t going to work very well.

So there you go, that’s another reason to move through that process fast and smooth as you can.

Still missing in action for Hyper-V with Replay Manager 7.8

I’d really like for Replay Manager to be a bit more cluster friendly. No matter what node you are connected to they show you all CSV LUNs in the cluster. Since Replay manager 7.8 with Windows Server 2016 when you run a job manually you must start it when connected to the cluster node that owns the CSV or the job will fail with “No resources found on current cluster node for backup set”.

image

This was not the case with Windows Server 2012(R2) and earlier versions of Replay Manager. That did throw some benign errors in the event logs on the cluster node but it did work. I would love for DELLEMC to make sure the Replay Manager Client is smart enough to detect who owns the CSV and make sure it’ starts the job from that node. That would be a lot more user friendly. At the very least it should indicate which of the CSV LUNs you see are owned by the cluster node you are connected to.But when launching a backup job for a CSV that’s not owned by the node you are connect to the job quits/fails. They can detect the node they need, launch the job on that node and show it to you. That avoids having to go find out yourself what cluster node to connect to in Replay manager when you need to run a out of schedule job manually? The tech/logic is already there as the scheduled jobs get launched on the correct node.

It would also be great if they finally could get the logic built into Replay manager for the Hyper-V VM backups to know on what CSV and Hyper-V node the VM lives and deal with that. Sure it might cause more more snapshots to be made but that’s an invalid argument. When the VMs are on the same node,but different  CSV’s that’s already happening. Really on VM per job to avoid this isn’t a great answer.

Off Host Backup Jobs with Veeam and Replay Manager 7.8

It’s all about application consistent hardware VSS provider snapshots

I was browsing to see if I could already download Replay Manager 7.8 for our Compellent (SC) SANs. No luck yet, but I did find the release notes. There was a real gem in there on Off Host Backup Jobs with Veeam and Replay Manager 7.8. We’ll get back to that after the big deal here.

image

So what kind of goodness is in there? Well obviously there is the way too long overdue support for Windows Server 2016, including the Hyper-V role and its features. That is great news. We now have application consistent hardware VSS provider snapshots.I do not know what took them so long but they need to get with the program here. I have given this a s feedback before and again at DELL EMC World 2017 The Compellent still is one of the best “traditional” centralized storage SAN solutions out there hat punches fare above its weight. On top of that, having looked at Unity form DELL EMC, I can tell you that in my humble opinion the Compellent has no competition from it.

Off Host Backup Jobs Veeam Replay Manager 7.8

Equally interesting to me, as someone who leverages Compellent and Veeam Baclup & Replication with Off Host Proxies (I wrote FREE WHITE PAPER: Configuring a VEEAM Off Host Backup Proxy Server for backing up a Windows Server 2012 R2 Hyper-V cluster with a DELL Compellent SAN (Fiber Channel))  is the following. Under fixed Issues is found:

RMS-24 Off-host backup jobs might fail during the volume discover scan when using Veeam backup software.

I have Off host proxies with transportable snapshots working pretty smooth but it has the occasional hiccup. Maybe some of those will disappear with Replay Manager 7.8. I’m looking forward to putting that to the test and roll forward with Windows Server 2016 for those nodes where we need and want to leverage the Compellent Hardware VSS provider. When I do I’ll let you know the results.

DELL Compellent Storage Center 7.1 Certified for Windows Server 2016

When it comes to selecting storage, especially when it comes to a “traditional” SAN, you all know that price performance wise I’ve been using the DELL Compellent series with great success for many years now. It’s a very capable solution that also has some other benefits when it comes to Windows Server and Hyper-V. It has one of the better hardware VSS providers, way better than average support for ODX  and UNMAP etc. but it’s also very good at delivering fast support for new versions of Windows Server. This has allowed us to move from Windows Server 2008 R2 to 2012 and from there to Windows Server 2012 R2 very fast.

In that regards I’m very happy to see that Storage Center 7.1 is already in the catalog as certified for Windows Server 2016.

image

Customers that have up to date hardware and want to move fast to benefit from and leverage the new and improved capabilities in Windows Server 2016 Hyper-V, Clustering, Networking, …are ready to do so. Nice Smile.

Dell Compellent SCOS 6.7 ODX Bug Heads Up

UPDATE 3: Bad and disappointing news. After update 2 we’ve seen DELL change the CSTA (CoPilot Services Technical Alert)  on the customer website to “’will be fixed” in a future version. No according to the latest comment on this blog post that would be In Q1 2017. Basically this is unacceptable and it’s a shame to see a SAN that was one of the best when in comes to Hyper-V Support in Windows Server 2012 / 2012 R2 decline in this way. If  7.x is required for Windows Server 2016 Support this is pretty bad as it means early adopters are stuck or we’ll have to find an recommend another solution. This is not a good day for Dell storage.

UPDATE 2: As you can read in the comments below people are still having issues. Do NOT just update without checking everything.

UPDATE: This issue has been resolved in Storage Center 6.7.10 and 7.Ximage

If you have 6.7.x below 6.7.10 it’s time to think about moving to 6.7.10!

No vendor is exempt form errors, issues, mistakes and trouble with advances features and unfortunately Dell Compellent has issues with Windows Server 2012 (R2) ODX in the current release of SCOS 6.7. Bar a performance issue in a 6.4 version they had very good track record in regards to ODX, UNMAP, … so far. But no matter how good your are, bad things can happen.

DellCompellentModern

I’ve had to people who were bitten by it contact me. The issue is described below.

In SCOS 6.7 an issue has been determined when the ODX driver in Windows Server 2012 requests an Extended Copy between a source volume which is unknown to the Storage Center and a volume which is presented from the Storage Center. When this occurs the Storage Center does not respond with the correct ODX failure code. This results in the Windows Server 2012 not correctly recognizing that the source volume is unknown to the Storage Center. Without the failure code Windows will continually retry the same request which will fail. Due to the large number of failed requests, MPIO will mark the path as down. Performing ODX operations between Storage Center volumes will work and is not exposed to this issue.

You might think that this is not a problem as you might only use Compellent storage but think again. Local disks on the hosts where data is stored temporarily and external storage you use to transport data in and out of your datacenter, or copy backups to are all use cases we can encounter.  When ODX is enabled, it is by default on Windows 2012(R2), the file system will try to use it and when that fails use normal (non ODX) operations. All of this is transparent to the users. Now MPIO will mark the Compellent path as down. Ouch. I will not risk that. Any IO between an non Compellent LUN and a Compellent LUN might cause this to happen.

The only workaround for now is to disable ODX on all your hosts. To me that’s unacceptable and I will not be upgrading to 6.7 for now. We rely on ODX to gain performance benefits at both the physical and virtual layer. We even have our SMB 3 capable clients in the branch offices leverage ODX to avoid costly data copies to our clustered Transparent Failover File Servers.

When a version arrives that fix the issue I’Il testing even more elaborate than before. We’ve come to pay attention to performance issues or data corruption with many vendors, models and releases but this MPIO issue is a new one for me.