Replay Manager 7.8 and cluster OS rolling upgrade Tips

Compellent Replay Manager 7.8 and Windows Server 2016 clusters in mixed mode or at cluster functional level 8

Consider this a quick publish with tips for when you combine Replay Manager 7.8, Compellent and Windows Server 2016. Many of you will be doing a cluster operating system rolling upgrade of your Windows Server 2012 R2 clusters to Windows Server 2016. Even if you have done your homework and made sure your hardware is supported, you can still run into a surprise. As long as you’re in mixed mode (W2K12R2 mixed with W2K16 nodes) or have not updated the cluster functional level to 9 (Windows Server 2016), you will have a few issues.

In Replay Manager 7.8 itself you’ll notice that each node of your cluster only sees, under local volumes, the CSV LUNs it currently owns. Normally you’ll see all of the CSV LUNs of the (Hyper-V) cluster on all of the nodes of that cluster, so that’s not the expected behavior. This leads to failed restore points when you run a snapshot from a host that is not the owner of the CSV.

On top of that when you try to run a backup job it will fail. The reason given is:

The requested volumes is not supported because it is not managed by the provider, is a dynamic volume, or it has some other incompatibility with the current operation.

The fix? Just update your upgraded cluster to the Windows Server 2016 cluster functional level (level 9).

It’s as easy as that. The moment you upgrade your cluster functional level to 9 you will see all the CSVs of the cluster on every node of that cluster you connect to. At that moment the replays will also work. That’s OK: you want to move swiftly through the rolling upgrade anyway and update the functional level once you’re comfortable that all drivers and firmware are working fine. You do not want to stay at the lower cluster version too long; you want to upgrade to benefit from the new capabilities in Windows Server 2016 failover clustering. You do need to know this when you start your upgrades.
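
If you want to check this from a script before you start troubleshooting Replay Manager, here is a minimal sketch. It assumes you run it on a cluster node with the failover clustering PowerShell module available and that `(Get-Cluster).ClusterFunctionalLevel` prints a bare number (8 or 9). The function names are my own, purely for illustration.

```python
import subprocess

# Cluster functional levels as used in this post:
# 8 = Windows Server 2012 R2, 9 = Windows Server 2016.
WS2016_FUNCTIONAL_LEVEL = 9


def needs_functional_level_update(level: int) -> bool:
    """Return True while the cluster still runs below the Windows Server 2016 level."""
    return level < WS2016_FUNCTIONAL_LEVEL


def read_functional_level() -> int:
    """Ask PowerShell for the current level (assumption: run on a cluster node)."""
    result = subprocess.run(
        ["powershell", "-NoProfile", "-Command",
         "(Get-Cluster).ClusterFunctionalLevel"],
        capture_output=True, text=True, check=True,
    )
    return int(result.stdout.strip())
```

On a node you would call `needs_functional_level_update(read_functional_level())`. When it returns True and all nodes are already on Windows Server 2016, running `Update-ClusterFunctionalLevel` is the step that gets you to level 9.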

Close your backup apps, restart the Replay Manager service on the cluster nodes, refresh / reconnect in the backup apps, and voilà. You’ll see the view you are used to in Replay Manager 7.8 (green text / arrows) and the backup jobs will work, as will any other backup product using the Compellent Replay Manager 7.8 hardware VSS provider.

I hope this helps some of you out there. So yes, Replay Manager 7.8 supports Windows Server 2016 clusters with CSV LUNs, but if you upgraded your cluster via a cluster operating system rolling upgrade you need to have upgraded your cluster functional level! Until then, Replay Manager 7.8 isn’t going to work very well.

So there you go, that’s another reason to move through that process as fast and as smoothly as you can.

Still missing in action for Hyper-V with Replay Manager 7.8

I’d really like for Replay Manager to be a bit more cluster friendly. No matter what node you are connected to, it shows you all CSV LUNs in the cluster. But since Replay Manager 7.8 with Windows Server 2016, when you run a job manually you must start it while connected to the cluster node that owns the CSV, or the job will fail with “No resources found on current cluster node for backup set”.

This was not the case with Windows Server 2012 (R2) and earlier versions of Replay Manager. That did throw some benign errors in the event logs on the cluster node, but it did work. I would love for DELL EMC to make sure the Replay Manager client is smart enough to detect which node owns the CSV and start the job from that node. That would be a lot more user friendly. At the very least it should indicate which of the CSV LUNs you see are owned by the cluster node you are connected to. But right now, when you launch a backup job for a CSV that’s not owned by the node you are connected to, the job quits/fails. They could detect the node they need, launch the job on that node and show it to you. That avoids having to go find out yourself which cluster node to connect to in Replay Manager when you need to run an out-of-schedule job manually. The tech/logic is already there, as the scheduled jobs get launched on the correct node.
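
Until that happens, you can look up the owner yourself. The sketch below only does the lookup part: it assumes you exported the CSV ownership on a cluster node with something like `Get-ClusterSharedVolume | Select-Object Name, @{n='OwnerNode';e={$_.OwnerNode.Name}} | ConvertTo-Json`, and it tells you which node to connect to in Replay Manager. The function names and the sample CSV and node names are made up for illustration.

```python
import json


def csv_owners(ps_json: str) -> dict:
    """Map CSV name -> owner node name from the ConvertTo-Json output."""
    data = json.loads(ps_json)
    # ConvertTo-Json emits a single object (not a list) when there is only one CSV.
    if isinstance(data, dict):
        data = [data]
    return {item["Name"]: item["OwnerNode"] for item in data}


def node_for_manual_job(owners: dict, csv_name: str) -> str:
    """Return the node to connect to before starting a manual job for csv_name."""
    try:
        return owners[csv_name]
    except KeyError:
        raise ValueError(f"unknown CSV: {csv_name}") from None


# Hypothetical sample data; real names come from your own cluster.
sample = """[
  {"Name": "Cluster Disk 1", "OwnerNode": "NODE-A"},
  {"Name": "Cluster Disk 2", "OwnerNode": "NODE-B"}
]"""
owners = csv_owners(sample)
```

With the sample above, `node_for_manual_job(owners, "Cluster Disk 2")` points you to NODE-B, which is the node you would connect to before launching the manual job.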

It would also be great if they could finally get the logic built into Replay Manager for the Hyper-V VM backups to know on which CSV and Hyper-V node the VM lives and deal with that. Sure, it might cause more snapshots to be made, but that’s an invalid argument. When the VMs are on the same node but on different CSVs, that’s already happening. Really, one VM per job to avoid this isn’t a great answer.

Dell iDRAC 6 Remote Console Connection Failed

I recently had the honor of fixing a really annoying issue with the iDRAC on rather old DELL hardware: R710 servers that are still pulling their weight. They have been upgraded to the latest firmware, naturally, and DELL allows access to those updates to anyone without the need for a support contract (happy users/customers). You can perfectly well configure Java site exceptions and use Firefox or Chrome to connect to it (IE is a different story: you can connect but the view is messed up). Anyway, the browser isn’t the big issue. The problem was that the Dell iDRAC 6 remote console connection failed consistently at the very last moment with “Connection Failed”.

Note: are you nuts?

Yes, I like 25/50/100Gbps RDMA, S2D, All Flash etc. I do love the vanguard life on the bleeding edge, but part of that is finding solutions that fit the environment. In this case, they have multiple spare servers and extra disks on top of the ones they use in the lab or even in production. So even when a server or a component fails they can use those to fix it. They have the hands-on and savvy staff members to do that. No problem. This is not an organization driven by fear of risk and responsibility but by results and effective TCO/ROI. They know very well what they can handle and what not. On top of that, they know very well which of the IT sector’s sales and marketing promises/predictions are FUD and which are reality. This means they can make decisions optimized for their needs and deliver real results.

Leveraging old hardware does mean that sometimes you’ll run into silly but annoying issues, like older DRAC cards that struggle with modern client operating systems, browsers and recent Java versions.

Most tricks to get those to work together can be found online, but sometimes even those fail. First of all, make sure all network requirements are in order (ports, firewall etc.) and on top of that:

  • Upgrade the DRAC firmware to the latest v2.85
  • Add the DRAC IP to the Java Exception List
  • Change the Java network setting from Browser to Direct Connect
  • Hack the Java config files
  • Disable Encrypted Video on the DRAC
  • Reset the DRAC
  • On top of this you can run an older version of the browser and Java, but at a certain point this becomes a silly option. You see, at a given moment the entire stack has moved ahead, one trick like running an old version of Java won’t do it anymore, and keeping a VM around that’s at a 10 year old tech/version level is a pain.
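
To rule out the network side quickly, a small reachability check from the client helps. This is a minimal sketch, not a DELL tool. The port numbers are my assumption of the usual defaults (443 for the web interface, 5900 for the iDRAC 6 remote console), so verify them against the network settings of your DRAC. The function names are made up for illustration.

```python
import socket

# Assumed defaults, not taken from this post: 443 for the web interface and
# 5900 for the iDRAC 6 remote console. Check the DRAC network settings.
ASSUMED_DRAC_PORTS = (443, 5900)


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_drac(host: str, ports=ASSUMED_DRAC_PORTS) -> dict:
    """Map each port to whether it is reachable from this client."""
    return {port: port_open(host, port) for port in ports}
```

Calling `check_drac` with the DRAC IP returns a dictionary per port. A False for the console port while the web port is True points at a firewall or port setting rather than at Java or certificates.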

The missing piece for me: generate & upload SHA256 certs

So let me share with you the extra step that got the remote console of the DELL R710 iDRAC to work with the most recent version of Java, Windows 10 and the latest and greatest Firefox browser at the time of writing.

The trick that finally did it is to generate a CSR on the DRAC while you are connected to it. You see, many people never upload their own certs and if they did, it might have been many years ago. Those old SHA1 certs are frowned upon by modern browsers and Java.

Open the CSR file, copy the content and submit it to a PKI you have or a free one online like getacert.com. Just fill out some random info in the request and you’ll get a SHA256 cert for download immediately that is “valid” for a couple of months. Enough for testing or getting out of a pickle. Your own corporate CA will do better for long term needs.
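
If you want to verify what you got back before uploading it, the following is a rough heuristic and not a real certificate parser. It assumes an RSA-signed certificate and simply searches the DER bytes for the OIDs of sha1WithRSAEncryption and sha256WithRSAEncryption. The function names are my own; only `ssl.PEM_cert_to_DER_cert` comes from the Python standard library.

```python
import ssl

# DER encodings of two signature algorithm OIDs (tag 0x06, length 9, value).
SHA1_WITH_RSA = bytes.fromhex("06092a864886f70d010105")    # 1.2.840.113549.1.1.5
SHA256_WITH_RSA = bytes.fromhex("06092a864886f70d01010b")  # 1.2.840.113549.1.1.11


def signature_family(der_cert: bytes) -> str:
    """Guess the signature hash by searching for known OIDs (heuristic, RSA only)."""
    if SHA256_WITH_RSA in der_cert:
        return "sha256"
    if SHA1_WITH_RSA in der_cert:
        return "sha1"
    return "unknown"


def signature_family_from_pem(pem_cert: str) -> str:
    """Same check for a PEM file, such as a cert downloaded from a CA."""
    return signature_family(ssl.PEM_cert_to_DER_cert(pem_cert))
```

Feed it the text of the downloaded PEM file. If it reports "sha1" you are about to upload the same kind of cert that modern browsers and Java frown upon. If it reports "unknown", the cert uses a signature algorithm this sketch doesn’t know about, so inspect it with a proper tool.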

On top of that you’ll need to reset the DRAC card and give it a few minutes.

Reconnect to the DRAC. After that we could connect to the remote console, without fail, on all R710 servers where before we kept getting the dreaded “Connection Failed” error.

That’s it! Good luck.

Testing Compellent Replay Manager 7.8

So today I found the Replay Manager 7.8 bits to download.

I was awaiting this eagerly (see Off Host Backup Jobs with Veeam and Replay Manager 7.8). So naturally, I set off my day by testing Compellent Replay Manager 7.8. I deployed it on a 2 node DELL PowerEdge cluster with FC access to a secondary DELL Compellent running SC 6.7.30 (you need to be on 6.7).

The first thing I noticed is the new icon.

That test cluster is running Windows Server 2016 Datacenter edition and is fully patched. The functionality is much the same as it was. There is one difference: if you manually launch the backup set of a local volume for a CSV and that CSV is not owned by the node on which you launch it, the backup is blocked.

This did not use to be the case. With scheduled backup sets this is not an issue, it detects the owner of the CSV and uses that.

Just remember that when running a backup manually you need to launch it from the CSV owner node in Replay Manager and all is fine.

Other than that testing has been smooth and naturally we’ll be leveraging RM 7.8 with transportable snapshots with Veeam B&R 9.5 as well.

Things to note

Replay Manager 7.8 is not backward compatible with 7.7.1 or lower so you have to have the same version on your Replay Manager management server as on the hosts you want to protect. You also have to be running SC 6.7 or higher.
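
These two rules are easy to check up front. A minimal sketch, assuming you collect the version strings yourself (the helper names are made up, nothing here queries Replay Manager or the SAN):

```python
def parse_version(text: str) -> tuple:
    """Turn '7.7.1' into (7, 7, 1); trailing zeros are dropped so 7.8 == 7.8.0."""
    parts = [int(part) for part in text.split(".")]
    while len(parts) > 1 and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


def check_compatibility(server_rm: str, host_rm: str, sc_version: str) -> list:
    """Return a list of problems; an empty list means the combination looks fine."""
    problems = []
    if parse_version(server_rm) != parse_version(host_rm):
        problems.append(
            f"Replay Manager versions differ: server {server_rm}, host {host_rm}")
    if parse_version(sc_version) < (6, 7):
        problems.append(f"SC {sc_version} is older than the required 6.7")
    return problems
```

For the setup in this test, `check_compatibility("7.8", "7.8", "6.7.30")` returns an empty list, while a 7.8 management server against a 7.7.1 host, or an SC version below 6.7, each adds a problem to the list.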

Wish list

I’d love to see Replay Manager become more intelligent and handle VM mobility better. The fact that VMs are tied to the node on which the backup set is created is really not compatible with the mobility of VMs (maintenance, dynamic optimization, CSV balancing, …). A little time and effort here would go a long way.

Second, Live Volumes has gotten a lot better, but we still need to choose between Replay Manager snapshots & Live Volumes. In an ideal world that would not be the case and Replay Manager would have the ability to handle this dynamically. A big ask perhaps, but it would be swell.

I just keep giving the feedback as I’m convinced this is a great SAN for Hyper-V environments and they could beat anyone by making a few more improvements.

DELL EMC Ready Nodes and Storage Spaces Direct

Introduction

Unless you have been living under a rock you must have heard about Storage Spaces Direct (S2D) in Windows Server 2016, which has gone RTM in Q4 2016.

There is a lot of enthusiasm for S2D and we have seen, heard about and assisted in early adopter situations. But that’s a bit of pioneering with OEM/MSFT approved components. So now bring in the DELL EMC Ready Nodes and Storage Spaces Direct.

DELL EMC Storage Spaces Direct Ready Nodes

So enter the DELL EMC Ready Nodes. These will be available this summer and should help less adventurous but interested customers get on the S2D bandwagon. They were announced at DELL EMC World 2017, and on May 30th DELL EMC published some information on the TechCenter.

It offers a fully OEM supported S2D solution to the customers that cannot or will not carry the engineering effort associated with a self-built solution.

I was sort of hoping these would leverage the PowerEdge R740xd from the start, but they seem to have opted to begin with the DELL R730xd. I’m pretty sure the R740xd will follow soon as it’s a perfect fit for the use case, having 25Gbps support. In that respect I expect a refresh of the switches offered as well, as the S4048 is a great switch but keeps us at 10Gbps. If I was calling the shots I’d have that ready and done sooner rather than later, as the 25/50/100Gbps network era is upon us. There’s a reason I’ve been asking for 25Gbps capable switches with smaller port counts for SME.

Maybe this is an indication of where they think this offering will sell best. But I’d be considering future deployments when evaluating network gear purchases, as these have a long service time. And when S2D proves itself I’m sure the size of the deployments will grow and with it the need for more bandwidth. Mind you, 10Gbps isn’t bad, even if for Hyper-V nodes I would be doing 2 * dual port Mellanox ConnectX-3 Pro cards.

Having mentioned them, I am very happy to see the Mellanox RoCE cards in there. That’s the best choice they could have made. The 1Gbps on board NICs are Intel, which matches my preference. The game is afoot!