Cosmetic Issue on DELL PowerEdge servers with iDRAC firmware 2.52.52.52

Just a head up to people who might notice the following on their DELL PowerEdge servers after updating to iDRAC firmware 2.52.52.52. I have seen it on DELL generation 12 and 13 servers (R720/730) myself.  I noticed 4 Mas Storage Function devices under “Other devices” after the installation. Before a reboot there are 4.

image

While not needed for the iDRAC firmware upgrade I did try a reboot. It is still there after a reboot, but we’re down to 2 now.

image

The device instant path and other properties show for both of these

USB \ VID_0624 & PID_0251 \ 20120731-1
USB \ VID_0624 & PID_0252 \ 20120731-2

and the USB and ID points to iDRAC remote virtual devices. Dell support confirmed this is a benign cosmetic bug with not performance or stability issues. It should be resolved in the next firm ware upgrade for the iDRAC.

Does the DELL VRTX Support Storage Spaces anno 2018?

Some one asked on my blog if the DELL VRTX supported Storage Spaces. It’s 2018 and when I wrote about the VRTX it was mainly as a Cluster in a Box (CiB) solution. This is based on a shared SAS raid controller. The addition of a second controller improved the redundancy (past the write-through requirement as we had in 2014) even though I would really like to see a native in bow redundant network solution here as well. Whether this is suitable for your need is something only you can determine.

Bus as far as a support for Microsoft Shared Storage Spaces or Storage Spaces goes that isn’t there and I would advise against it. A storage controller configuration (pass-through) for the DELL Technologies VRTX series that supported any form of Storage Spaces never came. While with 2 Nodes and the VRTX supporting two storage controller this would theoretically be possible. But with 3 or 4 nodes (The VRXT supports up to 4 nodes) that’s another challenge.

While I have liked the idea and suggested it even as a possible path it has never materialized. If S2D, especially in combination with ReFSv3 or beyond, becomes so immensely popular, they might consider it, but for now it’s not something I see happen and they might very choose other offerings to serve that demand anyway, one with a better design for the separate pass-through capable storage controllers.

As a cluster in a box solution the VRTX does hold merit. As said, I’d love to see a few improvements made to make it fully redundant all in box. With a ruggedized version for industrial or highly mobile environments could make an unbeatable offering.

DISCLAIMER: I don’t work for DELL, I don’t get paid by DELL, I don’t speak for DELL. This is my current independent opinion.

Replay Manager 7.8 and cluster OS rolling upgrade Tips

Compellent Replay manager 7.8  Windows Server 2016 Clusters in mixed mode or at cluster functional lever 8

Consider this a a quick publish about tips for when you combine Replay Manager 7.8, Compellent and Windows Server 2016. Many of you will be doing cluster operating system rolling upgrade of your Windows Server 2012 R2 clusters to Windows Server 2016. If you have done your homework and made sure your hardware is supported you can still run into a surprise. As long as your in mixed mode (Wi2K12R2 mixed with W2K16 nodes) or have not updated the cluster functional level to 9 (Windows Server 2016) you will have a few issues.

In Replay Manager 7.8  itself you’ll notice that the nodes of your cluster only see the CSV LUNs under local volumes that they are the owner of currently. Normally you’ll see all of the CSV LUNs of the (Hyper-V) cluster on all of the nodes of that cluster. So that’s not the expected behavior. This leads to failed  restore points when you run a snapshot from a host that is not the owner of the CSV etc.

image

On top of that when you try to run a backup job it will fail. The reason given is:

The requested volumes is not supported because it is not managed by the provider, is a dynamic volume, or it has some other incompatibility with the current operation.

The fix? Just update your upgrade cluster to cluster functional level  (level 9)

It’s as easy as that. The moment you upgrade your cluster functional level to 9 you will see all the CSV on the cluster on every node of that cluster you connect to. At that moment the replays will also work. That’s OK, you want to move swiftly trough the rolling upgrade and once you’re comfortable all drivers and firmware are working fine. You do not want to be in a the lower cluster version too long, but upgrade to benefit from the new capabilities in Windows Server 2016 Failover clustering. You do need to know this when you start your upgrades

image

Close your backups apps, restart the Replay manager service on the cluster nodes, refresh / reconnect to the backup apps, and voila. You’ll see the image you are use to in Replay Manager 7.8 (green text / arrows) and the backup jobs will work as well as any other backup product using the Compellent Replay Manager 7.8 hardware VSS provider.image

I hope this helps some of you out there. So yes Replay Manager 7.8 supports Windows Server 2016 Clusters with CSV LUNs but if you upgraded your cluster via cluster operating system rolling upgrade you need to have upgraded your cluster functional level! Until then, Replay Manager 7.8 isn’t going to work very well.

So there you go, that’s another reason to move through that process fast and smooth as you can.

Still missing in action for Hyper-V with Replay Manager 7.8

I’d really like for Replay Manager to be a bit more cluster friendly. No matter what node you are connected to they show you all CSV LUNs in the cluster. Since Replay manager 7.8 with Windows Server 2016 when you run a job manually you must start it when connected to the cluster node that owns the CSV or the job will fail with “No resources found on current cluster node for backup set”.

image

This was not the case with Windows Server 2012(R2) and earlier versions of Replay Manager. That did throw some benign errors in the event logs on the cluster node but it did work. I would love for DELLEMC to make sure the Replay Manager Client is smart enough to detect who owns the CSV and make sure it’ starts the job from that node. That would be a lot more user friendly. At the very least it should indicate which of the CSV LUNs you see are owned by the cluster node you are connected to.But when launching a backup job for a CSV that’s not owned by the node you are connect to the job quits/fails. They can detect the node they need, launch the job on that node and show it to you. That avoids having to go find out yourself what cluster node to connect to in Replay manager when you need to run a out of schedule job manually? The tech/logic is already there as the scheduled jobs get launched on the correct node.

It would also be great if they finally could get the logic built into Replay manager for the Hyper-V VM backups to know on what CSV and Hyper-V node the VM lives and deal with that. Sure it might cause more more snapshots to be made but that’s an invalid argument. When the VMs are on the same node,but different  CSV’s that’s already happening. Really on VM per job to avoid this isn’t a great answer.

Dell iDRAC 6 Remote Console Connection Failed

Dell iDRAC 6 Remote Console Connection Failed

I recently had the honor to fix a real annoying issue with the iDRAC on rather old DELL hardware, R710 servers that are stilling puling their weight. They have been upgraded to the latest firmware naturally and DELL allows access to those updates to anyone without the need for a support contract (happy users/customers).You can perfectly configure Java site exceptions and use Firefox or Chrome to connect to it (IE is different story, you can connect but the view is messed up). Anyway the browser isn’t the big issue. The  problem was that Dell iDRAC 6 remote console connection failed consistently at the very last moment with “Connection Failed”

image

image

Note: are you nuts?

Yes I like 25/50/100Gbps RDMA, S2D, All Flash etc. I do live the vanguard live on the bleeding edge, but part of that is funding solutions that fit the environment. In this case. They have multiple spare servers and extra disks on top the ones they use in the lab or even in production. So even when a server or a component fails they can use that to fix it. They have the hands on and savvy staff members to do that. No problem. This is not an organization driven by fear of risk and responsibility but by results and effective TCO/ROI. They know very well what they can handle and what not. On top of that they know very well what part of IT sectors sales and marketing promises/predictions are FUD and which are reality. This means they can make decisions based on optimizing for their needs delivering real results.

Leveraging old hardware does mean that sometimes you’ll  run into silly issues but annoying issues like older DRAC cards with modern client operating systems, browsers and recent Java versions.

Most tricks are to be found on line to get those to work together but sometimes even those fails. First of all make sure all network requirements are in order (ports, firewall etc) and on top of that:

  • Upgraded the DRAC Firmware to the latest v2.85
  • Add DRAC IP into the Java Exception List.
  • Change Java Network Setting from Browser to Direct Connect
  • Hack the Java config files
  • Disable Encrypted Video on the DRAC
  • Reset the DRAC
  • On top of this you can run and older version of the browser and Java but at a certain point this becomes a silly option. You see at a given moment the entire stack as moved ahead and one trick like running an old version of Java won’t do it anymore and keeping a VM around that’s at a 10 year old tech/version level is a pain.

The missing piece for me: generate & upload SHA256 certs

So let me share you what extra step got the remote console of the DELL R710 iDRAC to work with the most recent version of Java, Windows 10 and the latest of the greatest Firefox browser at the time of writing.

The trick that finally did it is to generate a CSR on the DRAC while you are connected to it. You see, many people never upload their own certs and if they did, it might have been many years ago. Those old SHA1 certs are frowned upon by modern browsers and Java.

image

image

Open the CSR file, copy the content and submit it to a PKI you have or a free one on line like at getacert.com. Just fill out some random info in the request and you’ll get a SHA256 cert for download immediately that “valid” a couple of months. Enough for testing or getting out of a pickle. Your own corporate CA will do better for long term needs.

image

On top of that you’ll need to reset the DRAC card and give it a few minutes.

image

Reconnect to the DRAC and after that, without failure, we could connect to the on all R710 servers where before we kept getting the dreaded “Connection Failed” error otherwise.

That’s it! Good luck.