Dell iDRAC 6 Remote Console Connection Failed

I recently had the honor of fixing a really annoying issue with the iDRAC on rather old DELL hardware: R710 servers that are still pulling their weight. They have naturally been upgraded to the latest firmware, and DELL allows anyone access to those updates without the need for a support contract (happy users/customers). You can perfectly well configure Java site exceptions and use Firefox or Chrome to connect to it (IE is a different story: you can connect, but the view is messed up). Anyway, the browser isn’t the big issue. The problem was that the Dell iDRAC 6 remote console connection failed consistently at the very last moment with “Connection Failed”.

[screenshots: the iDRAC remote console “Connection Failed” error]

Note: are you nuts?

Yes, I like 25/50/100Gbps RDMA, S2D, all-flash etc. I do live in the vanguard, on the bleeding edge, but part of that is finding solutions that fit the environment. In this case, they have multiple spare servers and extra disks on top of the ones they use in the lab or even in production. So even when a server or a component fails, they can use those to fix it. They have the hands-on and savvy staff members to do that. No problem. This is not an organization driven by fear of risk and responsibility but by results and effective TCO/ROI. They know very well what they can handle and what they can’t. On top of that, they know very well which of the IT sector’s sales and marketing promises/predictions are FUD and which are reality. This means they can make decisions optimized for their needs that deliver real results.

Leveraging old hardware does mean that sometimes you’ll run into silly but annoying issues, like getting older DRAC cards to play nice with modern client operating systems, browsers and recent Java versions.

Most of the tricks to get those working together can be found online, but sometimes even those fail. First of all, make sure all network requirements are in order (ports, firewall etc.) and on top of that:

  • Upgrade the DRAC firmware to the latest v2.85.
  • Add the DRAC IP to the Java Exception Site List.
  • Change the Java network setting from “Use browser settings” to “Direct connection”.
  • Hack the Java config files (see the sketch right after this list).
  • Disable encrypted video on the DRAC.
  • Reset the DRAC.
  • On top of this you can run an older version of the browser and Java, but at a certain point that becomes a silly option. At a given moment the entire stack has moved ahead, a single trick like running an old version of Java won’t do it anymore, and keeping a VM around at a 10-year-old tech/version level is a pain.
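
The “hack the Java config files” trick usually comes down to relaxing the algorithms recent Java releases have disabled, since the iDRAC 6 only offers ciphers modern Java refuses. A minimal sketch, assuming a Java 8 JRE; the exact list shipped in java.security varies per Java release, so back the file up and remove only the entries your DRAC actually needs:

```
# <JRE home>\lib\security\java.security  (back this file up first!)
# The shipped line looks something like this (contents vary per Java release):
#   jdk.tls.disabledAlgorithms=SSLv3, RC4, DES, MD5withRSA, DH keySize < 1024, 3DES_EDE_CBC
# Relaxed so the old iDRAC 6 can still negotiate a cipher it supports:
jdk.tls.disabledAlgorithms=DES, MD5withRSA, DH keySize < 1024
```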

The missing piece for me: generate & upload SHA256 certs

So let me share the extra step that got the remote console of the DELL R710 iDRAC working with the most recent version of Java, Windows 10 and the latest and greatest Firefox browser at the time of writing.

The trick that finally did it was to generate a CSR on the DRAC while connected to it. You see, many people never upload their own certs, and if they did, it might have been many years ago. Those old SHA1 certs are frowned upon by modern browsers and Java.

[screenshots: generating a CSR in the iDRAC web interface]
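
You can generate the CSR in the iDRAC web interface as shown above, but if you have remote racadm installed the command line works too. A sketch, with <drac-ip> and the credentials as placeholders:

```
racadm -r <drac-ip> -u root -p <password> sslcsrgen -g -f idrac6.csr
```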

Open the CSR file, copy the content and submit it to a PKI you have, or a free one online such as getacert.com. Just fill out some random info in the request and you’ll immediately get a SHA256 cert for download that’s valid for a couple of months. Enough for testing or getting out of a pickle; your own corporate CA will do better for long-term needs.

[screenshot: the SHA256 certificate issued by the online CA]
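
Then upload the signed certificate back to the DRAC. Again, the web interface handles this, or remote racadm can do it; a sketch under the same placeholder assumptions (type 1 denotes the server certificate):

```
racadm -r <drac-ip> -u root -p <password> sslcertupload -t 1 -f idrac6.cer
```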

On top of that you’ll need to reset the DRAC card and give it a few minutes.

[screenshot: resetting the iDRAC]
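
If you’d rather script that reset than click through the UI, the same remote racadm assumptions apply:

```
racadm -r <drac-ip> -u root -p <password> racreset
```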

Reconnect to the DRAC and after that we could, without failure, connect to the remote console on all R710 servers where before we kept getting the dreaded “Connection Failed” error.

That’s it! Good luck.

DELL EMC World 2017 Concludes

Today DELL EMC World 2017 ends with a dinner with DELL EMC management and engineers to discuss our impressions of the information we took away from DELL EMC World 2017. I would like to thank the ever hard-working Sarah Vela for making this possible. It’s much appreciated.

Professionally I’m blessed with multiple opportunities to attend conferences and summits. That’s where I get to talk to the skilled and passionate people who work on the technologies we work with intensively. This is very much a two-way street where we learn from each other. At many conferences I might also be a speaker or participate in advisory boards to provide feedback. Some of those latter discussions are under NDA. This is normal, and I have NDAs with other companies as well. That’s the legal side of the trust we place in each other in order to discuss evolving and future technologies.

I attend multiple events from different players. Some of these disagree with me and that is fine. We learn from being challenged. It helps us define more clearly what we design and build, as well as why and how. More and more, solutions become a diverse, multi-pronged combination of components, each with specific capabilities at our disposal. These change fast and so do our solutions; that’s an element not to be ignored in designing them. That’s one takeaway from DELL EMC World that seems to have hit home. The other is that some companies are in a rather dire IT condition due to years of standstill.

I’m happy to see that today and tomorrow DELL EMC has the technologies needed for us to deliver modern IT solutions. The way in which we choose to do so is our choice, and DELL EMC states it is committed to supporting that. As a testimonial to that, we got to see the DELL EMC Storage Spaces Direct Ready Nodes based on the soon-to-be-available generation 14 PowerEdge servers.

[image: DELL EMC PowerEdge R740]

That is how we have worked with DELL for many years, and we have been assured we can continue to work that way with DELL EMC. That is what Michael Dell committed to, and I have seen them deliver on that promise for many years. For me that’s enough to be confident in it until proven different. Even if that message was sometimes brought in a way that made me think Las Vegas had gotten the better of some conference managers. But let’s not let the form get in the way of the content.

On a final note, Dell EMC is not anti public cloud or pro on-premises. That’s how it should be, and that’s how we deliver IT. We use the tools at our disposal to build the best possible solutions we can. What we use depends on the needs and changes as technology evolves. That’s OK. Saying you need hardware doesn’t make you a cloud hater, or vice versa. The world is not that simple.

Hardware maintenance, the unsung hero of IT or “what hero culture?”

How does one keep an IT infrastructure in top form? With care, knowledge, dedication and maintenance. For some this still comes as a surprise. To many, the job is done when a product or software is acquired, sold or delivered. After all, what else is there to be done?

Lots. True analysis, design and architecture require a serious effort. Despite the glossy brochures, the world isn’t as perfect and shiny as it should be. Experience and knowledge go a long way in making sure you build solid solutions that can be maintained with minimal impact on the services.

Maintenance must be one of the least appreciated yet most valuable and necessary areas of IT. The things we do that management, not even IT management, knows about are numerous. Let alone that they would understand what we do and why.

Take firmware upgrades, for example: switches, load balancers, servers …. The right choice of solution and the right design mean you’ll be able to do maintenance without downtime or service impact.

Which manager knows that even server PSUs need firmware upgrades? Do they realize how much downtime that takes for a server, even one with redundant power supplies?

[screenshot: PSU firmware update]

It takes up to 20-25 minutes per node. Yes! So you see, that 10Gbps live migration network has yet another benefit: it cuts down on the total time needed to complete this effort across a cluster. Combine it with Cluster Aware Updating and it’s fully automated. Just make sure people in ops know it takes this long, or they might start troubleshooting something that’s normal. So you want clusters, you want independent redundant switches or MLAG/VLT, vPC, …
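
As for that Cluster Aware Updating remark: kicking off such an automated, node-by-node run boils down to a single cmdlet. A minimal sketch, with “MyCluster” as a placeholder cluster name and the stock Windows Update plug-in (vendor plug-ins for firmware exist for some hardware; check what applies to your kit):

```powershell
# Drains, updates and reboots one node at a time while the cluster stays online
Invoke-CauRun -ClusterName "MyCluster" `
              -CauPluginName "Microsoft.WindowsUpdatePlugin" `
              -MaxFailedNodes 0 -RequireAllNodesOnline -Force
```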

[photo: switch stack]

Yes, an older switch model, but the only stack still in use in a data center. At client sites I don’t mind that much; different workload.

Think about your storage fabrics, load balancers, gateways … all redundant and independent, to allow for maintenance that doesn’t affect services.

If you do not have a solution and practices in place that keep your business running during maintenance, people might avoid doing it. As a result you might suffer downtime that’s classified as buggy software or unavoidable hardware failure. But there is another side to that coin, the good old saying “if it ain’t broke, don’t fix it”. On top of that, even hardware maintenance requires care and needs a plan to deal with failure; it too has bugs and can go wrong.

There is a lot of noise about the “hero” culture of IT ops and a “cowboy mentality” among system administrators. Partially this is supposed to be cultivated by the fact that they get rewarded for being a hero, or so I read. In my experience that’s not really the case: you work at night or through the night and have to show up at work anyway and explain what went wrong. No appreciation, money or anything. Basically you as an admin pay the price. There is no overtime pay, on-call remuneration or anything. Maybe it’s different elsewhere, but I have not seen many “hero cultures” in real life in IT ops; as said, we pay personally for our mistakes or misfortunes.

Realistically, I have the perception that the “cowboy culture” is a lot more rampant at the white-collar management level. You know, when they decide to buy the latest and greatest solution du jour to fix something that isn’t caused by existing products or technology. When it blows up, it’s an operational issue. Right? Well, don’t worry, the cloud will make it all go away!

Cloud. Sure, cloud is big and getting bigger. It brings many benefits but also drawbacks, especially when done wrong. There are many factors to consider and it can’t be done just like that. It needs the same care and effort in analysis, design, architecture and deployment as all other infrastructure. You see, it’s not just in operational ease that the benefit of cloud lies, but in the fact that, when done right, it allows for a whole different way of building and supporting services. That’s where the real value is.

So yes, that’s why we do architecture and why we design with a purpose: so we can schedule regular maintenance, and so we can minimize or even avoid any impact. On premises we build solutions that allow this to be done during office hours and that can survive even a failed firmware upgrade. In the cloud we try to protect against cloud provider failure. You might have noticed they too have issues.

Cowboys? Hero culture? Not me; site resilience engineering for the win!

DELL PowerEdge R730 Improves Boot Times

The DELL generation 13 servers are blazingly fast and capable. That has been well documented by now, and more and more people are experiencing it themselves. These are my current preferred servers due to the best value in the market for hard-core, no-nonsense, high-performance virtualization with Hyper-V.

They also have better boot/reboot speeds than the previous generations with UEFI. We noticed this during deployment and testing, so we decided to informally check how much things have improved.

Using the DELL DRAC8, we time the process from the Windows Server restart …

[screenshot: initiating the Windows Server restart]

… over the various boot phases …

[screenshot: the server boot phases]

… to the visual appearance of the logon screen

[screenshot: the Windows logon screen appearing]
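
If you want a rough number without watching the DRAC console, a stopwatch around a remote restart does the job. A minimal PowerShell sketch, with “node1” as a placeholder host; note it measures until PowerShell remoting answers again, not until the logon screen is visually up, so treat it as an approximation:

```powershell
# Times a full restart until WinRM/PowerShell responds again on the node
Measure-Command {
    Restart-Computer -ComputerName "node1" -Force -Wait -For PowerShell -Timeout 1800
}
```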

So now let’s quickly compare this for a DELL PowerEdge R720 and a PowerEdge R730, both with the same amount of memory, cards, controllers etc. Neither server had VMs running or any other workload at the time of restart.

For the R720 this gave us:

[screenshot: R720 restart timing]

and the results for a Windows-initiated server restart on a DELL PowerEdge R730 with UEFI boot are:

[screenshot: R730 restart timing]

This was reproducible. So we can see that UEFI boot times have decreased by about 30%. I like that. You might think this is not important, but it adds up during troubleshooting or when doing Cluster Aware Updating on a large 16+ node cluster.

Now things are beginning to look even better, as vNext of Windows Server has a feature called “Soft Restart” which should help us cut down on boot times even more when possible. But that’s for another blog post.