DELL EMC World 2017 Concludes

Today DELL EMC World 2017 ends with a dinner with DELL EMC management and engineers to discus our impressions on the information we took away from DELL EMC World 2017. I would like to thank the ever hard working Sarah Vela for making this possible. It’s much appreciated.

image

Professionally I’m blessed with multiple opportunities to attend conferences and summits. That’s where I get to talk to the skilled and passionate people who work on the technologies we work with intensively. This is very much a two way street where we learn from each other. And on many conferences I might also be a speaker or participate in advisory boards to provide feedback. Some of those latter discussions are under NDA. This is normal and I have NDA’s with other companies as well. That’s the legal side of the trust we place in each other in order to discuss evolving and future technologies.

I attend multiple events from different players. Some of these disagree with me and that is fine. We learn from being challenged. It helps us define more clearly what we design and build as well as why and how. More and more solutions become a more diverse, multi pronged combination of components with their specific capabilities at our disposal. These change fast and so do our solutions. An element not to be ignored in designing those solutions. That’s one take away from DELL EMC world that seems to have hit home. The other is that some companies are in a rather dire IT condition due to years of stand still.

I’m happy to see that today and tomorrow DELL EMC has the technologies needed for us to deliver modern IT solutions. The way in which we choose to do so is our choice and DELL EMC states it is committed to supporting that. As a testimonial to that we got to see the the DELL EMC Storage Spaces Direct Ready nodes based on the soon to be available generation 14 PowerEdge servers.

R740-400x239

That is how we worked for many years with DELL and we have been assured we can continue to work with DELL EMC. That what Michael Dell committed to and I have seen them deliver on that promise for many years. For me that’s enough to be confident in that until proven different. Even if that message was sometimes brought in a way that made me think Las Vegas had gotten the better of some conference managers. But let’s not get the form in the way of the content.

On a final note, Dell EMC is not anti public cloud or pro on-premises. That’s how it should be and that how we deliver IT. We use the tools at our disposal to build the best possible solutions we can. What we use depends on the needs and changes as technology evolves. That’s OK. Saying you need hardware doesn’t make you a cloud hater or vice versa. The world is not that simple.

Hardware maintenance, the unsung hero of IT or “what hero culture?”

How does one keep an IT Infrastructure in top form? With care, knowledge, dedication and maintenance. For some this still comes as a surprise. To many the job is done when a product or software is acquired, sold or delivered. After all what else is there to be done?

Lots. True analysis, design and architecture requires a serious effort. Despite the glossy brochures the world isn’t a perfect and shiny as it should be. Experience and knowledge go a long way in making sure you build solid solutions that can be maintained with minimal impact on the services.

Maintenance must be one of the least appreciated areas that are valuable and necessary. The things we do that management, not even IT management, knows about are numerous. Let alone that they would understand what and why.

Take firmware upgrades for example. Switches, load balancers, servers …. The right choice of a solution and the right design means you’ll be able to do maintenance without downtime or service impact.

Who’s manager knows that even server PSUs need upgrades? Do they realize how much down time that takes for a server with redundant power supplies?

image

It takes up to 20-25 minutes per node. Yes! So you see that 10Gbps live migration network has yet another benefit, cuts down on the total time needed to complete this effort in a cluster. Combine it with Cluster Aware Updating and it’s fully automated. Just make sure people in ops know it takes this long or they might start trouble shooting something that’s normal. So you want to have clusters, you want independent redundant switches or MLAG/VLT, vPC, …

image

Yes, an older switch model, but the only stack still in use in a data center. At client sites I don’t mind that much, different workload.

Think about your storage fabrics, load balancers, gateways … all redundant & independent to allow no service affecting maintenance.

If you do not have a solution & practices in place that keep your business running during maintenance people might avoid it. As a result you might suffer down time that’s classified as buggy software or unavoidable hardware failure. But there is another side to that medal, the good old saying “if it ain’t broke, don’t fix it”. On top of that even hardware maintenance requires care and needs a plan to deal with failure; it to has bugs and can go wrong.

There is a lot of noise about the “hero” culture or IT Ops and a “cowboy mentality” with system administrators. Partially this is supposed to be cultivated by the fact they get rewarded for being a hero, or so I read. In my experience that’s not really the case, you work at night or through the night and have to show up at work anyway and explain what went wrong. No appreciation, money or anything.  Basically you as an admin pay the price. There is no over time pay, on call remuneration or anything. Maybe it’s different but I have not seen many “hero cultures” in real life in IT Ops (as said we pay personally for our mistakes or misfortunes). Realistically I have the perception that the“cowboy culture” is a lot more rampant at the white collar managers  level. You known when they decide to buy the latest and greatest solution du jour to fix something that isn’t caused by existing products or technology. When it blows up it’s an operational issue. Right? Well, don’t worry, the cloud will make it all go way! Cloud. Sure, cloud is big and getting bigger. It brings many benefits but also drawbacks, especially when done wrong. There are many factors to consider and it can’t be done just like that. It needs the same care and effort in analysis, design, architecture, deployment as all other infrastructure. You see it’s not just operational ease where the benefit of cloud lies but in the fact that when done right it allows for a whole different way of building & supporting services. That’s where the real value is.

So yes that’s why we do architecture and why we design with a purpose. So we can schedule regular maintenance.  So we can minimize or even avoid any impact. On premises we build solutions that allow this to be done during office hours and can survive even a failed firmware upgrade. In the cloud we try to protect against cloud provider failure. You might have notices they to have issues.

Cowboys? Hero culture? No me, site resilience engineering for the win!

DELL PowerEdge R730 Improves Boot Times

The DELL generation 13 servers are blazingly fast and capable servers. That’s has been well documented by now and more and more people are experiencing it themselves. These are my current preferred servers due to the best value in the market for hard core, no nonsense, high performance virtualization with Hyper-V.

They also have better boot/reboot speeds than the previous generations with UEFI.  We noticed this during deployment and testing. So we decided to informally check how much things have improved.

Using the DELL DRAC8 We test the speed form Windows Server restart …

image

… over the various boot phases …

image

… to the visual appearance of the logon screen

image

So now let’s quickly compare this for a DELL PowerEdge R720 and a PowerEdge R730. Bothe with the same amount of memory, cards, controllers etc. None of these servers had VMS running or another workload at the time of restart.

For the R720 this gave us:

image

and the results for a Windows initiated server restart on a DELL PowerEdge 730 with EUFI boot is:

image

This was reproducible. So we can see that we EUFI boot times have decrease with about 30%. I like that. You might think this is not important but it adds up during trouble shooting or when doing Cluster Aware Updates of a large 16+ node cluster.

Now thing are beginning to look even better as vNext of Windows has this feature call “Soft Restart” which should help us cut down on boot times even more when possible. But that’s for another blog post.