Hyper-V Technical Preview Live Migration & Changing Static Memory Size

I have played with Hot Add & Remove Static Memory in a Hyper-V vNext Virtual Machine before and I love it. As I'm currently testing (actually sometimes abusing) the Technical Preview a bit to see what breaks, I sometimes test silly things. This is one of them.

I took a Technical Preview VM with 45GB of memory, running in a Technical Preview Hyper-V cluster, and live migrated it. I then tried to change the memory size up and down during the live migration to see what happens, or at least to confirm that nothing goes "BOINK". Well, not much happens; we get a notification that we're being silly. So no failed migrations, no crashed or messed-up VMs or, even worse, hosts.
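If you want to reproduce this little experiment yourself, a rough sketch in PowerShell using the standard Failover Clustering and Hyper-V cmdlets; the VM name "DemoVM", cluster node "Node2" and the 50GB target size are made-up values for illustration:

```powershell
# Kick off a live migration of the clustered VM to another node.
# -Wait 0 returns control immediately so we can meddle while it runs.
Move-ClusterVirtualMachineRole -Name "DemoVM" -Node "Node2" `
    -MigrationType Live -Wait 0

# While the migration is in flight, try to resize the static memory.
# On the Technical Preview this is simply refused with a notification,
# rather than breaking the migration, the VM or the hosts.
Set-VMMemory -VMName "DemoVM" -StartupBytes 50GB
```

Once the migration completes, the same `Set-VMMemory` call succeeds against the running VM.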


It's early days yet, but we're getting a head start as there is a lot to test and that will only increase. The aim is to get a good understanding of the features, the capabilities and the behavior, to make sure we can leverage our existing infrastructures and software assurance benefits as fast as possible. Rolling cluster upgrades should certainly help us do that faster, with more ease and less risk. What are your plans for vNext? Are you getting a feel for it yet, or waiting for a more recent test version?

Don’t tell me “It depends”! But it does!

Ah yes, the consultant's answer. If you've been working long enough you will have experienced that opening an answer with "It depends" gets you an angry look, eyes rolled in frustration and a short-tempered "don't give me that consultant crap" sneer. The injustice. First of all, I'm not to blame for people hiring lousy "consultants" who hide their lack of skills behind that phrase. Sure, you might have been conditioned into thinking it's code for "I have no clue but I'll bill you by the hour anyway", but that's not my problem. Secondly, I'm most probably not consulting for those people. There's no point being a consigliere if they don't want to listen, let alone if they won't spend the time needed to explain their needs so I can listen. Most probably they can't distinguish between valuable advice and expensive advice anyway. People who can't see the difference are not observing very well, as the distinction is crystal clear.

I'm always a bit disappointed when it happens. You'll have to grow up and live with the fact that a lot of things are a bit more complex. If you don't have the time to explain yourself or the intention to listen to someone you have asked to provide the best possible answer, you have two options:

1. Do whatever you want to do (you have already decided)

2. Ask me to decide for you. In the absence of a clear cut answer I'll just flip a coin for you and call it. At least that's cheap and efficient.

When I tell you "it depends" I will gather as much information from you as I can. I will also explain my answers and go into (great) detail as to the why, where, what and when of it all. This can take some time and it can and will require effort due to the complexities involved. It's not code for "I don't know" or "go away". In my experience, people who hate the "It depends" answer are the ones who don't really want it. They quit at that opening sentence. Do engage and you might get the best possible advice for your particular situation out of it. With that in hand you can investigate further and work toward the best possible solution for your needs in your environment.

Got it? Good.

Hardware maintenance, the unsung hero of IT or “what hero culture?”

How does one keep an IT infrastructure in top form? With care, knowledge, dedication and maintenance. For some this still comes as a surprise. To many, the job is done when a product or software is acquired, sold or delivered. After all, what else is there to be done?

Lots. True analysis, design and architecture require a serious effort. Despite the glossy brochures, the world isn't as perfect and shiny as it should be. Experience and knowledge go a long way in making sure you build solid solutions that can be maintained with minimal impact on the services.

Maintenance must be one of the most valuable and necessary yet least appreciated areas of IT. The things we do that management, not even IT management, knows about are numerous, let alone that they would understand the what and the why.

Take firmware upgrades for example. Switches, load balancers, servers … The right choice of solution and the right design mean you'll be able to do maintenance without downtime or service impact.

Whose manager knows that even server PSUs need firmware upgrades? Do they realize how much downtime that takes for a server with redundant power supplies?


It takes up to 20-25 minutes per node. Yes! So you see that 10Gbps live migration network has yet another benefit: it cuts down on the total time needed to complete this effort across a cluster. Combine it with Cluster Aware Updating and it's fully automated. Just make sure the people in ops know it takes this long, or they might start troubleshooting something that's perfectly normal. So you want to have clusters, and you want independent redundant switches or MLAG (VLT, vPC, …).
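To give an idea of what that automation looks like, a hedged sketch of a one-off Cluster Aware Updating run; the cluster name "Cluster01" is an assumption, and for firmware you'd typically swap the Windows Update plugin shown here for a vendor CAU plugin or the hotfix plugin:

```powershell
# CAU drains each node in turn (live migrating the VMs away over
# that 10Gbps network), applies the updates, reboots the node and
# moves on to the next one, so the workloads stay online throughout.
Invoke-CauRun -ClusterName "Cluster01" `
    -CauPluginName "Microsoft.WindowsUpdatePlugin" `
    -MaxFailedNodes 0 -MaxRetriesPerNode 3 `
    -RequireAllNodesOnline -Force
```

With `-MaxFailedNodes 0` the run stops at the first node that fails to update, which is exactly the conservative behavior you want during lengthy firmware-style maintenance.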

image

Yes, an older switch model, but it's the only stack still in use in a data center. At client sites I don't mind that much; the workload is different.

Think about your storage fabrics, load balancers, gateways … all redundant and independent, to allow for maintenance that doesn't affect services.

If you do not have a solution and practices in place that keep your business running during maintenance, people might avoid doing it. As a result you might suffer downtime that's classified as buggy software or unavoidable hardware failure. But there is another side to that coin, the good old saying "if it ain't broke, don't fix it". On top of that, even hardware maintenance requires care and needs a plan to deal with failure; it too has bugs and can go wrong.

There is a lot of noise about the "hero" culture of IT Ops and a "cowboy mentality" among system administrators. Partially this is supposed to be cultivated by the fact that they get rewarded for being a hero, or so I read. In my experience that's not really the case: you work at night or through the night and you have to show up at work anyway and explain what went wrong. No appreciation, money or anything. Basically you as an admin pay the price. There is no overtime pay, on-call remuneration or anything. Maybe it's different elsewhere, but I have not seen many "hero cultures" in real life in IT Ops (as said, we pay personally for our mistakes or misfortunes).

Realistically, I have the perception that the "cowboy culture" is a lot more rampant at the white collar manager level. You know, when they decide to buy the latest and greatest solution du jour to fix something that isn't caused by existing products or technology. When it blows up it's an operational issue. Right? Well, don't worry, the cloud will make it all go away!

Cloud. Sure, cloud is big and getting bigger. It brings many benefits but also drawbacks, especially when done wrong. There are many factors to consider and it can't be done just like that. It needs the same care and effort in analysis, design, architecture and deployment as all other infrastructure. You see, the benefit of cloud lies not just in operational ease but in the fact that, when done right, it allows for a whole different way of building and supporting services. That's where the real value is.

So yes, that's why we do architecture and why we design with a purpose. So we can schedule regular maintenance. So we can minimize or even avoid any impact. On premises we build solutions that allow this to be done during office hours and can survive even a failed firmware upgrade. In the cloud we try to protect against cloud provider failure. You might have noticed they too have issues.

Cowboys? Hero culture? Not me; site resilience engineering for the win!