Key Takeaways From MMS2013

Introduction

I’ve parked myself at McCarran International Airport in Las Vegas, awaiting the start of my long haul home to Europe. The new terminal inspires me to share some thoughts as I reflect on the past week and on what’s happening at work.

ICT in the 21st Century

A lot is going on and moving in ICT, and even more is coming our way. In the Microsoft sphere we got the official heads-up at MMS2013 that new features would be publicly discussed at TechEd 2013 (USA or Europe), so you might want to attend one of those. I for one think that’s great. We need that information to verify we’re still on the right track and to fine-tune our course, especially in those areas where we can get quick wins with sometimes significant cost savings & benefits. I could start telling you about all the great sessions and products at MMS2013 whilst quacking like a duck “cloud, cloud, …, cloud, cloud, cloud, … cloud”. But I will not. You can watch it all here. I will reflect on the key takeaway.

Cheaper & Faster

Cheaper AND faster is the new mantra, or “fast is the new cheap”. Cheaper makes everyone happy, especially when quality remains high. Faster is sometimes a bit more of a challenge to sell. “New features, already?” you say. Yes. The nature of our economies and industry is being transformed by the cloud and commoditization. That brings a lot of benefits, especially in a high speed, low drag world.

Fast is actually getting faster. For many years now any strategy & execution plan that took more than a couple of years was doomed. You get bypassed and your big investments never live up to their potential. So, apart from the necessary larger and more long-term investments, we are evolving more and more towards a model of perpetual improvement & rapid adoption. Innovation and its subsequent commoditization are pushing this. That’s not bad. By making constant, smaller (easier to fund) investments that deliver fast results, we get a more adaptable, agile environment at lower cost. It’s not that all long-term, large-scale projects are going away, but the ratio is shifting. Smart countries already do this when building hospitals and other infrastructure that evolves fast; it’s not unique to ICT. Massive projects that take too long and consume too much funding deliver solutions that are out of date on arrival, at huge cost. Use that big-project approach only where it is really needed and forget about it for the other projects. Cloud will be an important tool in all this, not the goal.

A Word of Warning

Fast and cheap shouldn’t translate into mediocre crap at dump prices that will come back to bite us. It should also respect the ecosystem and not act like a shock & awe offensive that leaves everything in its tracks in disarray. It needs to fit into a plan with clear goals, knowing where it fits in and how it helps. It’s about balance. That’s the art: knowing what, where, when and with/for whom to do it. Not easy. Now let’s hope some of my managers read this blog; it might help them. The question begs an answer: who will lead us in this new era? Well, not one single person, far from it. It’s a team effort, and leading a team takes competence and some character.

It takes competence and personality

Competence and personality, combined with applying both (skills and drive) diligently and in a sustained fashion. That requires a lot of effort, even when no one is watching you, or perhaps better stated, especially then. Do what needs to be done, where and when needed, not because it could get you promoted or earn you more money. That’s the character part. That’s what drives us to learn by participating in our ICT communities, presenting, attending conferences and networking, but also in those hours spent reading, studying and working in the lab, alone or with a buddy. That’s what will enable us to handle the tough and bad situations we’ll encounter and overcome them. It’s your resourcefulness that will make you seek and find opportunity in adverse conditions. People like the team members amongst whom I have the distinct pleasure of working. You can’t find such synergy if it’s only about personal gain and getting ahead. Everyone involved needs a skill set that is both broad and deep, and that doesn’t come easy, nor can it be bought. It has to be acquired through work and experience. The transformation of the ICT landscape is uncharted territory for all but a few of us, so it’s going to demand a lot of effort, often outside of our comfort zone.

Sure, there are cynics who laugh at this and can’t imagine why someone would do all that without personal and immediate reward. Those are the ones we don’t need and who won’t be there at crunch time. Only after the fact do they seek the spotlight, to poach the glory if things went well or to condemn those who failed whilst trying. Well, the last so-called leader who did that doesn’t work with us anymore. Enough said.

A Reality Check On Disaster Recovery & Business Continuity

Introduction

Another blog post in “The Dilbert Life Series®” for those who are not taking everything personally. Every time business types start talking about business continuity, for some reason, call it experience or cynicism, my bullshit & assumption sensors go into high alert mode. They tend to spend a certain (sometimes considerable) amount of money on connectivity, storage and CPUs at a remote site, plus 2000 pages of documentation, and think that covers just about anything they’ll need. They’ll then ask you when the automatic or 5 minute failover to the secondary site will be up and running. That’s when the time has come to temper all those inflated expectations and close the expectation gap between business and IT as much as possible. It should never have come to that in the first place. But in this matter business people & analysts alike often read (or are fed) some marchitecture docs along with a bunch of sales brochures that make it all sound very easy and quickly accomplished. They sometimes think that the good old IT department is saying “no” again just because they are negative people who aren’t team players and lack the necessary “can do attitude” in a world where their technology castle is falling down. Well, sorry to burst the bubble, but that’s not it. The world isn’t quite that black and white. You see, the techies have to make it work and they’re the ones who have to deal with reality. Combine the above with a weak and rather incompetent IT manager bending over to the business (i.e. promising them heaven on earth) to stay in their good graces, and it becomes a certainty they’re going to get a rude awakening. Not that the realities are all that bad. Far from it, but the expectations can be so high and unrealistic that disappointment is unavoidable.

The typical flow of things

The business is under pressure from peers, top management, government & regulators to pay attention to disaster recovery. This inevitably leads to an interest in business continuity. Why? Well, we’re in a 24/7 economy and your consumer right to buy a new coffee table online at 03:00 AM on a Sunday night is worth some effort. So if we can do it for furniture, we should certainly have it for more critical services. The business will hear about possible (technology) solutions and would like to see them implemented. Why wouldn’t they? It all sounds effective and logical. So why aren’t we all running off and doing it? Is it because IT is a bunch of lazy geeks playing FPS games online rather than working for their mythically high salaries? How hard can it be? It’s all over the press that IT is a commodity, easy, fast, dynamic and consumer driven, so “we” the consumers want our business continuity now! But hey, it costs money, time and a considerable, sustained effort, and we have to deal with less than optimal legacy applications (90% of what you’re running right now).

Realities & 24/7 standby personnel

The acronyms & buzzwords the business comes up with after attending some tech briefing by Vendors Y & Z (those are a bit like infomercials, but without the limited value those might have) can be quite entertaining. You could say these vendors at least pay attention to the consumerized business types. Well, actually they don’t, but they do smell money, and lots of it. Technically they are not lying. In a perfect world things might work like that … sort of, sometimes, and maybe even when you need it. But will it really work well and reliably? Sure, that’s not the vendor’s fault. He can’t help that the cool “jump off a cliff” boots he sold you got you killed. Yes, they are designed to jump off a cliff, but anything above 1 meter without other precautions and technologies might cause bodily harm or even death. Gravity and its effects, in combination with the complexity of your business, are beyond the scope of their product solutions and are entirely your responsibility. Will you be able to cover all those aspects?

Also, don’t forget the people factor. Do you have the right people & skill sets at your disposal 24/7 for when disaster strikes? Remember, that could be on a hot summer night during a weekend, when they are enjoying a few glasses of wine at a BBQ party, and not at 10:15 AM on a Tuesday morning.

So what terminology flies around?

They hear about asynchronous or even synchronous replication of storage or applications. Sure, it can work within a data center, depending on how well it is designed and set up. It can even work between data centers, especially for applications like Exchange 2010. But let’s face it, the technical limitations and the lack of support for this in many legacy applications will hinder this considerably.

They hear of things like stretched clusters and synchronous storage replication. Sure, they’ll sell you all kinds of licensed features to make this work at the storage level, with a lot of small print, sometimes even at the cost of losing functionality that made the storage interesting in the first place. At the network level, stretching anything below layer 3 probably suffers from too much optimism. Sure, stretched subnets seem nice but … how reliable are these solutions in real life?
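To see why two-site stretched clusters are fragile, it helps to look at how a majority-vote quorum behaves when the inter-site link drops. The Python sketch below is a simplified model, not any specific cluster product’s logic: every node or witness carries one or more votes, and a partition may only keep running if it holds a strict majority. The site names and vote counts are hypothetical.

```python
def has_quorum(visible_votes: int, total_votes: int) -> bool:
    """A partition may keep running (and writing) only with a strict majority."""
    return 2 * visible_votes > total_votes


def surviving_sides(site_votes: dict[str, int], partition: list[set[str]]) -> list[set[str]]:
    """Return the sides of a network split that still hold quorum.

    site_votes maps a (hypothetical) site name to its number of votes;
    partition lists which sites can still see each other after the split.
    """
    total = sum(site_votes.values())
    return [
        side
        for side in partition
        if has_quorum(sum(site_votes[site] for site in side), total)
    ]


if __name__ == "__main__":
    # Two data centers with two cluster nodes each; the inter-site link is cut.
    two_sites = {"dc1": 2, "dc2": 2}
    print(surviving_sides(two_sites, [{"dc1"}, {"dc2"}]))  # [] -> everything stops

    # Same layout plus a one-vote witness at a third site that dc1 can still reach.
    with_witness = {"dc1": 2, "dc2": 2, "witness": 1}
    print(surviving_sides(with_witness, [{"dc1", "witness"}, {"dc2"}]))  # only dc1 + witness survives
```

With two equal sites, a link failure leaves neither side with a majority, so a correctly configured cluster stops everywhere: safe, but down. A witness vote at a third location lets one side carry on. Overriding quorum so both halves keep running is exactly how you end up in the split brain scenario discussed below.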

Consider the latency and less reliable connectivity. You can and will lose the link once in a while. With active-active or active-passive data centers that depend on each other, both become single points of failure. And then there are all the scenarios where only one part of the entire technology stack that makes everything work fails. What if the application clustering survives but not the network, the storage or the database? You’re toast anyway. Even worse, what if you get into a split brain scenario with two sides writing data? Recover from that one, my friend; there’s no merge process for that, only data recovery. What about live migration or live motion (state, storage, shared nothing) across data centers to avoid an impending disaster? That’s a pipe dream at the moment, people. How long can you afford for this to take, even if your link is 99.999% reliable? Chances are that in a crisis things need to happen fast to avoid disaster, and guess what: even in the same data center, during normal routine operations, we’re leveraging <1ms latency 10Gbps pipes for this. Are we going to get solutions that are affordable and robust? Yes, and seeing what is happening in the hypervisor space, I think those vendors will help push the entire industry forward, but we’re not in Valhalla yet.
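Some back-of-the-envelope arithmetic makes the live migration point concrete. The sketch below assumes decimal units (1 GB = 8 Gb), an assumed 80% effective link utilization, and a hypothetical 500 GB of VM state to move; swap in your own figures. It also shows that “five nines” on the link only caps the link’s own downtime at roughly five minutes a year and says nothing about how long a transfer takes, and that iterative memory pre-copy cannot finish if pages are dirtied faster than the link can ship them.

```python
"""Back-of-the-envelope numbers for moving a workload between data centers.

All inputs are illustrative assumptions, not measurements.
"""

MINUTES_PER_YEAR = 365.25 * 24 * 60


def transfer_seconds(data_gb: float, link_gbps: float, efficiency: float = 0.8) -> float:
    """Time to push data_gb gigabytes over a link_gbps link.

    efficiency is an assumed fraction of line rate actually achieved
    (protocol overhead, competing traffic, retransmits).
    """
    effective_gbps = link_gbps * efficiency
    return data_gb * 8 / effective_gbps  # 1 GB = 8 Gb in decimal units


def precopy_can_converge(dirty_gbps: float, link_gbps: float, efficiency: float = 0.8) -> bool:
    """Iterative memory pre-copy only catches up if the VM dirties memory
    more slowly than the link can ship it."""
    return dirty_gbps < link_gbps * efficiency


def downtime_minutes_per_year(availability: float) -> float:
    """Allowed downtime per year for a given availability (0.99999 = five nines)."""
    return (1 - availability) * MINUTES_PER_YEAR


if __name__ == "__main__":
    # Hypothetical 500 GB of VM state (memory plus disks, shared nothing).
    for gbps in (1, 10):
        minutes = transfer_seconds(500, gbps) / 60
        print(f"500 GB over {gbps} Gbps: ~{minutes:.0f} min")
    print(f"Five nines allows ~{downtime_minutes_per_year(0.99999):.1f} min/year of downtime")
    print("Pre-copy converges with 2 Gbps dirty rate on a 1 Gbps link:",
          precopy_can_converge(2, 1))
```

Under these assumptions the move takes well over an hour at 1 Gbps and still several minutes at 10 Gbps, which is a long time to wait for a disaster that is already on its way.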

Our client server application has high availability capabilities

There are those “robust and highly available application architectures” (ahum) that only hold true if nothing ever goes wrong or happens to the rest of the universe. “Disasters” such as the server hosting the license dongle being rebooted for patching. Or, heaven forbid, your TCP/IP connection dropping some packets due to high volume traffic. No, we can’t do QoS at the individual application level, and even if we could it wouldn’t help. If your line of business software can’t handle a WAN link without serious performance impact or errors due to a dropped packet, it was probably written and tested on <1ms latency networks against a database with only one active connection. It wasn’t designed, it was merely written. Just because software runs on an OS that can be made highly available and uses a database that can be clustered doesn’t mean that application has any high availability, let alone business continuity capabilities. Why would that application be happy switching over to another site, one that is possibly further away, running on fewer resources and quite possibly against less capable storage? For your apps to work acceptably in such scenarios, you would already have to redesign them.

You must also realize that a lot of acquired and home-written software has IP addresses in configuration files instead of DNS names. Some even has IP addresses in code. Some abuses local hosts files to deal with hard-coded DNS names … There are tons of very bad practices out there running in production. And you want business continuity for that? Not just disaster recovery, to be clear, but business continuity, preferably without dropping a beat. Done any real software and infrastructure engineering in your lifetime, have you? Keeping a business running often looks like an episode of MacGyver: lots of creativity, ingenuity, super glue, wire, duct tape and a Swiss army knife or multi-tool. This is still true today. It doesn’t sound cool to admit it, but it needs to be said.
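If you want a first inventory of how bad it is, a quick scan for hard-coded IPv4 addresses in configuration files is a cheap place to start. This is a minimal Python sketch, assuming your configuration lives in files with the extensions listed in the hypothetical CONFIG_GLOBS tuple (adjust as needed). It skips loopback addresses, will flag things like four-part version numbers as false positives, and won’t find addresses compiled into code or hosts file overrides, so treat its output as a lead, not a complete picture.

```python
import ipaddress
import re
from pathlib import Path

# Candidate dotted quad; each hit is validated with the ipaddress module.
IPV4_CANDIDATE = re.compile(r"(?<![\d.])(?:\d{1,3}\.){3}\d{1,3}(?!\.?\d)")

# Hypothetical extensions; adjust to whatever your applications actually use.
CONFIG_GLOBS = ("*.config", "*.conf", "*.ini", "*.xml", "*.json", "*.properties")


def find_ipv4_literals(text):
    """Return (line_number, address) pairs for valid, non-loopback IPv4 literals."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for match in IPV4_CANDIDATE.finditer(line):
            try:
                addr = ipaddress.IPv4Address(match.group())
            except ipaddress.AddressValueError:
                continue  # not a real address, e.g. 300.1.1.1
            if not addr.is_loopback:
                hits.append((line_no, str(addr)))
    return hits


def scan_tree(root):
    """Yield (path, line_number, address) for every matching config file under root."""
    for pattern in CONFIG_GLOBS:
        for path in Path(root).rglob(pattern):
            if not path.is_file():
                continue  # a directory can match a glob too
            text = path.read_text(encoding="utf-8", errors="replace")
            for line_no, addr in find_ipv4_literals(text):
                yield path, line_no, addr
```

Point `scan_tree` at an application share and loop over the results to get a list of files and lines to review before anyone promises a failover to a different subnet.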

We can make this work with the right methodologies and strict processes

Next time you think that, go to the top floor and jump off, adhering to the flight methodologies and strict processes that rule aerodynamics. After the loud thud of you hitting the deck, you’ll be nothing more than a pool of human waste. You cannot fly. On top of the unrealistic scenarios, things change so fast that documentation and procedures are very often out of date as soon as they are written.

Next time some “consultants” drop in selling you products & processes with fancy acronyms, proclaiming that rigorous adherence to these will save the day, consider the following. They make a bold assumption, given the fact they don’t know even 10% of the apps and processes in your company. Even bolder because they ignore the fact that what they discover in interviews often barely scratches the surface. People can only tell you what they actually know or dare tell you. On top of that, any discovery they do with tools is rather incomplete. If the job consists merely of pushing processes and methodologies around without reality checks, you could be in for a big surprise. You need the holistic approach here, otherwise it’s make believe. It’s a bit like paratrooper training for night drops over enemy strongholds, to attack those and bring ’em down, only the training is done in a heated classroom during meetings and on a computer. The troopers never put on all their gear, let alone jump out of an aircraft in the dead of night, regroup, hump all that gear to the rally points and engage the enemy in a training exercise. Well people, you’ll never be able to pull off business continuity in real life either if you don’t design and test properly and keep doing that. It’s fantasy land. Even in the best of circumstances no plan survives its first contact with the enemy, and you would basically be doing the equivalent of a trooper firing his rifle for the very first time, at night, during a real engagement. That’s assuming you didn’t break your neck during the drop or get lost, and managed to load the darn thing in the first place.

You’re a pain in the proverbial ass to work with

Am I being too negative? No, I’m being realistic. I know reality is a very unwelcome guest in fantasy land, as it tends to disturb the feel good factor. Those pesky details are not just silly technological “manual labor” issues, people. They’ll kill your shiny plans and waste tremendous amounts of money and time.

We can have mission critical applications protected and provide both disaster recovery and business continuity. For that, the entire solution stack needs to be designed for it. While possible, this makes things expensive and is often only a dream for custom written and a lot of off-the-shelf software. If you need business continuity, the applications need to be designed and written for it. If not, all the money and creativity in the world cannot guarantee you anything. In fact, at best the results are ugly and very expensive hacks around cheap, not highly available software that poses as “mission critical”.

Conclusion

Seriously people, business continuity can be a very costly and complex subject. You’ll need to think this through. When making assumptions, realize that you cannot go forward without confirming them. We operate by the mantra “assumptions are the mother of all fuckups”, which is nothing more than the age-old “Trust but verify” in action. There are many things you can do for disaster recovery and business continuity. Do them with insight, know what you are getting into, and maybe forget about doing it without one second of interruption for your entire business.

Let’s say disaster strikes and the primary data center is destroyed. If you can restart and get running again with only a limited amount of work and productivity lost, you’re doing very well. Being down for only a couple of hours, days or even a week will make you one of the top performers. Really! Try to get there first before thinking about continuous availability via disaster avoidance and automatic, autonomous failovers.

One approach to achieve this is what I call “Pandora’s Box”. If a company wants business continuity for its entire stack of operations, you’ll have to leave that box closed and replicate it entirely to another site. When you’re hit with a major, long-lasting disaster, you eat the downtime and the loss of a certain delta and fire up the entire box in another location. That way you avoid trying to micromanage its contents. You’ll fail at that anyway. For short-term disasters you have to eat the downtime. Deciding when to fail over is a hard decision. Also, don’t forget about the process in reverse order, failing back. That’s another part of the ball game.

It’s sad to see that more money is spent on consultants & advisers daydreaming than on realistic planning and mitigation. If you want to know why this is allowed to happen, there’s always my series The do’s and don’ts when engaging consultants, Part I and Part II. FYI, the last guru I saw brought into a shop was “convinced” he could open Pandora’s Box and remain in control. He has left the building by now and it wasn’t a pretty sight, but that’s another story.

Yahoo’s “Physically Together” is Management Failure

I’m awaiting boarding at SEATAC and browsing the news. I suggest you read “Physically Together”: Here’s the Internal Yahoo No-Work-From-Home Memo for Remote Workers and Maybe More and consider the quote below.

“… Speed and quality are often sacrificed when we work from home  …”

If I were working for Yahoo, I’d be jumping ship. That mentality just doesn’t compute. If anything, I have seen working conditions in offices become worse and worse over the past decade. All the new open/flex office plans, with the continuous interrupts, office chit chat & gossip, noise and countless never-ending meetings (I guess partially to escape the lousy desk experience), are probably very good for the bottom line, but all the rest of it seems to be working out a lot less well.

Granted, part of that is because of bad execution. It works if you can and will adopt that culture. But more often than not they just transplant the old ways into the new office environment with disastrous results. But the savings are there; so they don’t really mind. Just like they don’t mind outsourcing or consultants. Those don’t come into the office either but they do help reduce head count and CAPEX, whatever helps the Excel sheet look better. Speed and quality can often suffer as well in these cases but then the response is to have better governance and processes, not to drag them all into the landscape office meadow.

And as far as speed and quality go … I’ll be crystal clear: I’m not buying that for one second. If I had not been responding to alerts on weekends (we have no on-call), the company would no longer exist. It would have lost its entire infrastructure a couple of times with little or no hope of recovery. If they forced me to be at the office between 08:30 and 17:30 every day, they would not get that commitment and I would work a lot fewer hours. The same goes for my team. We expect a lot and we give a lot. Checks and balances. How you are supposed to build a top notch team on mediocre management practices is beyond me. We put in the effort because that’s what we give back to our employers in return for a lot of flexibility and freedom in how we organize ourselves and the team.

Middle managers who want hot bodies in the seats to respond to every question they have worry me a lot; those people have no sense of real priorities. Of self-importance perhaps, yes, but not of priorities. Look, organize yourself any way you need to in order to deliver whatever it is, but the above quote executed across the board is sad in its simplification and denial of realities.

But go ahead. Sacrifice the agility and flexibility to keep operations going during snow storms and flu pandemics, and go on wasting time and resources commuting during peak traffic hours. The trick to making all of this work is to make it part of the normal way of working. The mix of flex and telework might change during such times, but that’s it. Any organization that cannot see this, act on it and leverage the new possibilities technology offers us is a victim of management failure. These across-the-board decisions are a clear sign of that and make me put Yahoo on my “Unsuitable Employers” list. Their speed and quality may very well suffer from this decision.

Are you perhaps saying your employees are goofing off at home and underperforming? Well, if physical presence is the only way to make sure they are doing a good job, you’re really in trouble. You have many other, more serious problems, I think, and good luck to you if you think pulling them back into the office will fix this. That is probably the real issue: they’ve lost insight into who does what and why. End states are not defined, there is a lack of accountability, … or, put otherwise: management failure.

Or are you a serious professional who can’t stand the idea of your senior engineer sitting in his pajamas writing code or building a cluster at 10:00 or 22:00 hours? You think he needs to be in khakis and a shirt? If it’s the pajama image, you could consider hiring supermodels as engineers; the idea will become a lot more pleasant, I guarantee it. Or are you worried about the odd working hours and their impact on the well-being of your employees? Chances are they’ll work those hours anyway, or even more, when having to be in the office. They can’t get the real work done when having to sit in that suboptimal cube all day, dealing with all the senseless interrupts.

What if people don’t flee you because of this policy but just zone out? They show up for whatever mandatory time they need to. When shuffled like cattle into their cubicles and/or pastures (open landscape offices), they’ll put on their noise cancelling headsets and run off to meetings (anything to escape the chaos and interrupt hell the modern office environment has become). Their talent, engagement, motivation and zeal will go to what they love to do, and those organizations will end up as mediocre players getting the bare minimum. Well played. Look, today we’re expected to be able to work from anywhere at any time, and indeed technology has enabled this for a significant number of people. A lot of us do that, and we’re very flexible about it as we commit to our jobs and working lives in ever more flexible ways. And now, on top of that, they expect us to show up on the clock and prove attendance rather than creating a win-win situation?

On top of that, they do this at a time when managers claim that talent will flee companies that do not allow BYOD or other consumer IT. Really? But old school office organizations won’t make them flee? Flexibility works both ways. Employees can be very efficient and committed. But any manager looking to extract every last ounce of profit, or playing power games because they can’t deal with end state management, will lose more than they will ever gain. A BYOD policy alone cannot attract and retain the best of the best. Trust me, those fine employees will figure out very fast that they’d choose flex time, telecommuting, better pay and extra paid holidays over that stupid iPad or iPhone. Consumerization of ICT means they don’t need your technology and devices. They’ll buy their own and use them for their own advancement and interest, and you’ll be left holding the short end of the stick. You shouldn’t care that your employees make you money while on a cross trainer at home or even from their bathtub.

I really don’t buy the claim that all this complicates the creation of products or the delivery of services. It also doesn’t ruin any long-term supportability. People will go where they think they are best off. So what is this move? A need to reduce head count, trying to achieve it by having people call it quits voluntarily? So basically you’re unable to fix performance issues even with your feedback, planning and evaluation system? Oh boy. So what if your best quit and the worst show up at the office? Yahoo’s in a pretty bad state, it seems.

Is it a power play, limiting options for people to see how obedient they are? If all the “our employees are our biggest and most important resource” talk were true, some things would be really different. For one, your employees would tell you to stop considering and treating them as a resource to move around at will. After all, this is not a national crisis and this is not the military at war. In a real war for talent, employees would interview you to see whether you’re even worth working for. Most companies don’t like the power to shift too far towards the employees. They have seen this for short periods of time in certain professions and they still haven’t recovered from that shock to their system. They’d rather have less of it, not more. It’s all way too complicated for them to handle and manage. It also costs them more.

Are Data Tsunamis Inevitable Or Man Made Disasters?

What happens when people who have no real knowledge or context about how to handle data, infrastructure or applications insist on being in charge and need to be seen taking strong, decisive actions without ever being held responsible? It leads to really bad, often silly decisions with a bunch of unintended consequences. Storage vendors love this: more iron to sell. And yes, all this is predictable. When I’m able and allowed to poke around in storage and the data stored, I often come to the following conclusion: a bulk of the data is stored in an economically unsound fashion. Storage vendors & software vendors love this too, as there are now data life cycle management tools & appliances to be sold.

And the backlash of all this? Cost cutting, which then leads to the data that genuinely needs to be stored and protected not getting the resources it should. Why? Well, who’s going to take responsibility for pushing the delete button to remove the other data? As we get ever better technology to store, transport and protect data, we manage to do more with less money and personnel. But as is often the case, no good deed goes unpunished. Far too often these savings or efficiencies flow straight into the bottomless pit created by that age-old “horror vacui” principle in action in the world of data storage.

You get situations like this: “Can I have 60TB of storage? It’s okay, I discussed this with your colleague last year, he said you’d have 60TB available in this time frame.”

What is the use case? How do you need it? What applications or services will consume this storage? Do you really need this to be on a SAN, or can we dump this in cost-effective Windows Server Storage Spaces with ReFS? What are the economics involved around this data? Is it worth doing? What project is this assigned to? Who’s the PM? Where is the functional analysis? Will this work? Has there been a POC? Was that POC sound? Was there a pilot? What’s the RTO? The RPO? Does it need to be replicated off site? What IOPS are required? How will it be accessed? What security is needed? Any encryption required? Any laws affecting the above? All you get is a lot of blank stares and lots of “just get it done”. How can it be that with so many analysts and managers of all sorts running from meeting to meeting, all in order to get companies running like a well-oiled, slick, mean machine, we end up with this question at the desk of an operational systems administrator? Basically: what are you asking for? Why are you asking for it, and did you think this through?
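Those questions can be turned into a simple intake gate, so a request doesn’t land on an administrator’s desk until they have answers. The sketch below is hypothetical: the field names and wording are my own shorthand for a subset of the questions above, not any particular ticketing system’s schema. An explicit “no” (False) or a 0 counts as an answer; only missing or empty fields are flagged.

```python
# Hypothetical intake fields mapped to the question the requester must answer.
REQUIRED_ANSWERS = {
    "use_case": "What is the use case?",
    "consumers": "What applications or services will consume this storage?",
    "project": "What project is this assigned to?",
    "project_manager": "Who's the PM?",
    "poc_result": "Has there been a (sound) POC?",
    "rto_hours": "What's the RTO?",
    "rpo_hours": "What's the RPO?",
    "offsite_replication": "Does it need to be replicated off site?",
    "iops": "What IOPS are required?",
    "access": "How will it be accessed?",
    "security": "What security and encryption are needed?",
    "legal": "Any laws affecting the above?",
}


def unanswered(request: dict) -> list[str]:
    """Return the questions a storage request still leaves open.

    Missing keys, None and empty strings count as unanswered; an explicit
    False or 0 is a legitimate answer and is accepted.
    """
    return [
        question
        for field, question in REQUIRED_ANSWERS.items()
        if request.get(field) is None or request.get(field) == ""
    ]


if __name__ == "__main__":
    request = {"use_case": "", "capacity_tb": 60}  # the classic "just get it done"
    open_questions = unanswered(request)
    print(f"{len(open_questions)} of {len(REQUIRED_ANSWERS)} questions unanswered")
    for question in open_questions:
        print(" -", question)
```

The point is not the tooling but the default: capacity alone is never a complete request.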


Consider the following. What if you asked for 30 billion gallons of water at our desk and we said “sure” and just sent it to you? We did what you asked. Perhaps you meant bottled drinking water, but what you’ll end up with is a flood. And yes, it’s completely up to specification, limited as that is.


The last words heard while drowning will be “Who ordered this?” You can bet no one will be responsible, especially not when the bill arrives and the resulting mess needs to be cleaned up. Putting the data in the cloud will not solve this. Like the hosting business, which serves up massive amounts of idle servers, the cloud will host massive amounts of idle data. In both situations the provider earns revenue from supplying the service, not from your actual use of it or its economic value to you.