Azure SQL Database now supports powerful geo-replication features for all service tiers

Business continuity just got a new and improved way of providing consistent access to Azure SQL databases. Azure SQL Database now supports powerful geo-replication features for all service tiers. Read that again: for ALL service tiers!

Now, anyone who’s ever worked on providing business continuity knows this is harder to do and get right than it sounds. Whether you use SQL Server databases or other mature, quality ways of persisting data, you know from experience that maintaining a consistent read/write copy of your data is a challenge.

The way to deal with that challenge is often to have one read/write master copy and one or more read-only copies. The idea is that during a disaster you’ll be able to serve the majority of your needs with a read-only copy and provide at least a basic, good enough service until the issue is fixed or a new read/write copy comes online. It all sounds great, but achieving this and maintaining access to the data during a full-blown disaster isn’t easy. You need multiple geographical locations and a way to provide access to them, preferably one that is transparent to the clients. These are cases where public cloud computing shines, and now more than ever, as Azure SQL Database supports powerful geo-replication features for all service tiers.
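To make the master/read-only pattern concrete, here is a minimal Python sketch of client-side routing, assuming you have some health signal for each copy of the data. The `Replica` and `DegradingRouter` names are hypothetical and not part of any Azure SDK, and the sketch does not reflect how Azure SQL Database itself performs geo-replication or failover. It only shows the degradation logic: writes go to the single read/write master, reads go to any healthy copy, and when the master is gone the application keeps serving reads while refusing writes.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    """A hypothetical copy of the data; `healthy` stands in for a real health probe."""
    name: str
    read_only: bool
    healthy: bool = True


class DegradingRouter:
    """Sends writes to the one read/write master and reads to any healthy copy.

    When the master is down, writes are refused but reads keep working from a
    read-only copy: the "basic, good enough service" during a disaster.
    """

    def __init__(self, master: Replica, secondaries: list[Replica]):
        self.master = master
        self.secondaries = secondaries

    def route_write(self) -> Replica:
        if not self.master.healthy:
            raise RuntimeError("read/write master unavailable: running read-only")
        return self.master

    def route_read(self) -> Replica:
        # Prefer a healthy read-only secondary to offload the master.
        for replica in self.secondaries:
            if replica.healthy:
                return replica
        # Fall back to the master if no secondary is reachable.
        if self.master.healthy:
            return self.master
        raise RuntimeError("no copy of the data is reachable")
```

Usage: build `DegradingRouter(Replica("primary", read_only=False), [Replica("secondary-region", read_only=True)])`, mark the primary unhealthy to simulate losing its region, and `route_read()` still returns the secondary while `route_write()` raises. Whether refusing writes is acceptable is a business decision, not a technical one.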


There’s no longer any need to upgrade to Premium, and this capability replaces Standard geo-replication. Business continuity is rapidly becoming a de facto part of designing and building apps as the cost & complexity that used to block it are being torn down.

A Reality Check On Disaster Recovery & Business Continuity

Introduction

Another blog post in “The Dilbert Life Series®” for those who don’t take everything personally. Every time business types start talking about business continuity, for some reason, call it experience or cynicism, my bullshit & assumption sensors go into high alert mode. They tend to spend a certain (sometimes considerable) amount of money on connectivity, storage, CPUs at a remote site and 2000 pages of documentation, and think that covers just about anything they’ll need. They’ll then ask you when the automatic or 5 minute failover to the secondary site will be up and running. That’s when the time has come to subdue all those inflated expectations and reduce the expectation gap between business and IT as much as possible. It should never have come to that in the first place. But in this matter business people & analysts alike often read (or are fed) some marchitecture docs along with a bunch of sales brochures that make it all sound very easy and quickly accomplished. They sometimes think that the good old IT department is saying “no” again just because they are negative people who aren’t team players and lack the necessary “can do attitude” in a world where their technology castle is falling down. Well, sorry to burst the bubble, but that’s not it. The world isn’t quite that black and white. You see, the techies have to make it work and they’re the ones who have to deal with reality. Combine the above with a weak and rather incompetent IT manager bending over to the business (i.e. promising them heaven on earth) to stay in their good graces, and it becomes a certainty they’re going to get a rude awakening. Not that the realities are all that bad. Far from it, but the expectations can be so high and unrealistic that disappointment is unavoidable.

The typical flow of things

The business is under pressure from peers, top management, government & regulators to pay attention to disaster recovery. This inevitably leads to an interest in business continuity. Why? Well, we’re in a 24/7 economy and your consumer right to buy a new coffee table online at 03:00 AM on a Sunday night is worth some effort. So if we can do it for furniture, we should certainly have it for more critical services. The business will hear about possible (technology) solutions and would like to see them implemented. Why wouldn’t they? It all sounds effective and logical. So why aren’t we all running off and doing it? Is it because IT is a bunch of lazy geeks playing FPS games online rather than working for their mythically high salaries? How hard can it be? It’s all over the press that IT is a commodity, easy, fast, dynamic and consumer driven, so “we” the consumers want our business continuity now! But hey, it costs money, time and a considerable, sustained effort, and we have to deal with less than optimal legacy applications (90% of what you’re running right now).

Realities & 24/7 standby personnel

The acronyms & buzzwords the business comes up with after attending some tech briefing by Vendors Y & Z (those are a bit like infomercials, but without the limited value those might have) can be quite entertaining. You could say these people at least pay attention to the consumerized business types. Well, actually they don’t, but they do smell money and lots of it. Technically they are not lying. In a perfect world things might work like that … sort of, sometimes and maybe even when you need it. But will it really work well and reliably? Sure, that’s not the vendor’s fault. He can’t help it that the cool “jump off a cliff” boots he sold you got you killed. Yes, they are designed to jump off a cliff, but anything above 1 meter without other precautions and technologies might cause bodily harm or even death. But gravity and its effects, in combination with the complexity of your business, are beyond the scope of their product solutions and are entirely your responsibility. Will you be able to cover all those aspects?

Also don’t forget the people factor. Do you have the right people & skill sets at your disposal 24/7 for that time when disaster strikes? Remember that could be on a hot summer night during a weekend when they are enjoying a few glasses of wine at a BBQ party, and not at 10:15 AM on a Tuesday morning.

So what terminology flies around?

They hear about asynchronous or even synchronous replication of storage or applications. Sure, it can work within a data center, depending on how well it is designed and set up. It can even work between data centers, especially for applications like Exchange 2010. But let’s face it, the technical limitations and the lack of support for this in many legacy applications will hinder this considerably.

They hear of things like stretched clusters and synchronous storage replication. Sure, they’ll sell you all kinds of licensed features to make this work at the storage level, with a lot of small print. Sometimes this even comes at the cost of losing functionality that makes the storage interesting in the first place. At the network level, anything below layer 3 probably suffers from too much optimism. Sure, stretched subnets seem nice but … how reliable are these solutions in real life?

Consider the latency and less reliable connectivity. You can and will lose the link once in a while. With active-active or active-passive data centers that depend on each other, both sites become single points of failure. And then there are all the scenarios where only one part of the entire technology stack that makes everything work fails. What if the application clustering survives but not the network, the storage or the database? You’re toast anyway. Even worse, what if you get into a split brain scenario and have two sides writing data? Recover from that one, my friend; there’s no merge process for that, only data recovery. What about live migration or live motion (state, storage, shared nothing) across data centers to avoid an impending disaster? That’s a pipe dream at the moment, people. How long can you afford for this to take, even if your link is 99.999% reliable? Chances are that in a crisis things need to happen fast to avoid disaster, and guess what: even in the same data center, during normal routine operations, we’re leveraging <1ms latency 10Gbps pipes for this. Are we going to get solutions that are affordable and robust? Yes, and when I see what is happening in that space I think the hypervisor vendors will help push the entire industry forward, but we’re not in Valhalla yet.
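The post doesn’t prescribe a cure for split brain, but one common mitigation in clustering is majority quorum, usually with a tie-breaking witness at a third site. The minimal Python sketch below, with hypothetical function and site names, shows the rule: a site may keep writing only if it can see a strict majority of all voters, itself included. It also shows why two sites on their own are fragile: when the link between them drops, neither side has a majority, so both must stop writing. It illustrates the rule only and is no replacement for a real cluster manager.

```python
def has_quorum(reachable_voters: int, total_voters: int) -> bool:
    """True only if this side can see a strict majority of all voters."""
    if total_voters <= 0:
        raise ValueError("total_voters must be positive")
    return reachable_voters > total_voters // 2


def may_accept_writes(site: str, reachable: set[str], voters: set[str]) -> bool:
    """A site writes only while it holds quorum; the minority side stops writing.

    `reachable` is the set of other voters this site can currently talk to.
    The site always counts its own vote if it is a voter.
    """
    seen = (reachable | {site}) & voters
    return has_quorum(len(seen), len(voters))
```

Usage: with voters `{"dc-a", "dc-b", "witness"}` and the inter-site link down, a `dc-a` that still reaches the witness keeps writing (2 of 3), while an isolated `dc-b` stops (1 of 3). With only `{"dc-a", "dc-b"}` both sides stop, which trades availability for not having to untangle two divergent copies of the data.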

Our client server application has high availability capabilities

There are those “robust and highly available application architectures” (ahem) that only hold true if nothing ever goes wrong or happens to the rest of the universe. “Disasters” such as the server hosting the license dongle being rebooted for patching. Or, heaven forbid, your TCP/IP connection dropping some packets due to high volume traffic. No, we can’t do QoS at the individual application level, and even if we could it wouldn’t help. If your line of business software can’t handle a WAN link without serious performance impact or errors due to a dropped packet, it was probably written and tested on <1ms latency networks against a database with only one active connection. It wasn’t designed, it was merely written. Just because software runs on an OS that can be made highly available and uses a database that can be clustered doesn’t mean the application has any high availability, let alone business continuity capabilities. Why would that application be happy switching over to another link? A link that is possibly further away, running on fewer resources and quite possibly against less capable storage? For your apps to work acceptably in such scenarios you would already have to redesign them.

You must also realize that a lot of acquired and home-written software has IP addresses in configuration files instead of DNS names. Some applications even have IP addresses in code. Others abuse local hosts files to deal with hard-coded DNS names … There are tons of very bad practices out there running in production. And you want business continuity for that? Not just disaster recovery, to be clear, but business continuity, preferably without missing a beat. Done any real software and infrastructure engineering in your lifetime, have you? Keeping a business running often looks like an episode of MacGyver: lots of creativity, ingenuity, super glue, wire, duct tape and a Swiss army knife or multi-tool. This is still true today. It doesn’t sound cool to admit it, but it needs to be said.
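As a small illustration of the DNS point, the minimal Python sketch below keeps a name rather than an IP address in configuration and resolves it with the standard library’s `socket.getaddrinfo` at connect time, so a failover can be handled by updating a DNS record instead of editing every config file. The `CONFIG` dictionary, its keys and the host name in it are hypothetical placeholders. Note that DNS caching (record TTLs, client-side caches) still determines how quickly clients notice the change.

```python
import socket

# Hypothetical configuration: store a DNS name, never a hard-coded IP address.
CONFIG = {"db_host": "sql.example.internal", "db_port": 1433}


def resolve_endpoint(host: str, port: int) -> list[str]:
    """Look up the current addresses behind a configured name."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the address.
    return sorted({info[4][0] for info in infos})


def connect_to_database(config: dict) -> socket.socket:
    """Open a TCP connection by name; create_connection resolves the name on each call."""
    return socket.create_connection((config["db_host"], config["db_port"]), timeout=5)
```

Usage: `resolve_endpoint(CONFIG["db_host"], CONFIG["db_port"])` returns whatever addresses the name points to right now, which is handy for logging which site a client actually reached after a failover.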

We can make this work with the right methodologies and strict processes

Next time you think that, go to the top floor and jump off, adhering to the flight methodologies and strict processes that rule aerodynamics. After the loud thud of you hitting the deck, you’ll be nothing more than a pool of human waste. You cannot fly. On top of unrealistic scenarios, things change so fast that documentation and procedures are very often out of date as soon as they are written.

Next time some “consultants” drop in selling you products & processes with fancy acronyms, proclaiming that rigorous adherence to these will save the day, consider the following. They make a bold assumption given the fact that they don’t know even 10% of the apps and processes in your company. Even bolder, because they ignore the fact that what they discover in interviews often barely scratches the surface. People can only tell you what they actually know or dare tell you. On top of that, any discovery they do with tools is rather incomplete. If the job consists of merely pushing processes and methodologies around without reality checks, you could be in for a big surprise. You need the holistic approach here, otherwise it’s make-believe. It’s a bit like paratrooper training for night drops over enemy strongholds, to attack them and bring ‘em down. Only the training is done in a heated classroom during meetings and on a computer. They never put on all their gear, let alone jump out of an aircraft in the dead of night, regroup, hump all that gear to the rally points and engage the enemy in a training exercise. Well people, you’ll never be able to pull off business continuity in real life either if you don’t design and test properly and keep doing that. It’s fantasy land. Even in the best of circumstances no plan survives its first contact with the enemy, and basically you would be doing the equivalent of a trooper firing his rifle for the very first time at night during a real engagement. That’s assuming you didn’t break your neck during the drop, didn’t get lost and managed to load the darn thing in the first place.

You’re a pain in the proverbial ass to work with

Am I being too negative? No, I’m being realistic. I know reality is a very unwelcome guest in fantasy land as it tends to disturb the feel-good factor. Those pesky details are not just silly technological “manual labor” issues, people. They’ll kill your shiny plans and waste tremendous amounts of money and time.

We can have mission critical applications protected and provide both disaster recovery and business continuity. For that, the entire solution stack needs to be designed for it. While possible, this makes things expensive and is often only a dream for custom-written and a lot of off-the-shelf software. If you need business continuity, the applications need to be designed and written for it. If not, all the money and creativity in the world cannot guarantee you anything. In fact, the results are at best ugly and very expensive hacks around cheap, not highly available software that poses as “mission critical”.

Conclusion

Seriously people, business continuity can be a very costly and complex subject. You’ll need to think this through. When making assumptions, realize that you cannot go forward without confirming them. We operate by the mantra “assumptions are the mother of all fuckups”, which is nothing more than the age-old “Trust but verify” in action. There are many things you can do for disaster recovery and business continuity. Do them with insight, know what you are getting into and maybe forget about doing it without one second of interruption for your entire business.

Let’s say disaster strikes and the primary data center is destroyed. If you can restart and get running again with only a limited amount of work and productivity lost, you’re doing very well. Being down for only a couple of hours, a few days or even a week will make you one of the top performers. Really! Try to get there first before thinking about continuous availability via disaster avoidance and automatic, autonomous failovers.

One approach to achieve this is what I call “Pandora’s Box”. If a company wants business continuity for its entire stack of operations, it will have to leave that box closed and replicate it entirely to another site. When you’re hit with a major, long-lasting disaster, you eat the downtime and the loss of a certain delta and fire up the entire box in another location. That way you avoid trying to micromanage its contents. You’ll fail at that anyway. For short-term disasters you have to eat the downtime. Deciding when to fail over is a hard decision. Also don’t forget about the process in reverse order. That’s another part of the ball game.

It’s sad to see that more money is spent on consultants & advisers daydreaming than on realistic planning and mitigation. If you want to know why this is allowed to happen, there’s always my series “The do’s and don’ts when engaging consultants” Part I and Part II. FYI, the last guru I saw brought into a shop was “convinced” he could open Pandora’s Box and remain in control. He has left the building by now and it wasn’t a pretty sight, but that’s another story.

BriForum Europe 2011 & The Experts Conference Europe 2011

Great news from the educational & conference front. First of all, I’m attending BriForum in London, United Kingdom in May (http://briforum.com/Europe/index.html). That’s good news; normally we’d have to pop over the big pond to go to that one, so this is pretty neat. And timely, due to some prospecting I’m doing for disaster recovery, business continuity and application-aware storage in a virtualized environment. It’s a good match and I hope to get into some educational discussions about the challenges we all face. Some of the storage vendors we’re interested in are there as well, so there is certainly some potential to make it a good experience.

It was also recently confirmed that The Experts Conference is coming to Europe. TEC2011 Europe will be held in Frankfurt, Germany from October 17th to October 19th 2011. This conference is high quality and created to fill the needs of the most experienced users, which is one of the reasons I would like to attend. The more you learn & grow, the more you bump into the next level of challenges, and being able to learn from high-level content and interact with experienced speakers and attendees who are dealing with the same issues can be very rewarding. Attendees of TechEd have a way to gauge the level of the sessions: they are all supposed to be Level 400. Quest is hosting this, so they certainly should be able to round up the expertise. I’m going to make it to the new “track” at this conference, “Virtualization & Cloud”. More information can be found here http://www.theexpertsconference.com/europe/2011/virtualization-cloud-training/overview/

The timing of these conferences is pretty good. As I said, we’re doing a lot of prospecting right now and hope to get a lot of information from attending these. For anyone interested in why I attend conferences and why I think they are valuable, see my blog post on this subject https://blog.workinghardinit.work/2010/06/05/why-i-find-value-in-a-conference/

Heavy Snowfall, Telecommuting & Office Situations Gone Wrong

Well, it’s winter again and we’ve had plenty of heavy snowfall disrupting traffic and the daily commute. Pictures of commuters pushing the buses they normally ride made the news headlines. No worries for me. I just work from home those days. That’s a good thing for a lot of reasons. You spend less time commuting in peak traffic, get more work done than at the office, pollute less and get used to working remotely. That means you’ll be able to do it without any issues when there is a need, like a flooded office floor, a flu pandemic or harsh weather conditions. Even better, you can do this cheaply and securely using the licenses you already have for Windows Server 2008 R2/Windows 7 with solutions like Remote Desktop Gateway and DirectAccess. Sounds pretty good, right? So why aren’t more organizations using this to the fullest?

It is amazing how few companies really use telecommuting and remote access to their benefit. The culture shift it requires is usually too big of a hurdle. Yeah right. You often do see it happen on a larger scale when the company can close an office building due to telecommuting. That’s cold hard cash savings right there. Think of it. When there are huge monetary savings involved they’ll outsource work to the other end of the world and all objections disappear or “need to be dealt with”. So why is this? Not using telecommuting to its full potential is, to me, an indication of management failure. It’s so much easier to fool yourself into thinking your employees are productive when you can physically see them and they punch in and out. Whatever happened to the results-driven organization? I know some employees need an office, not to have a place to work, but because of all the fun things you can do there: gossip, small talk, lunches, meetings, etc. Some have bad home situations and are happier at the office, or they bought an overly expensive house and the added heating cost would bankrupt them, so they prefer to come in even with a 3 hour commute. Those people tend to confirm bosses in their belief that people need an office. Crappy managers also need control and rules. It’s a good indication of how your employers see you. Do they see you as irresponsible children who need to be protected against themselves, guarded and ruled by policies? Or do they think of you as responsible adults who’ll get the job done?

Now there is also the careerist’s view that you need to go to the office. You need to be seen working. The more people you can involve, the more important your work looks and the longer it takes. Bad managers & bad companies like that for some weird reason. You see, nothing is as annoying to a boss as employees who are too good and fast at their jobs. They tend to disturb the balance or come nagging for input and direction, which leads to having to manage. OK, some managers like that as it serves their egos and confirms they are needed by their “sheeple”. Well, in all honesty, when some of your employees get the job done in 20 hours per week, that’s not their problem. It’s yours. You have way too many employees who are not up to the job at hand, or you have a lot of fakers who just fill time until they can punch out. And bad managers will never fail to “punish” the most productive employees by assigning them more work, basically training them to do less and less, because they figure out pretty quickly that productivity and speed don’t get them anything but more work for the same pay. That meritocracy that everyone seems to want, isn’t that results driven? But to recognize results you need to have a clue. That’s another problem and I won’t go into it any further here. What do you think your boss likes better: all projects completed and money made but you telecommuting 50% of the time, or a perfect office attendance record but projects over time and over budget? Right.

To add insult to injury, offices nowadays are open-plan landscapes. Fields of bleating sheep, people interrupting each other all the time for input, telephone calls, vast amounts of senseless meetings with way too many participants and way too few decisions. They want to drag you into every meeting as they think you have to execute all their ideas. No, you don’t. You already have a job, and if the only thing they can produce is work for others to do, there is something wrong. Employees should reduce the workload of the organization, not increase it. Then two, three, four colleagues come over for a chat, eating another hour of your day. At one job it got so bad that when I had to spend a day at the office, I tried to get home as soon as possible so I could start doing some work. I hated that, as bad office situations drain my energy. Some people are happy with activity and call it a day. But I’m very sorry, that doesn’t cut it. I need progress and results. A good thought-provoking talk on this can be found here http://edition.cnn.com/2010/OPINION/12/05/fried.office.work/index.html, it’s by Jason Fried of http://37signals.com/.

I’ll give you some examples of my own. I recently agreed to go into the office for one day during a holiday because of an urgent meeting. That meeting wasn’t urgent at all and everything could have waited another month, but hey, they thought it was cool and the boss was really happy. But the really silly thing is that I sat down in my office to print the agenda for the meeting and within 15 minutes three people came by claiming my urgent assistance, attendance or time. One was indeed urgent but should have gone to the service desk. The other two were prime examples of a meeting culture run amok (yes, with all those meetings people are rustling up attendees) and an egoistic failure to plan. Really people, your failure to plan does not make it my emergency. Those are typically the people who run out early and can never stay late. Or they need stuff urgently and, when they get it, go on leave for 3 to 4 weeks. Like true spoiled children they act annoyed, let down and helpless when they are told I’m not there for them. “What to do now?!” Well, the best answer is “grow up, get a grip and learn to plan ahead”. That does not mean go moan to the boss, trust me on this one. If your entire existence depends on people being on standby for every need that arises or attending every meeting you hold, you must be the commander of SAC/NORAD or something. Either that or you’re doing something very wrong.

Now, offices can be useful if and when they are functional. A lot of office environments are far from that. They are toxic time wasters and that is such a shame. Offices can be smaller, more efficient and more productive than they are now. Augment that with flexible working and telecommuting, and all the noise around commuter hell, missed deadlines, meetings, productivity and profit can go to the garbage bin where it belongs. Today, so many jobs do not need to be affected much by weather, small disasters or pandemics, but you will have to learn to work smarter and better. As a bonus you’ll have less stress, a cleaner environment and fewer traffic jams. What’s not to like? So use the winter weather and all its transport problems to rethink the way you work. You’ll be better off on the whole.