The fallacy of High & Continuous Availability without a Vision – Cloud to the rescue?

A lot of people today are obsessed with uptime, high availability or even continuous availability without having a clue about what it is and why or when to use it. Sometimes rightly so, as some systems must be up and running as much as humanly possible. But often this is not necessary. Sometimes it’s even used to fix issues it can’t fix, only mitigate at best. One example is software with very bad design issues: software that parses vast amounts of data 24/7 and that loses all the work already done if a network or database connection is unavailable for a short time. It then has to start over and parse the data again for many days. The GIS/CAD world is riddled with this kind of custom-built crap software. Investing a lot in making the database or network more redundant is cost prohibitive, these components don’t fail all that often, and it doesn’t address the real issue: the bad software “design”. Another example is software that delivers services 24/7 but is designed to run interactively. This is so bad in so many ways I won’t even begin to address all the issues with automation, security, usability, stability and availability this causes. I only bring it up because sometimes people ask for an IT infrastructure fix to these problems.
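To make the parsing example concrete, here is a minimal sketch of what a software-side fix could look like: the job records its progress after every item and retries on a dropped connection, so a short outage costs seconds instead of days. Everything in it is an assumption for illustration: the function name `process_with_checkpoint`, the `handle` callback, the JSON checkpoint file and the use of `ConnectionError` as the failure signal are not taken from any real GIS/CAD product.

```python
import json
import os
import time


def process_with_checkpoint(items, handle, checkpoint_path, retries=3, delay=1.0):
    """Process items in order and remember progress on disk, so a restart
    resumes where the previous run stopped instead of starting over."""
    # Resume from the stored position if a checkpoint file exists.
    start = 0
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            start = json.load(f)["next_index"]

    for index in range(start, len(items)):
        for attempt in range(1, retries + 1):
            try:
                handle(items[index])  # hypothetical unit of work (parse + store)
                break
            except ConnectionError:
                if attempt == retries:
                    raise  # give up for now; finished work stays checkpointed
                time.sleep(delay * attempt)  # simple linear backoff

        # Record progress only after the item succeeded. Writing to a
        # temporary file and renaming keeps the checkpoint intact if the
        # process dies mid-write.
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump({"next_index": index + 1}, f)
        os.replace(tmp_path, checkpoint_path)
```

If the process dies between handling an item and writing the checkpoint, that one item is handled again on restart, so `handle` should be safe to repeat. That is a far smaller price than re-parsing days of data, and it costs nothing in extra infrastructure.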

Sometimes the services can be made highly available but it is not profitable to do so. Always make a cost versus benefit analysis before deciding to put down your money. I know that nowadays people are becoming more and more demanding as everyone seems to be online 24/7 and expects services to always run. They even do so when these services are free, like Hotmail/Gmail, Twitter, instant messaging and various social media. People are becoming more and more dependent on them, just like they are on electricity and water, and just like with those utilities they demand them at lowball prices. Yes, the same people who balk at the price of a cubic meter of drinkable water (a resource we’ll go to war over in the future, I’m afraid) will happily put down 750 € for a smartphone. Cloud will make us consume very valuable resources at low prices and we will forget what they mean to us. Pure consumption … nope, the cloud will not be green, I’m afraid. We are spoiled rotten and in the future will be even more so.

Now before we think that the always-on Valhalla will be achieved by cloud computing, I’ll make some reservations about that subject or at least temper your enthusiasm. Utilities like water and electricity are only highly available because they are very standardized and highly controlled. You get what you get and that’s it. A lot of our IT is way too specialized to reach that level of service at commodity costs. We’re only at the very beginning of that evolution in IT. So for your specialized IT needs, be realistic. Does it matter that the database is down for maintenance between 02:00 and 04:00 (rebuilding indexes)? Does it matter that the intranet server with the company mess ordering site and the holiday request form is being updated at night? That the switch is being reconfigured or gets its firmware updated at night? In a lot of cases it just doesn’t matter and causes no issues whatsoever with decent software solutions. Also think about less frequent issues like a server being down due to motherboard failure. So you are down for 24 hours? Is that bad? It depends on your needs: which service and who needs it. But face the fact that we’re not all running a nuclear power plant, a hospital, the emergency services communication network or the air traffic control system. Do you need to operate in such a critical endeavor to try and improve availability? No, if you can get high availability cheaply, why not? At that moment the cost/benefit balance tips in your favor. Just look at clustering today versus 10 years ago.
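To put numbers on the questions above, it helps to translate downtime into an availability percentage and back. The sketch below is only arithmetic; the function names are made up for illustration, a year is taken as 365 days, and the nightly maintenance window is assumed to be used every single night, which is a deliberate worst case.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def availability_percent(downtime_hours_per_year):
    """Share of the year a service is up, as a percentage."""
    return 100.0 * (1 - downtime_hours_per_year / HOURS_PER_YEAR)


def downtime_hours_per_year(availability_pct):
    """Inverse: hours of downtime per year implied by an availability figure."""
    return HOURS_PER_YEAR * (1 - availability_pct / 100.0)


# One 24-hour outage in a year (the motherboard failure case).
print(round(availability_percent(24), 2))        # 99.73

# A 02:00-04:00 window used every night (assumed worst case).
print(round(availability_percent(2 * 365), 2))   # 91.67

# What a "three nines" (99.9%) target leaves you per year, in hours.
print(round(downtime_hours_per_year(99.9), 2))   # 8.76
```

A full day of unplanned downtime still leaves you at roughly 99.7% for the year. Whether the missing fraction matters depends entirely on when it falls and who needed the service at that moment, which is the point of the paragraph above.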

Take a long hard look at a couple of considerations before deciding to invest in high availability.

  • Do you really need it, or do you have processes and software of such “questionable” quality that they fail to deliver unless the universe in which the software runs is perfect? Do you think you need it because it sounds professional and perhaps you think it will help you be more productive?
  • Do you realize most business systems do not require 24/7 uptime? A lot of their stuff can be down for days even, with only a small impact on the business. Does this happen a lot? That depends on a lot of factors, but most of the time it doesn’t, no. Can and will it happen? Oh yes. Everything breaks. Everything. Only sales people, idiots and complete raving mad lunatics think that it can’t. Don’t be offended, but apart from properly set up redundant systems failing completely, the biggest factor is human inadequacy. One big Bio Carbon Unit error and major downtime materializes.
  • If some businesses need it, they’ll have to accept it’s going to cost them a lot. They’ll spend a lot of time, money and Bio Carbon Units on it, continually. It’s a never-ending effort. Yes, high availability has become a lot more affordable, but in comparison “normal” systems have become so cheap that there is still a big cost gap! And the human skill set and effort required come at a cost. A big one.
  • Do high availability right or you’ll pay for it with more problems than you had before. Instead of improving your “not so perfect” operations you’ve just flushed your availability down the drain. Yes, you’ll be worse off than before you had high availability gear in place. Stuff breaks. Unbreakable does not exist. And broken highly available stuff is harder to troubleshoot than “ordinary” stuff.
  • Beware of people in charge who have no competence in what they are in charge of. No one likes to come across as incompetent, so they buy stuff and hire people to take care of it. A lot of the time that doesn’t work and costs a bundle. They buy into the commercials and buy equipment thinking it will deliver high availability out of the box, like the vendor said. People in charge with no context and knowledge, combined with salesmen without scruples, seldom deliver results.

Never underestimate how lucky you are if you have dedicated and skilled personnel to keep your high availability systems running. The amount of effort, time and money needed to be able to react to problems is tremendous. It’s a serious investment due to the nature of high availability and the complexity involved. It has been said before, and by many people: complexity is the enemy of availability. You should only introduce complexity when you know you can manage it and when the benefits outweigh the investment and costs it incurs. Fail to do so and you will pay dearly by actually reducing your availability.

There are times when you need realistic high availability. When you virtualized all your systems and you did that on one single point of failure, you’re not daring Murphy, you’re requesting him to come over and let the full weight of his law come down on your business. But even then, do so with reason. When a continuous availability system drains your monetary and human resources without ever living up to its promises, you’re in a very bad place. You will be a lot cheaper and better off with a failover system that gives you solid performance when needed, even when it means 30 minutes of downtime. Remember that you can’t control everything. Spending a million € on continuous availability when (external) factors out of your control bring the entire process down for one day two times a year, causing 50,000 € in damages, is silly. Accept 4 days of downtime a year and eat the 100,000 €. Perhaps a 100,000 € investment in a solution that lasts for 4 to 5 years can reduce the yearly loss to 50,000 € and is the wiser choice. As always, it depends.
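The figures in this example can be laid side by side in a few lines. This is a rough sketch under stated assumptions: the million € is treated as a one-off investment over the same 5-year horizon as the cheaper solution, yearly running and personnel costs are ignored, and the option names and the `total_cost` and `rank_options` helpers are invented for illustration.

```python
def total_cost(investment, yearly_loss, years):
    """Up-front investment plus the downtime damages still incurred each year."""
    return investment + yearly_loss * years


# (investment in €, remaining yearly loss in €), taken from the example above.
options = {
    "accept the downtime": (0, 100_000),
    "failover solution": (100_000, 50_000),
    "continuous availability": (1_000_000, 50_000),
}


def rank_options(options, years):
    """Return (name, total cost) pairs, cheapest first, for the given horizon."""
    costs = {
        name: total_cost(investment, loss, years)
        for name, (investment, loss) in options.items()
    }
    return sorted(costs.items(), key=lambda pair: pair[1])


for name, cost in rank_options(options, years=5):
    print(f"{name}: {cost:,} €")
# failover solution: 350,000 €
# accept the downtime: 500,000 €
# continuous availability: 1,250,000 €
```

Under these assumptions the modest investment wins over five years, doing nothing comes second, and the million € solution never earns itself back because it cannot remove the external causes. Change the horizon or the damages per day and the ranking can shift, which is why it always depends.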

2 thoughts on “The fallacy of High & Continuous Availability without a Vision – Cloud to the rescue?”

  1. There are a lot of good points made here. Adding unneeded complexity to an IT environment in order to prevent downtime can often lead to unintended costs and consequences.

    However, there are ways to achieve continuous availability without the complexity of a cluster or cloud.

    DISCLAIMER: I’m a hardware engineer for Stratus Technologies.

    The ftServer line of servers is fully capable of meeting continuous availability demands while running an industry standard OS. Specifically, ftServer supports Windows, Linux, and VMware running on Intel Xeon processors. To the outside world, an ftServer is a 4U rack-mounted dual socket machine. However, under the hood, it houses a fully redundant architecture that allows the system to survive virtually any single component failure, from uncorrectable memory errors and machine check errors to general component failure. This entire process happens under the OS, with zero interruption to the user’s applications.

    We actually post our uptime right on our front page. (WWW.Stratus.com) As of this post, we have 99.99987% availability across our field population.

    There is always a cost for continuous availability, and Stratus is no different, in that our server will cost more than a standard whitebox with similar specs. However, when it comes to the true cost, in complexity and personnel, an ftServer is no different to manage than a single server, yet offers advantages a cluster or cloud cannot match.

    • @ Eitan Novotny

      It depends on the situation, the application and the needs at hand. For a large database with better scale-up than scale-out possibilities it’s a very viable option. For other scenarios one might get better results with a mass of 100,000 cheap servers running distributed web apps that can survive hundreds of servers going down. The nature of the software and the application usage scenario allow for it. There is a mixed bag of possibilities and solutions that allows you to come up with the best design for a given situation and budget. The main thing is to make that analysis and to know what choice is made and why, together with an understanding of the needs and the impact it will have on operations, budget, personnel etc.
