We are in the process of building a new data center and one of the requirements for this data center is that everything is configured to be highly available. In our attempt to get requirements from our business units, the question "what does highly available mean" has surfaced several times.
While attempting to define it to these business units, I have discovered that there are several meanings for this phrase. I've tried to find an industry standard definition, but I have not been able to locate one. Could you please help me define this phrase in a way that encompasses both a hardware and a software definition?
My other concern is what level of redundancy we must configure to meet the industry standard and call ourselves a highly available data center. If you are aware of an industry standard definition, please tell me where it is documented so I can refer my business units to it.
You ask an excellent question. Unfortunately, there isn't an easy answer for it. There is no really good standard definition of high availability. Anything that's higher than you have now is high availability.
In my opinion, the phrase has been corrupted by marketers who want to be able to say that their stuff is highly available while their competitors' stuff isn't. In fact, your question supports that opinion. Your management knows they want it, but they just don't know what it is.
Be that as it may, availability is not black and white. It is an ongoing, iterative process where you work with your customers (users) to determine their availability needs and then attempt to deliver a level of service that meets those needs. Do they need 7x24 service with zero downtime ever? Or, are outages acceptable if they are less than, say, 10 minutes? Or are outages acceptable in the middle of the night or on weekends? Are some systems more critical than others?
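One way to make those conversations concrete is to turn an availability target into a yearly downtime budget. The short Python sketch below is only an illustration: the function name and the sample targets are hypothetical, and it assumes a 365-day year with no allowance for planned maintenance windows.

```python
# Minutes in a non-leap year (assumption: 365 days, 24x7 service).
MINUTES_PER_YEAR = 365 * 24 * 60


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Return the yearly downtime budget, in minutes, for an availability target."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability_percent must be between 0 and 100")
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


# Hypothetical targets, chosen only to show how quickly the budget shrinks.
for target in (99.0, 99.9, 99.99):
    minutes = allowed_downtime_minutes(target)
    print(f"{target}% available -> about {minutes:.1f} minutes of downtime per year")
```

A user who says outages of under 10 minutes are acceptable can then be asked how many such outages per year they can tolerate, which gives you a number to compare against the budget.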
You need to work these answers out with your users and then you have to perform a very tricky balancing act. If you achieve too high a level of availability, then you'll spend too much money and nobody likes that. If you achieve too low a level, then you'll have spent money but not solved downtime problems.
Achieving HA is very much a business decision. How much does your downtime cost, and how much can your business afford to spend to protect your critical systems against that downtime? Only the people within your company know the answer to those questions.
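The comparison can be written down as simple arithmetic. The sketch below is a rough illustration, not a costing method: every figure in it is made up, the function names are hypothetical, and it assumes downtime cost scales linearly with outage hours, which real outages often do not.

```python
def annual_downtime_cost(outage_hours_per_year: float, cost_per_hour: float) -> float:
    """Estimated yearly cost of downtime, assuming a flat cost per outage hour."""
    return outage_hours_per_year * cost_per_hour


def net_benefit(hours_before: float, hours_after: float,
                cost_per_hour: float, annual_protection_cost: float) -> float:
    """Yearly savings from reduced downtime minus what the protection costs."""
    savings = (annual_downtime_cost(hours_before, cost_per_hour)
               - annual_downtime_cost(hours_after, cost_per_hour))
    return savings - annual_protection_cost


# Hypothetical inputs: 20 outage hours a year today, 2 after the investment,
# 10,000 per hour of downtime, and 100,000 a year spent on protection.
result = net_benefit(hours_before=20, hours_after=2,
                     cost_per_hour=10_000, annual_protection_cost=100_000)
print(f"Net yearly benefit: {result}")
```

A positive result suggests the spending pays for itself; a negative one suggests you are buying more availability than that system's downtime justifies. The inputs have to come from the people in your company who know what an hour of downtime really costs.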
I hate to plug my book in this forum, but it really does address many of these questions in a lot more detail than I can go into here. So, I do suggest that you check out Blueprints for High Availability: Designing Resilient Distributed Systems by Evan Marcus and Hal Stern (John Wiley & Sons, 2000).
Editor's note: Do you agree with this expert's response? If you have more to share, post it in one of our discussion forums.
This was first published in December 2002