
Review: How much downtime is too much?

This column offers more on the components of the availability index. Evan Marcus describes the trade-off between cost of downtime and cost of uptime.

Over the last 11 columns, I've taken a look at the Availability Index, which is really just a fancy way of showing the trade-off between the cost of downtime and the cost of uptime. The more your downtime costs, the more technology you need to implement to protect your systems against that downtime, and the more that technology will cost.

Ultimately, availability is not a technology decision but a business decision. If protection methods were free, every user on every system would get 100% (or close to it) uptime through any and all calamities, and it wouldn't cost them anything for the privilege. Since uptime is far from free, the decision is much more complicated than that.

The first rule of availability is, of course, that downtime is inevitable. Hardware breaks and software fails. Always has, always will. Your availability decisions determine how much downtime each failure causes. In general, the more you spend, the less downtime you'll experience.
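To make "how much downtime" concrete, an uptime percentage can be converted into hours of downtime per year. The sketch below is only arithmetic; the function name and the 365-day year are assumptions made for illustration, not part of the Availability Index itself.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; leap years are ignored for simplicity


def annual_downtime_hours(uptime_percent: float) -> float:
    """Convert an uptime percentage into hours of downtime per year."""
    if not 0 <= uptime_percent <= 100:
        raise ValueError("uptime_percent must be between 0 and 100")
    return (100 - uptime_percent) / 100 * HOURS_PER_YEAR


if __name__ == "__main__":
    for pct in (99.0, 99.9, 99.99):
        print(f"{pct}% uptime allows about {annual_downtime_hours(pct):.2f} hours of downtime a year")
```

By this arithmetic, a system that is up 99.9% of the time is still down for roughly 8.76 hours a year.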

The type of decision changes as we move up the index, as does, of course, the cost to implement each type of protection.

Toward the bottom of the index, the decision is whether the data we have is worth protecting against disk failure or corruption. If it is, the solution is backups.

Then the decision moves to speeding up recovery from inevitable failures, via disk mirrors or clustering. As we move up through the levels, we protect against larger-scale outages by protecting larger components. We start by protecting disks via mirroring, then move up to systems via clustering, and finally to data centers via replication and disaster recovery.

Each new level of protection adds cost and complexity, which must be factored into the decision whether or not a particular protective measure makes sense.
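The cost comparison behind that decision can be sketched in a few lines of Python. Everything here is hypothetical: the `Measure` class, the dollar amounts, and the hours of downtime avoided are invented for illustration, and the sketch assumes each measure can be judged on its own, which ignores the fact that the levels of the index build on one another.

```python
from dataclasses import dataclass


@dataclass
class Measure:
    """A hypothetical protective measure; all numbers are illustrative."""
    name: str
    annual_cost: float             # cost to buy and operate, per year
    downtime_hours_avoided: float  # expected hours of downtime avoided, per year


def net_benefit(measure: Measure, downtime_cost_per_hour: float) -> float:
    """Expected annual downtime cost avoided, minus the cost of the measure."""
    return measure.downtime_hours_avoided * downtime_cost_per_hour - measure.annual_cost


def worthwhile(measures, downtime_cost_per_hour):
    """Return the names of measures that save more than they cost."""
    return [m.name for m in measures if net_benefit(m, downtime_cost_per_hour) > 0]


# Made-up figures, ordered from the bottom of the index to the top.
MEASURES = [
    Measure("backups", 5_000, 40),
    Measure("disk mirroring", 15_000, 20),
    Measure("clustering", 80_000, 10),
    Measure("replication and disaster recovery", 400_000, 8),
]

if __name__ == "__main__":
    for label, hourly_cost in (("development", 500), ("customer-facing", 100_000)):
        print(label, "->", worthwhile(MEASURES, hourly_cost))
```

With these invented numbers, a development system whose downtime costs $500 an hour justifies only backups, while a customer-facing system at $100,000 an hour justifies every level. Real decisions also have to weigh the added complexity, which a single dollar figure does not capture.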

Different systems require different levels of protection, based on their degree of business criticality. Development systems are rarely protected with the same amount of zeal or dollars as customer-facing systems.

Implementing insufficient protective measures will likely mean that your critical systems are down for longer than the business can accept.

Over the last year's worth of tips, we have concentrated on the different technologies that enable enterprises to increase their level of system availability. But, like the computer system itself, the technology is just a tool that, if implemented properly, can deliver remarkable efficiencies and savings to the enterprise that has chosen to implement it.

In future tips we'll get back to more general topics in high availability. Is there something you'd like to see discussed here? Let me know.

View Evan's earlier columns about the Availability Index.

Evan L. Marcus is the data availability maven at VERITAS Software. Ask him a question in the Ask the Experts area.
