
# The myth of the nines

## The nines of reliability refer to the percentage of time a system is up and running. But the model has a flaw: it doesn't consider the timing of an outage.

When users, and especially management, discuss the details of availability and of high availability, one of the first things that comes up is "the nines." In availability circles, everyone talks about the nines of availability, as shown in the table below.

| Nines | Percentage | Downtime in a year |
|-------|------------|--------------------|
| 2     | 99%        | 3.65 days          |
| 3     | 99.9%      | 8.75 hours         |
| 4     | 99.99%     | 52 minutes         |
| 5     | 99.999%    | 5 minutes          |
| 6     | 99.9999%   | 31 seconds         |

The nines specifically refer to the percentage of time that a system is up over a given period of time, usually a year. It's a nice, easy method of summing up availability, and evaluating system and system administrator quality, within a model that even the pointiest-haired boss can understand.
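The relationship between a nines figure and annual downtime is simple arithmetic. A minimal sketch, assuming a 365-day year (leap years shift the numbers slightly):

```python
# Annual downtime implied by each "nines" level of availability.
# Assumes a 365-day year: 525,600 minutes.

MINUTES_PER_YEAR = 365 * 24 * 60

def downtime_minutes(availability_pct: float) -> float:
    """Annual downtime, in minutes, for a given availability percentage."""
    return MINUTES_PER_YEAR * (1 - availability_pct / 100)

for pct in (99.0, 99.9, 99.99, 99.999, 99.9999):
    print(f"{pct}% -> {downtime_minutes(pct):.1f} minutes/year")
```

Running this reproduces the table above: 99% works out to about 5,256 minutes (3.65 days), and each additional nine cuts the allowed downtime by a factor of ten.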

The nines model makes a fundamental and invalid assumption. It assumes that all time is worth exactly the same amount to the organization that has deployed the critical system. That's simply not true.

Consider amazon.com, for example. Would two 26-minute outages on its external Web site (giving it 99.99% availability, if those were the only outages in a year) hurt it the same amount if one occurred on a Friday night in December, while the other occurred at 3 a.m. on a Sunday in late August?

If the systems that control the cameras and graphics in the network news studio at NBC failed, would it hurt more if the failure occurred just before the live news broadcast, or just after it was over?

The nines model does not take timing into account.
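The point can be made with a weighted calculation. This sketch uses invented cost-per-minute figures purely for illustration; the actual numbers would come from your own business:

```python
# Two outages of identical length score identically under the nines model,
# but their business impact can differ enormously depending on timing.
# The cost-per-minute values below are hypothetical.

def outage_cost(minutes: float, cost_per_minute: float) -> float:
    """Business cost of an outage, given a time-dependent cost rate."""
    return minutes * cost_per_minute

peak = outage_cost(26, cost_per_minute=10_000)   # Friday night in December
quiet = outage_cost(26, cost_per_minute=200)     # 3 a.m. Sunday in late August

print(peak, quiet)  # same downtime, same nines -- very different impact
```

A raw availability percentage collapses that 50-to-1 difference in cost into a single undifferentiated number.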

The more components a system has, the more complex it is. Consider a system with ten components and an availability goal of five nines (99.999%, or five minutes of downtime a year). What that really means is that each component is allocated an average of 30 seconds a year of downtime that it can be responsible for. If any one component is responsible for more than its 30 seconds, another component must be responsible for less. And if any one component is responsible for more than five minutes of downtime, then it doesn't matter what the other components do: the goal has been missed.
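This budget arithmetic can be sketched directly. The assumption here is that component outages don't overlap, so their downtimes simply add up (overlapping outages would only make the budget tighter to reason about):

```python
# Dividing a five-nines downtime budget across ten serial components.
# Assumes outages don't overlap, so component downtimes are additive.

ANNUAL_BUDGET_SECONDS = 5 * 60   # five nines: roughly five minutes a year
COMPONENTS = 10

per_component = ANNUAL_BUDGET_SECONDS / COMPONENTS
print(per_component)             # 30.0 seconds each, on average

# One component blowing the whole budget sinks the goal regardless
# of how well the other nine behave (downtimes below are hypothetical):
downtimes = [400, 0, 0, 0, 0, 0, 0, 0, 0, 0]   # seconds
print(sum(downtimes) > ANNUAL_BUDGET_SECONDS)  # True: goal missed
```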

In the last few years, many system vendors have begun to offer contractual uptime guarantees, where, if a system's downtime exceeds a given threshold, the vendor pays the end user compensation. The problem with this model is that many causes of downtime are outside the vendor's control. Electric power is one example. If HP guarantees that your system will be up 99.999% of the time, but you suffer a two-hour power outage (your UPSs only held out for an hour), the uptime guarantee should kick in. But it's not HP's fault that you had a power outage, and in fact, these contracts specifically exclude certain types of outages. By the time all the reasonable exclusions are accounted for, such agreements have lost most of their teeth.

My advice on the nines is to measure them. Keep availability statistics. Report them if they are good, or if you are called upon to do so. Even if your management has not asked you to report availability statistics, record them on a regular basis. Then look at what causes the majority of your downtime, and fix it. If you concentrate on the largest sources of downtime, you will see a significant improvement in availability.
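That record-and-analyze loop needs nothing more than a simple outage log. A minimal sketch, with an entirely hypothetical log:

```python
# Compute measured availability from an outage log and identify the
# cause responsible for the most downtime. The log entries are invented.
from collections import defaultdict

MINUTES_PER_YEAR = 365 * 24 * 60

outages = [            # (cause, minutes of downtime)
    ("power", 120),
    ("disk failure", 45),
    ("power", 60),
    ("operator error", 15),
]

total_down = sum(minutes for _, minutes in outages)
availability = 100 * (1 - total_down / MINUTES_PER_YEAR)
print(f"availability: {availability:.3f}%")

# Aggregate downtime by cause and surface the biggest contributor,
# which is where remediation effort pays off first.
by_cause = defaultdict(int)
for cause, minutes in outages:
    by_cause[cause] += minutes
worst = max(by_cause, key=by_cause.get)
print(f"biggest contributor: {worst} ({by_cause[worst]} minutes)")
```

In this made-up log, power is responsible for 180 of the 240 minutes of downtime, so a better UPS or generator would do more for availability than any amount of tuning elsewhere.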

Rather than getting buried in the details of the numbers, concern yourself with basic improvement. Trend upwards.


Evan L. Marcus is Data Availability Maven at VERITAS Software and the author of Blueprints for High Availability. You can contact him at evan@veritas.com.

This was last published in August 2003

