# The myth of the nines

## The nines of reliability is a theory that refers to the percentage of time a system is running. But the theory's flaw is that it doesn't consider the timing of a system outage.

When users, and especially management, discuss the details of availability and of high availability, one of the first things that comes up is "the nines." In availability circles, everyone talks about the nines of availability, as shown in the table below.

| Nines | Percentage | Downtime in a year |
|-------|------------|--------------------|
| 2 | 99% | 3.65 days |
| 3 | 99.9% | 8.75 hours |
| 4 | 99.99% | 52 minutes |
| 5 | 99.999% | 5 minutes |
| 6 | 99.9999% | 31 seconds |

The nines specifically refer to the percentage of time that a system is up over a given period of time, usually a year. It's a nice, easy method of summing up availability, and evaluating system and system administrator quality, within a model that even the pointiest-haired boss can understand.
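The downtime figures for each level of nines follow from simple arithmetic. The sketch below assumes a 365-day year, and the function names (`allowed_downtime_seconds`, `availability_percent`) are illustrative choices, not part of any standard tool. Computed values are unrounded, so they may differ slightly from rounded figures quoted elsewhere.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumption: a non-leap year


def allowed_downtime_seconds(nines: int, period_seconds: float = SECONDS_PER_YEAR) -> float:
    """Return the downtime allowed over the period for the given number of nines.

    Two nines means 99% uptime, so 1% (10**-2) of the period may be downtime.
    """
    unavailability = 10.0 ** (-nines)
    return period_seconds * unavailability


def availability_percent(downtime_seconds: float, period_seconds: float = SECONDS_PER_YEAR) -> float:
    """Return availability as a percentage, given total downtime in the period."""
    return 100.0 * (1.0 - downtime_seconds / period_seconds)


if __name__ == "__main__":
    for nines in range(2, 7):
        seconds = allowed_downtime_seconds(nines)
        print(f"{nines} nines: {seconds:,.1f} seconds "
              f"({seconds / 60:,.2f} minutes) of downtime per year")
```

Calling `availability_percent` with the total seconds of downtime you actually recorded converts a measurement back into the percentage that management usually asks for.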

The nines model makes a fundamental and invalid assumption. It assumes that all time is worth exactly the same amount to the organization that has deployed the critical system. That's simply not true.

Consider amazon.com, for example. Would two 26-minute outages on their external Web site (giving them 99.99% availability, if they were the only outages in a year) hurt them the same amount if one occurred on a Friday night in December, while the other occurred at 3am on a Sunday morning in late August?

If the systems that control the cameras and graphics in the network news studio at NBC failed, would it hurt more if the failure occurred just before the live news broadcast, or just after it was over?

The nines model does not take timing into account.
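One way to make the point concrete is to weight each outage by what a minute of downtime costs at that moment. The sketch below is only an illustration: the `Outage` class, the `weighted_cost` function, and especially the cost-per-minute figures are invented assumptions, not measurements from any real site.

```python
from dataclasses import dataclass


@dataclass
class Outage:
    label: str
    minutes: float
    cost_per_minute: float  # hypothetical business cost while the system is down


def total_minutes(outages: list[Outage]) -> float:
    """Raw downtime, which is all the nines model looks at."""
    return sum(o.minutes for o in outages)


def weighted_cost(outages: list[Outage]) -> float:
    """Downtime weighted by how much each minute was worth to the business."""
    return sum(o.minutes * o.cost_per_minute for o in outages)


# Two outages of identical length; the cost figures are made up for illustration.
peak = Outage("Friday night in December", minutes=26, cost_per_minute=100.0)
quiet = Outage("3am Sunday in late August", minutes=26, cost_per_minute=1.0)

if __name__ == "__main__":
    print("Downtime counted by the nines:", total_minutes([peak, quiet]), "minutes")
    for outage in (peak, quiet):
        print(f"{outage.label}: weighted cost {weighted_cost([outage]):,.0f}")
```

Both outages count equally toward the nines, yet under these assumed weights one is a hundred times as costly as the other.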

The more components a system has, the more complex it is. Consider a system with ten components, and an availability goal of five nines (99.999%, or five minutes downtime a year). What that really means is that each component is allocated an average of 30 seconds a year of downtime that it can be responsible for. If any one component is responsible for more than its 30 seconds, another component must be responsible for less. If any one component is responsible for more than five minutes of downtime, then it doesn't matter what the other components do, the goal has been exceeded.
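The budget arithmetic in the ten-component example can be sketched as follows. It uses the article's rounded figure of five minutes (300 seconds) a year, and the function and component names are hypothetical.

```python
def per_component_budget(total_budget_seconds: float, component_count: int) -> float:
    """Split a yearly downtime budget evenly across components."""
    return total_budget_seconds / component_count


def remaining_budget(total_budget_seconds: float, used_seconds: dict[str, float]) -> float:
    """Return what is left of the budget; a negative result means the goal is blown."""
    return total_budget_seconds - sum(used_seconds.values())


FIVE_NINES_BUDGET = 5 * 60  # seconds per year, using the rounded five-minute figure

if __name__ == "__main__":
    print("Average allowance per component:",
          per_component_budget(FIVE_NINES_BUDGET, 10), "seconds")

    # Hypothetical usage: one component overruns its 30 seconds,
    # so the others must collectively use less.
    used = {"database": 90.0, "network": 20.0, "storage": 10.0}
    print("Budget left for the other seven components:",
          remaining_budget(FIVE_NINES_BUDGET, used), "seconds")
```

A negative return value from `remaining_budget` corresponds to the case where the goal has already been exceeded, no matter how the remaining components behave.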

In the last few years, many system vendors have begun to offer contractual uptime guarantees, where if a system's downtime exceeds a given threshold, the vendor will pay money to the end user, as compensation. The problem with this model is that there are many causes of downtime that are outside the domain of the system vendors. Electric power is one example. If HP guarantees that your system will be up 99.999% of the time, but you suffer a power outage for two hours (your UPSs only held out for an hour), the uptime guarantee should kick in. But it's not HP's fault that you had a power outage, and in fact, their contracts specifically exclude certain types of outages. By the time these reasonable exclusions are accounted for, these contractual agreements have lost most of their teeth.

My advice on the nines is to measure them. Keep availability statistics. Report them if they are good, or if you are called upon to do so. Even if your management has not asked for availability statistics, record them on a regular basis. Then look at what causes the majority of your downtime, and fix it. If you concentrate on the causes behind most of your downtime, you will see a significant improvement in availability.
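Keeping statistics does not need elaborate tooling. As a minimal sketch, assuming outages are logged as simple (cause, minutes) pairs, the following ranks causes by total downtime so the largest contributor is obvious. The log entries and the `downtime_by_cause` name are made up for illustration.

```python
from collections import defaultdict


def downtime_by_cause(records: list[tuple[str, float]]) -> list[tuple[str, float]]:
    """Sum downtime minutes per cause and return causes sorted largest first."""
    totals: dict[str, float] = defaultdict(float)
    for cause, minutes in records:
        totals[cause] += minutes
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical outage log: (cause, minutes of downtime)
outage_log = [
    ("power", 120.0),
    ("disk", 15.0),
    ("operator error", 40.0),
    ("disk", 10.0),
]

if __name__ == "__main__":
    ranked = downtime_by_cause(outage_log)
    total = sum(minutes for _, minutes in ranked)
    for cause, minutes in ranked:
        print(f"{cause:15s} {minutes:6.1f} min  ({100 * minutes / total:.0f}% of downtime)")
```

Running the same report each month or quarter shows whether availability is trending in the right direction.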

Rather than getting buried in the details of the numbers, concern yourself with basic improvement. Trend upwards.


Evan L. Marcus is Data Availability Maven at VERITAS Software and the author of Blueprints for High Availability. You can contact him at evan@veritas.com.

This was last published in August 2003
