Storage Magazine

Users Need Better Way to Predict Disk Failures
Issue: Oct 2004
All disk drive manufacturers happily publish their products' mean time between failure (MTBF) ratings, numbers like 500,000 hours for desktop-class drives, or 1 million hours for SCSI and Fibre Channel drives. But many users feel that the MTBF metric is largely irrelevant in a real-world storage environment.

MTBF "is basically a useless number" from a user's perspective, says Mike Chenery, vice president of advanced product engineering at Fujitsu Computer Products of America, a disk drive manufacturer. "What they really want to know is 'How many of my drives are going to fail?'"

Part of the problem with MTBF is that not everyone understands how it is calculated. An MTBF of 1 million hours, for example, does not mean that a drive will last 114 years (1 million hours divided by 24*365 hours per year), as some users may conclude. Rather, it's a statistical metric derived by testing a large number of drives over a number of days and determining the mean failure rate for the entire group, explains Aloke Guha, CTO at Copan Systems, a startup that makes a disk-based backup system built with low-cost ATA drives.
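The arithmetic behind the 114-year figure, and what an MTBF rating implies for a group of drives rather than a single one, can be sketched in a few lines of Python. The conversion to a yearly failure probability assumes a constant failure rate and round-the-clock operation; both are simplifying assumptions not stated in the article, and the function names are illustrative only.

```python
import math

HOURS_PER_YEAR = 24 * 365  # 8,760 power-on hours for a drive running 24/7


def mtbf_in_years(mtbf_hours):
    """Express an MTBF rating in years (the naive reading, not a lifespan)."""
    return mtbf_hours / HOURS_PER_YEAR


def implied_annual_failure_probability(mtbf_hours, hours_per_year=HOURS_PER_YEAR):
    """Chance that one drive fails within a year.

    Assumes a constant failure rate (exponential model), which is a
    simplification and not something the MTBF rating itself guarantees.
    """
    return 1.0 - math.exp(-hours_per_year / mtbf_hours)


if __name__ == "__main__":
    mtbf = 1_000_000  # hours, the enterprise-class rating quoted above
    print(f"MTBF expressed in years: {mtbf_in_years(mtbf):.1f}")
    print(f"Implied yearly failure probability: "
          f"{implied_annual_failure_probability(mtbf):.2%}")
```

Under these assumptions, a 1-million-hour MTBF works out to a bit under 0.9% of drives failing per year, which is a statement about the population, not a promise that any one drive runs for 114 years.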

The issue is further complicated because some drive vendors "tend to play games with the numbers," says Guha. For example, two drives may both have MTBFs of 500,000 hours, but one may be rated for a 24/7/365 duty cycle and the other for only an 8-hour duty cycle. "Read the fine print," he warns.

Unlike MTBF, another metric called annualized failure rate (AFR) will give you a sense of how often you can expect a drive to fail. AFR is calculated from the number of drives that are returned to the manufacturer and confirmed to be defective. Enterprise drives tend to have AFRs of under 1%. Thus, with 1,000 enterprise drives, you can expect up to about 10 drives to fail per year.
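The fleet-level estimate is simple multiplication, as the following Python sketch shows. The function names are hypothetical, the 1% AFR is the upper bound quoted above rather than a measured value, and the second function additionally assumes that drives fail independently of one another, which real arrays do not guarantee.

```python
def expected_annual_failures(drive_count, afr):
    """Expected number of failed drives per year for a fleet with the given AFR."""
    return drive_count * afr


def probability_of_at_least_one_failure(drive_count, afr):
    """Chance that at least one drive fails in a year.

    Assumes each drive fails independently with probability `afr`,
    an idealization used here only for illustration.
    """
    return 1.0 - (1.0 - afr) ** drive_count


if __name__ == "__main__":
    afr = 0.01  # 1%, the upper bound cited for enterprise drives
    print(f"Expected failures among 1,000 drives: "
          f"{expected_annual_failures(1000, afr):.0f}")
    print(f"Chance of at least one failure in a 12-drive array: "
          f"{probability_of_at_least_one_failure(12, afr):.1%}")
```

The same two inputs, drive count and AFR, are what a storage manager would need from a vendor to budget for spares.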

In this age of very large disk drives, it's good to be able to predict your drive failure rates, says Dick Benton, senior consultant with Glasshouse Technologies in Framingham, MA. That's because the larger a disk drive in a RAID set, the longer it takes to rebuild, and the longer you are exposed to a second drive failure and total data loss.

The problem with AFR is that vendors don't generally publish it. But Copan's Guha, for one, thinks that "the time may have come for users to demand 'What is the reported AFR?'"




