
This article can also be found in the Premium Editorial Download "Storage magazine: Exploring systems that detect and repair hard disk problems automatically."


Does end-to-end error correction work?

User evidence over the past 18 months suggests that HDD error-correction methods work. Interviews with IT organizations that store petabytes of data (where silent data corruption is statistically more likely to be noticed) for mission-critical applications such as government labs, high-energy particle research, digital film/video production and delivery, and seismic processing have revealed high levels of satisfaction. Perhaps the most telling remark came from an IT manager who wishes to remain anonymous: "I don't worry about silent data corruption anymore because it's no longer an issue for us."

Heal-in-place

In traditional disk subsystem designs, a sector error causes the whole HDD to be marked as failed. A failed HDD initiates a RAID data rebuild process that degrades performance and takes a long time. It can also be expensive, because the drive is replaced even though it may still have useful life left.

A heal-in-place system goes through a series of automated repair sequences designed to eliminate or reduce most of the "no failure found" HDD failures, as well as the subsequent unnecessary and costly RAID data rebuilds. As of now, there are five systems that provide heal-in-place capabilities: Atrato Inc.'s Velocity1000 (V1000), DataDirect Networks' S2A series, NEC's D-Series, Panasas' ActiveStor and Xiotech's Emprise 5000. Each provides a proven, albeit completely different, heal-in-place technology.

Atrato's V1000 uses fault detection, isolation and recovery (FDIR) technology. FDIR continuously monitors component and system health, and couples that monitoring with self-diagnostics and autonomic self-healing. Atrato uses FDIR to correlate SATA drive performance with its extensive database of operational reliability testing (ORT) performed on more than 100,000 SATA hard disk drives. FDIR uses decision logic based on that ORT history, stress testing and failure analysis to detect SATA HDD errors. It then leverages Atrato Virtualization Software (AVS) to deal with detected latent sector errors (non-recoverable sectors that are temporarily or permanently inaccessible). AVS' automated background drive maintenance commonly prevents many of these errors. When it doesn't, AVS remaps at the sector level using spare capacity on the virtual spare SATA HDDs. This keeps many SATA HDDs with sector errors from being forced permanently into a full failure mode, and allows those drives to be restored to full performance.
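The sector-level remapping idea can be sketched in a few lines of Python. This is only an illustration of the general technique, not Atrato's implementation: the `SectorRemapper` class, its method names and the sector numbers are all hypothetical, and a real array would also have to copy or reconstruct the data for the remapped sector.

```python
from dataclasses import dataclass, field


@dataclass
class SectorRemapper:
    """Toy remap table that redirects bad sectors to spare sectors.

    Hypothetical sketch: the names and layout are assumptions for
    illustration, not taken from any vendor's software.
    """
    spare_sectors: list[int]                       # assumed pool of spare sectors
    remap: dict[int, int] = field(default_factory=dict)

    def mark_bad(self, sector: int) -> bool:
        """Remap a bad sector; return False when no spare capacity is left."""
        if sector in self.remap:
            return True                            # already remapped earlier
        if not self.spare_sectors:
            return False                           # caller would have to fail the drive
        self.remap[sector] = self.spare_sectors.pop(0)
        return True

    def resolve(self, sector: int) -> int:
        """Translate a logical sector to the sector that actually holds it."""
        return self.remap.get(sector, sector)


remapper = SectorRemapper(spare_sectors=[9000, 9001])
remapper.mark_bad(42)
print(remapper.resolve(42))   # remapped to a spare sector
print(remapper.resolve(43))   # healthy sector, unchanged
```

The point of the sketch is that a single bad sector costs one spare sector and one table entry, while the rest of the drive stays in service.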

The heal-in-place approach in DataDirect Networks' S2A attempts several levels of HDD recovery before a hard disk drive is removed from service. It begins by journaling all writes to any HDD that shows aberrant behavior and then attempts recovery operations. When recovery operations succeed, only a small portion of the HDD requires rebuilding using the journaled information. Having less data to rebuild greatly reduces overall rebuild times and eliminates a service event.
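A minimal Python sketch shows why journaling shortens the rebuild. It is not DataDirect Networks' code: the `JournaledDrive` class is hypothetical, and a plain list named `mirror` stands in for whatever redundancy the RAID group provides.

```python
class JournaledDrive:
    """Toy drive that records which blocks were written while it was suspect.

    Hypothetical sketch of write journaling; all names are assumptions.
    """

    def __init__(self, blocks: int):
        self.data = [b""] * blocks
        self.suspect = False
        self.journal: set[int] = set()    # block numbers written while suspect

    def write(self, block: int, payload: bytes, mirror: list) -> None:
        mirror[block] = payload           # the redundant copy always gets the write
        if self.suspect:
            self.journal.add(block)       # remember the block, skip the unhealthy drive
        else:
            self.data[block] = payload

    def recover(self, mirror: list) -> int:
        """Replay only the journaled blocks; return how many were rebuilt."""
        for block in self.journal:
            self.data[block] = mirror[block]
        rebuilt = len(self.journal)
        self.journal.clear()
        self.suspect = False
        return rebuilt


mirror = [b""] * 1000
drive = JournaledDrive(1000)
drive.write(1, b"a", mirror)
drive.suspect = True                      # drive starts misbehaving
drive.write(2, b"b", mirror)
drive.write(3, b"c", mirror)
rebuilt = drive.recover(mirror)           # recovery succeeded, replay the journal
print(rebuilt, "of", len(drive.data), "blocks rebuilt")
```

In this toy run only the two blocks written during the suspect window are copied back, instead of all 1,000 blocks a full rebuild would touch.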

NEC's D-Series Phoenix technology detects sector errors, but allows operation to continue with the other HDDs in the RAID group. If an alternative sector can be assigned, the hard disk drive is allowed to return to operation with the RAID group, avoiding a complete rebuild. Phoenix technology maintains performance throughout the detection and repair process.
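How the other drives in a RAID group can cover for one unreadable sector can be shown with generic single-parity arithmetic. The article does not describe Phoenix's internals, so the Python sketch below assumes a simple XOR-parity stripe; the `xor_blocks` helper and the sample bytes are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte strings together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe: three data sectors plus a parity sector (assumed layout).
data = [b"\x01\x02", b"\x10\x20", b"\x0f\xf0"]
parity = xor_blocks(data)

# The sector on the second drive becomes unreadable. The surviving
# sectors and the parity sector are enough to recompute its contents.
survivors = [data[0], data[2], parity]
rebuilt_sector = xor_blocks(survivors)

print(rebuilt_sector == data[1])
```

Once the contents are recomputed they can be written to an alternative sector, which is why repairing a single sector is so much cheaper than rebuilding an entire drive.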

This was first published in June 2009
