What you'll learn: Find out how Atrato, DataDirect Networks, Panasas and Xiotech use RAID technology in self-healing systems and deal with hard disk drive (HDD) failures.
Self-healing systems from a handful of data storage vendors promise little or no maintenance and seek to repair hard disk drives (HDDs) automatically after a failure. But depending on which self-healing system you examine, you'll find different approaches to data migration and recovery, as well as to disk repair. In this expert tip, Marc Staimer highlights the various approaches to using RAID technology in self-healing storage systems.
Atrato's Velocity1000
Atrato Inc.'s Velocity1000 (V1000) uses a self-healing technology called Fault Detection, Isolation and Recovery (FDIR) in combination with the Atrato Virtualization Engine (AVE). FDIR monitors component and system health and adds self-diagnostics and autonomic self-healing, but it doesn't attempt to remanufacture or recondition HDDs in place as Xiotech does. Atrato puts 160 2.5-inch SATA drives in a 3U system called SAID (self-maintaining array of independent disks). It compares the installed drives' actual performance against its extensive database of SATA drive operational reliability testing (ORT) results to detect deviations. Atrato deals with HDD faults by first attempting to repair the faulting sectors (although not with manufacturer-level reconditioning, remanufacturing or component recalibration). If the fault or non-recoverable read error can't be repaired, the sector is isolated and only the affected data is reconstructed and remapped to virtual spare capacity. If a disk drive fails completely, its contents are reconstructed and remapped to the virtual spare capacity. By reconstructing only affected data on virtual drives, Atrato reduces both the number of rebuilds and rebuild times. Atrato backs its technology with a three-year warranty.
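The sector-level recovery described above can be illustrated with a minimal Python sketch. It assumes simple single-parity XOR protection, which this article does not confirm Atrato uses; the names `xor_blocks`, `reconstruct_sector` and `spare_map` are hypothetical and are not part of FDIR or AVE. The point is only that one unreadable sector can be rebuilt from its stripe and remapped without rebuilding the whole drive.

```python
from functools import reduce
from operator import xor

def xor_blocks(blocks):
    """XOR equal-length byte strings together, byte by byte."""
    return bytes(reduce(xor, column) for column in zip(*blocks))

def reconstruct_sector(stripe, failed_index):
    """Rebuild one lost sector from the surviving stripe members (data + parity)."""
    survivors = [blk for i, blk in enumerate(stripe) if i != failed_index]
    return xor_blocks(survivors)

# Assumption: one stripe of three data sectors plus one XOR parity sector.
data = [b"\x01\x02", b"\x10\x20", b"\xa0\x0b"]
stripe = data + [xor_blocks(data)]

# The sector on member 1 returns a non-recoverable read error.
rebuilt = reconstruct_sector(stripe, failed_index=1)

# Remap only that sector to (hypothetical) virtual spare capacity.
spare_map = {("member-1", "sector-0"): rebuilt}
print(rebuilt == data[1])  # True
```

Only the one affected sector is read back from its stripe peers; the rest of the drive is left in service.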
DataDirect Networks' S2A
The heal-in-place approach in DataDirect Networks Inc.'s S2A technology attempts several levels of HDD recovery before a hard disk drive is removed from service. When an HDD shows aberrant behavior, the system begins journaling all writes to that drive and then attempts recovery operations. When recovery succeeds, only the small portion of the HDD covered by the journal needs rebuilding, so rebuild times are shorter and a service call may be avoided.
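As a rough illustration of why journaling shortens the rebuild, here is a minimal Python sketch. It is not DataDirect Networks' implementation: the `JournaledDisk` class and its methods are hypothetical, and the model ignores how reads are served while the drive is under recovery. It only shows that, once recovery succeeds, the blocks to rebuild are those written while the drive was suspect.

```python
class JournaledDisk:
    """Toy model of a drive whose writes are journaled while it misbehaves."""

    def __init__(self, num_blocks):
        self.blocks = [b""] * num_blocks
        self.suspect = False
        self.journal = {}  # block number -> latest data written while suspect

    def mark_suspect(self):
        """Start journaling: the drive is showing aberrant behavior."""
        self.suspect = True

    def write(self, block, data):
        if self.suspect:
            self.journal[block] = data  # hold the write in the journal
        else:
            self.blocks[block] = data

    def recover(self):
        """Recovery succeeded: replay the journal and return the block count rebuilt."""
        replayed = len(self.journal)
        for block, data in self.journal.items():
            self.blocks[block] = data
        self.journal.clear()
        self.suspect = False
        return replayed

disk = JournaledDisk(num_blocks=1000)
disk.write(5, b"old")
disk.mark_suspect()
disk.write(5, b"new")
disk.write(7, b"x")
disk.write(5, b"newer")
replayed = disk.recover()
print(replayed, "of", len(disk.blocks), "blocks rebuilt")
```

In this toy run only two of the 1,000 blocks are touched during the rebuild, because the journal records which blocks changed while the drive was out of normal service.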
Panasas' ActiveScan
Panasas Inc.'s ActiveScan technology continuously monitors HDDs and their contents to detect problems. ActiveScan monitors data objects, RAID parity, disk media and disk drive attributes. When a potential problem is detected, data is moved to spare blocks on the same disk. Future HDD failure is predicted through statistical analysis of the drives' Self-Monitoring, Analysis and Reporting Technology (SMART) attributes, permitting action to be taken to protect data before a failure occurs. When an HDD failure is predicted, user-set policies allow the data to be pre-emptively migrated to other HDDs, which eliminates or mitigates the need for reconstruction.
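The article does not say which statistics ActiveScan applies to SMART data, so the following Python sketch is only an assumed stand-in: it flags a drive whose reallocated-sector count sits far above the fleet average (a simple z-score test) and then applies a user-set policy. The function names, the threshold and the sample counts are all hypothetical.

```python
from statistics import mean, stdev

def predict_failures(reallocated, z_threshold=2.0):
    """Return drives whose reallocated-sector count is a high outlier in the fleet."""
    counts = list(reallocated.values())
    if len(counts) < 2:
        return []
    mu, sigma = mean(counts), stdev(counts)
    if sigma == 0:
        return []  # every drive looks the same, nothing stands out
    return [drive for drive, count in reallocated.items()
            if (count - mu) / sigma > z_threshold]

def apply_policy(at_risk, policy="migrate"):
    """Turn predictions into actions according to a user-set policy."""
    if policy == "migrate":
        return [f"migrate data off {drive}" for drive in at_risk]
    return [f"alert operator about {drive}" for drive in at_risk]

# Hypothetical fleet: nine healthy drives and one accumulating reallocated sectors.
fleet = {f"hdd-{i}": 2 for i in range(9)}
fleet["hdd-9"] = 200

at_risk = predict_failures(fleet)
print(apply_policy(at_risk))
```

Migrating data while the suspect drive is still readable is a plain copy, which is why pre-emptive migration can avoid a parity reconstruction altogether.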
Xiotech's ISE
Xiotech engineered its Intelligent Storage Element (ISE) to resolve most RAID rebuild issues by eliminating 67% to 90% of rebuilds. It starts by reducing HDD faults, proactively healing hard disk drives before a fault occurs with reconditioning algorithms similar to those employed at the factory. It also uses advanced vibration controls and sealed systems called DataPacs to keep outside influences from causing HDD faults. When a fault does occur, ISE performs remedial component repair within the sealed DataPac using methods similar to those of the original manufacturer. It analyzes power cycles, recalibrates components, remanufactures the HDD and, when required, migrates data to other sectors or HDDs. If the fault persists, ISE isolates just the non-recoverable sectors and then initiates data reconstruction only for the faulted HDD sectors. So there are far fewer rebuilds and, when one is required, there's much less to reconstruct. In addition, it's all automated, so no manual intervention to pull failed drives is required. The result is equivalent to a factory-remanufactured HDD with only the components that are beyond repair taken out of service. The downside to this transformational technology is its higher up-front cost, although it lowers the total cost of ownership (Xiotech provides a five-year warranty).
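To put the 67% to 90% figure in concrete terms, a short Python calculation helps. The baseline of 100 rebuilds is a made-up number chosen for easy arithmetic, and `remaining_rebuilds` is a hypothetical helper, not anything Xiotech publishes.

```python
def remaining_rebuilds(baseline, eliminated_pct):
    """Rebuilds still needed after a given percentage of them is eliminated."""
    return baseline * (100 - eliminated_pct) / 100

baseline = 100  # hypothetical number of rebuilds without self-healing
worst = remaining_rebuilds(baseline, 67)
best = remaining_rebuilds(baseline, 90)
print(worst, best)  # 33.0 10.0
```

Under that assumption, an array that would otherwise see 100 rebuilds would see between 10 and 33, and each remaining rebuild covers only the faulted sectors.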
Other vendor approaches
LSI Corp. and NEC Corp. both detect HDD sector errors while allowing operations to continue with the other drives in the RAID group. If an alternative sector can be assigned, the HDD is allowed to return to operation, avoiding a complete rebuild. Performance is maintained throughout the detection and repair process. This is a limited self-healing technology that reduces the number of rebuilds and helps maintain performance.
3PAR's InSpire Architecture is engineered to sustain high performance levels by using advanced HDD error isolation to reduce the amount of data that requires reconstruction, and by taking advantage of its massive parallelism to provide rapid rebuilds (typically less than 30 minutes). The system uses "chunklets" in its many-to-many drive relationships. That same massive parallelism allows 3PAR to spread RAID sets across multiple drive chassis to minimize the risk of data loss if a chassis is lost.
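A back-of-the-envelope Python sketch shows why many-to-many chunklet placement speeds up rebuilds. Everything numeric here is an assumption for illustration (1,000 chunklets on the failed drive, 256 MB per chunklet, 40 surviving drives, 50 MB/s of rebuild bandwidth per drive), and round-robin placement is a simplification rather than 3PAR's actual layout logic.

```python
from collections import Counter
from itertools import cycle

def spread_rebuild(failed_chunklets, surviving_drives):
    """Assign each lost chunklet to a surviving drive round-robin; count per-drive work."""
    load = Counter()
    for _, target in zip(range(failed_chunklets), cycle(surviving_drives)):
        load[target] += 1
    return load

def rebuild_minutes(load, chunklet_mb, drive_mb_per_s):
    """Rebuild time is set by the busiest drive, since all drives work in parallel."""
    busiest = max(load.values())
    return busiest * chunklet_mb / drive_mb_per_s / 60

drives = [f"drive-{i}" for i in range(40)]
parallel = spread_rebuild(1000, drives)          # many-to-many: work is shared
single = spread_rebuild(1000, ["hot-spare"])     # classic rebuild to one spare

parallel_min = rebuild_minutes(parallel, chunklet_mb=256, drive_mb_per_s=50)
single_min = rebuild_minutes(single, chunklet_mb=256, drive_mb_per_s=50)
print(f"parallel: {parallel_min:.1f} min, single spare: {single_min:.1f} min")
```

With the work divided evenly, the estimated rebuild time shrinks in proportion to the number of drives sharing it, which is the effect the many-to-many design relies on.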
BIO: Marc Staimer is a frequent contributor to TechTarget.
This article originally appeared in Storage magazine.