This article can also be found in the Premium Editorial Download "Storage magazine: R.I.P. RAID?."
Self-healing storage: Xiotech Corp.'s Intelligent Storage Elements (ISE) is a good example of self-healing storage. ISE tightly integrates RAID and HDDs, and combines them into a single storage element.
Xiotech engineered ISE to resolve most RAID rebuild issues by eliminating 67% to 90% of rebuilds. It starts by reducing HDD faults, proactively healing hard disk drives before a fault occurs with HDD reconditioning algorithms similar to those employed at the factory. It also uses advanced vibration controls and sealed systems called DataPacs to keep outside influences from causing HDD faults. When a fault does occur, ISE performs remedial component repair within the sealed DataPac using methods similar to those the original manufacturer uses. It analyzes power cycles, recalibrates components, remanufactures the HDD and, when required, migrates data to other sectors or HDDs. If the fault persists, ISE isolates just the non-recoverable sectors and then initiates data reconstruction only for the faulted HDD sectors. So there are far fewer rebuilds and, when one is required, there's much less to reconstruct. In addition, it's all automated, so no manual intervention to pull failed drives is required. The result is equivalent to a factory-remanufactured HDD with only the components that are beyond repair taken out of service. The downside to this transformational technology is that it has higher up-front costs, although it lowers the total cost of ownership.
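To make the idea of reconstructing only the faulted sectors concrete, here is a minimal Python sketch. It assumes a simple XOR-parity stripe (RAID 5 style); Xiotech's actual on-disk layout and algorithms aren't described here, and the names `xor_blocks` and `rebuild_sectors` are hypothetical.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

def rebuild_sectors(drives, failed_idx, bad_sectors):
    """Reconstruct only the listed sectors of one drive from its peers.

    drives: list of drives, each a list of equal-size sector byte strings.
    Assumption: one drive in the group holds the XOR parity of the others.
    """
    repaired = {}
    for s in bad_sectors:
        peers = [d[s] for i, d in enumerate(drives) if i != failed_idx]
        repaired[s] = xor_blocks(peers)
    return repaired

# Illustrative data: three data drives plus one parity drive, 8 sectors each.
data = [[bytes([d * 16 + s] * 4) for s in range(8)] for d in range(3)]
parity = [xor_blocks([data[d][s] for d in range(3)]) for s in range(8)]
drives = data + [parity]

# Only sectors 2 and 5 of drive 1 are unrecoverable; the other six are untouched.
repaired = rebuild_sectors(drives, failed_idx=1, bad_sectors=[2, 5])
```

In this sketch the work done scales with the number of bad sectors rather than with the capacity of the drive, which is the point of sector-level reconstruction.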
Atrato Inc.'s Velocity1000 (V1000) uses a self-healing technology called Fault Detection, Isolation and Recovery (FDIR) in combination with Atrato's Virtualization Engine (AVE). FDIR watches component and system health, and adds self-diagnostics and autonomic self-healing, but it doesn't attempt to remanufacture or recondition HDDs in place as Xiotech does. Atrato puts 160 2.5-inch SATA drives in a 3U system called SAID (self-maintaining array of independent disks). Atrato compares the installed drives' actual performance against its extensive database of SATA drive operational reliability testing (ORT) results to detect HDD deviations. Atrato also deals with HDD faults by first attempting to repair the faulting HDD sectors (although not with manufacturer-level reconditioning, remanufacturing or component recalibration). If the fault or non-recoverable read error can't be repaired, the sector is isolated and only the affected data is reconstructed and remapped to virtual spare capacity. If a disk drive fails completely, its data is reconstructed and remapped to the virtual spare capacity. Atrato reduces the number of rebuilds and rebuild times by reconstructing only affected data on virtual drives. Atrato backs its technology with a three-year warranty.
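Isolating a sector and remapping its data to virtual spare capacity can be pictured as a lookup table that redirects a logical block to a new physical location. The Python sketch below is an illustration only; the class `VirtualSpareMap` and its layout of `(drive, sector)` tuples are assumptions, not Atrato's design.

```python
class VirtualSpareMap:
    """Maps logical blocks to physical (drive, sector) locations."""

    def __init__(self, data_locations, spare_locations):
        self.mapping = dict(enumerate(data_locations))  # logical block -> location
        self.spares = list(spare_locations)             # unused spare locations
        self.isolated = set()                           # locations taken out of service

    def isolate_and_remap(self, block):
        """Retire the block's current location and point it at a spare."""
        if not self.spares:
            raise RuntimeError("no virtual spare capacity left")
        self.isolated.add(self.mapping[block])
        self.mapping[block] = self.spares.pop(0)
        return self.mapping[block]

# Four logical blocks on drive 0, with two spare sectors on drive 7 (hypothetical).
vmap = VirtualSpareMap(
    data_locations=[(0, s) for s in range(4)],
    spare_locations=[(7, 100), (7, 101)],
)
new_home = vmap.isolate_and_remap(2)  # sector (0, 2) is unrecoverable
```

After the call, only block 2 points at a spare location; the reconstructed data for that block would be written there, and the other blocks stay where they are.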
DataDirect Networks Inc.'s S2A technology takes a heal-in-place approach to disk failure, attempting several levels of HDD recovery before a hard disk drive is removed from service. It begins by journaling all writes to any HDD that shows behavior aberrations and then attempts recovery operations. When recovery operations succeed, only a small portion of the HDD requires rebuilding, using the journaled information, so rebuild times are reduced and a service call may be avoided.
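A journal-based partial rebuild can be sketched in a few lines of Python. This is a simplified model, not DDN's implementation: the class `JournaledDrive` is hypothetical, and it assumes the journal stores the written data itself, whereas a real array might log only block addresses and regenerate the contents from parity.

```python
class JournaledDrive:
    """A drive whose writes are journaled while recovery is in progress."""

    def __init__(self, num_blocks):
        self.blocks = [b""] * num_blocks
        self.recovering = False
        self.journal = {}  # block index -> latest data written during recovery

    def write(self, idx, data):
        if self.recovering:
            self.journal[idx] = data  # defer: the drive is being recovered
        else:
            self.blocks[idx] = data

    def begin_recovery(self):
        self.recovering = True

    def finish_recovery(self):
        """Replay only the journaled blocks, then resume normal service."""
        replayed = sorted(self.journal)
        for idx in replayed:
            self.blocks[idx] = self.journal[idx]
        self.journal.clear()
        self.recovering = False
        return replayed

drive = JournaledDrive(num_blocks=1000)
drive.write(10, b"old")
drive.begin_recovery()        # drive shows aberrant behavior
drive.write(10, b"new")
drive.write(42, b"data")
rebuilt = drive.finish_recovery()
```

Here only two of the 1,000 blocks are rewritten when the drive returns to service, which is why a successful recovery shortens the rebuild.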
Panasas Inc.'s ActiveScan technology continuously monitors HDDs and their contents to detect problems. ActiveScan monitors data objects, RAID parity, disk media and disk drive attributes. When a potential problem is detected, data is moved to spare blocks on the same disk. Future HDD failure is predicted through statistical analysis of HDD self-monitoring, analysis and reporting technology (SMART) attributes, permitting action to be taken to protect data before a failure occurs. When an HDD failure is predicted, user-set policies facilitate preemptively migrating the data to other HDDs, which eliminates or mitigates the need for reconstruction.
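As a rough illustration of statistical failure prediction from SMART data, the Python sketch below flags drives whose reallocated-sector count is an outlier relative to the rest of the fleet. The z-score test, the threshold and the sample counts are assumptions chosen for the example; the statistics Panasas actually applies aren't described here.

```python
from statistics import mean, pstdev

def predict_failures(reallocated, z_threshold=2.0):
    """Return drives whose reallocated-sector count is a statistical outlier.

    reallocated: dict of drive name -> reallocated-sector count (a SMART attribute).
    z_threshold: hypothetical cutoff, in standard deviations above the fleet mean.
    """
    values = list(reallocated.values())
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []  # every drive reports the same value, so nothing stands out
    return [d for d, v in reallocated.items() if (v - mu) / sigma > z_threshold]

# Made-up sample readings for an eight-drive shelf.
fleet = {"d0": 2, "d1": 3, "d2": 1, "d3": 2, "d4": 2, "d5": 3, "d6": 1, "d7": 60}
at_risk = predict_failures(fleet)
```

A policy engine could then migrate data off the drives listed in `at_risk` while they're still readable, rather than reconstructing the data after a failure.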
LSI Corp. and NEC both detect HDD sector errors while allowing operations to continue with the other drives in the RAID group. If an alternative sector can be assigned, the HDD is allowed to return to operation, avoiding a complete rebuild. Performance is maintained throughout the detection and repair process. This is a limited self-healing technology that reduces the number of rebuilds and helps maintain performance.
3PAR's InSpire Architecture is engineered to sustain high performance levels by leveraging advanced HDD error isolation to reduce the amount of data that requires reconstruction, and by taking advantage of its massive parallelism to provide rapid rebuilds (typically less than 30 minutes). The system uses "chunklets" in its many-to-many drive relationships. That same massive parallelism allows 3PAR to isolate RAID sets across multiple drive chassis to minimize the risk of data loss if a chassis is lost.
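The effect of parallelism on rebuild time comes down to simple arithmetic, sketched below in Python. The capacity, throughput and drive counts are hypothetical values picked for illustration, not 3PAR specifications, and the model ignores competing host I/O.

```python
def rebuild_minutes(used_gb, drive_mb_per_s, writers):
    """Estimate rebuild time when reconstruction is spread over several drives.

    used_gb: data to reconstruct, in gigabytes.
    drive_mb_per_s: sustained rebuild rate of one drive, in MB/s.
    writers: number of drives receiving reconstructed data in parallel.
    """
    seconds = used_gb * 1024 / (drive_mb_per_s * writers)
    return seconds / 60

# Traditional rebuild: everything is written to a single hot spare.
single_spare = rebuild_minutes(used_gb=500, drive_mb_per_s=50, writers=1)

# Many-to-many rebuild: small pieces are written to spare space on 20 drives.
distributed = rebuild_minutes(used_gb=500, drive_mb_per_s=50, writers=20)
```

Under these assumptions the single-spare rebuild takes close to three hours, while the distributed rebuild finishes in under 10 minutes, because no single drive is the bottleneck.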
This was first published in May 2010