This article can also be found in the Premium Editorial Download "Storage magazine: Exploring systems that detect and repair hard disk problems automatically."
Panasas' ActiveScan feature continuously monitors data objects, RAID parity, disk media and disk drive attributes. When it detects a potential problem with HDD blocks, it moves the data to spare blocks on the same disk. ActiveScan also predicts future hard disk drive failures through statistical analysis of HDD SMART attributes, so action can be taken to protect data before a failure occurs. When a failure is predicted, user-set policies can trigger preemptive migration of the data to other HDDs. This eliminates, or at least reduces, the need for reconstruction.
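A minimal sketch of how policy-driven failure prediction might work, assuming a drastically simplified model. Panasas has not published ActiveScan's statistical method, so the `SmartSample` and `DrivePolicy` types, the threshold values and the growth-based test below are hypothetical stand-ins. The sketch flags drives whose reallocated-sector count (SMART attribute 5) grows too quickly or whose pending-sector count (SMART attribute 197) exceeds a user-set limit, and returns them as candidates for preemptive migration.

```python
from dataclasses import dataclass


@dataclass
class SmartSample:
    """One hypothetical reading of two SMART counters for a drive."""
    reallocated_sectors: int  # SMART attribute 5 (Reallocated Sectors Count)
    pending_sectors: int      # SMART attribute 197 (Current Pending Sector Count)


@dataclass
class DrivePolicy:
    """User-set thresholds; values are illustrative, not vendor defaults."""
    max_reallocated_growth: int = 50  # allowed growth across the sample window
    max_pending: int = 10             # allowed pending sectors in the latest sample


def predict_failure(history: list[SmartSample], policy: DrivePolicy) -> bool:
    """Simple trend test standing in for a real statistical model."""
    if len(history) < 2:
        return False  # not enough samples to see a trend
    growth = history[-1].reallocated_sectors - history[0].reallocated_sectors
    return (growth > policy.max_reallocated_growth
            or history[-1].pending_sectors > policy.max_pending)


def plan_migrations(drives: dict[str, list[SmartSample]], policy: DrivePolicy) -> list[str]:
    """Return the drives whose data should be migrated before they fail."""
    return sorted(name for name, history in drives.items() if predict_failure(history, policy))


# hdd1's reallocated count grows by 80; hdd2 has 12 pending sectors.
drives = {
    "hdd0": [SmartSample(2, 0), SmartSample(3, 0)],
    "hdd1": [SmartSample(10, 0), SmartSample(90, 4)],
    "hdd2": [SmartSample(0, 0), SmartSample(0, 12)],
}
print(plan_migrations(drives, DrivePolicy()))  # ['hdd1', 'hdd2']
```

A production system would weigh many more attributes and estimate failure probability rather than apply fixed cutoffs; the point is only that prediction feeds a user-defined policy that decides when data moves.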
Xiotech's Emprise 5000, or ISE, is designed to provide autonomic self-healing storage both proactively and reactively. ISE performs preventive and remedial component repair within its sealed DataPacs (storage capacity modules), so failed drives never have to be pulled manually. ISE provides in-place automatic data migration (when required), power cycling, factory remanufacturing and component recalibration. Rather than rebuilding entire disk drives, it rebuilds only the surfaces of affected heads that hold allocated space, using very fast parallel processes. The result is the equivalent of a factory-remanufactured HDD: the only components ever taken out of service are those beyond repair, and everything else is restored to full activity and performance.
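The following Python sketch illustrates why rebuilding only affected surfaces is cheaper than rebuilding a whole drive. It assumes a hypothetical model in which each surface is identified by a drive name and head number and records how much allocated data it holds. Xiotech has not published ISE's internal data structures, so `Surface`, `surfaces_to_rebuild` and `rebuild_surface` are illustrative only, and the thread pool merely stands in for "parallel processes".

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class Surface:
    drive: str
    head: int
    allocated_gb: int  # data actually stored on this surface


def surfaces_to_rebuild(surfaces: list[Surface],
                        failed_heads: set[tuple[str, int]]) -> list[Surface]:
    """Pick only surfaces under failed heads that hold allocated data."""
    return [s for s in surfaces if (s.drive, s.head) in failed_heads and s.allocated_gb > 0]


def rebuild_surface(surface: Surface) -> int:
    """Hypothetical placeholder for reconstructing one surface; returns GB rebuilt."""
    return surface.allocated_gb


def rebuild_in_parallel(targets: list[Surface], workers: int = 4) -> int:
    """Rebuild the selected surfaces concurrently and return the total GB rebuilt."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(rebuild_surface, targets))


# One four-head drive: heads 0-2 hold 100 GB each, head 3 is unallocated.
surfaces = [Surface("d0", head, 100 if head < 3 else 0) for head in range(4)]
targets = surfaces_to_rebuild(surfaces, {("d0", 1), ("d0", 3)})
rebuilt = rebuild_in_parallel(targets)
whole_drive = sum(s.allocated_gb for s in surfaces)
print(f"surface rebuild: {rebuilt} GB, whole-drive rebuild: {whole_drive} GB")
# surface rebuild: 100 GB, whole-drive rebuild: 300 GB
```

Head 3 fails too, but because it holds no allocated space it is skipped entirely, which is the saving the text describes.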
Does autonomic self-healing work?
Based on interviews with users and on vendors' historical service data, autonomic
Fail-in-place is a fairly new concept aimed at resolving some prickly side effects of hot-plug or hot-swap HDDs in storage systems. Examples of these side effects include pulling the wrong drive and causing inadvertent data loss; delaying the replacement of a failed HDD, which postpones the start of the rebuild and increases the risk of data loss; and using spare drives that may not have been tested recently, which can result in a second hard disk drive failure.
The basic concept of fail-in-place is to enlarge the smallest field-replaceable unit (FRU) from an individual HDD to a storage pack. A storage pack is a collection of hard disk drives operating in concert, with a certain percentage of its capacity allocated for sparing. When an HDD fails, its data is automatically rebuilt into that allocated capacity. There are currently only two vendors supplying fail-in-place storage systems: Atrato (with its V1000) and Xiotech (with the Emprise 5000, or ISE). Both systems feature end-to-end error detection and correction, as well as autonomic self-healing.
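A minimal sketch of fail-in-place accounting, assuming each failed drive consumes one drive's worth of the pack's reserved spare capacity, and that the pack as a whole becomes the unit to service once the reserve can no longer absorb another failure. `StoragePack` and its fields are hypothetical; this is not either vendor's actual sparing algorithm.

```python
from dataclasses import dataclass


@dataclass
class StoragePack:
    drive_count: int
    drive_capacity_gb: int
    spare_percent: int      # share of raw capacity reserved for sparing (hypothetical)
    failed_drives: int = 0  # failures already absorbed in place

    @property
    def spare_capacity_gb(self) -> int:
        return self.drive_count * self.drive_capacity_gb * self.spare_percent // 100

    @property
    def user_capacity_gb(self) -> int:
        return self.drive_count * self.drive_capacity_gb - self.spare_capacity_gb

    @property
    def spare_remaining_gb(self) -> int:
        # Each absorbed failure has had its data rebuilt into spare space.
        return self.spare_capacity_gb - self.failed_drives * self.drive_capacity_gb

    def fail_drive(self) -> str:
        """Absorb a drive failure in place if enough spare capacity remains."""
        if self.spare_remaining_gb >= self.drive_capacity_gb:
            self.failed_drives += 1
            return "rebuilt in place"
        return "service the pack"  # the pack, not the drive, is the FRU


pack = StoragePack(drive_count=20, drive_capacity_gb=1000, spare_percent=10)
print(pack.user_capacity_gb)  # 18000
print([pack.fail_drive() for _ in range(3)])
# ['rebuilt in place', 'rebuilt in place', 'service the pack']
```

In this model no one ever handles an individual drive: the first two failures are rebuilt into reserved capacity, and only when that reserve is exhausted does the pack need attention.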
Both vendors' product architectures tightly couple available user capacity with the enclosure lifecycle within a single FRU. An enclosure's lifecycle is the timeframe in which its raw capacity will be available to an application. The total enclosure capacity therefore includes an allowance for the sparing expected over the enclosure's warranted capacity life (three years for Atrato and five years for Xiotech).
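The arithmetic behind a sparing allowance can be sketched as an expected-failure estimate: drives × annualized failure rate (AFR) × years of warranted life, rounded up to whole drives. The 3% AFR, 40-drive enclosure and 1,000 GB drive size below are assumptions chosen for illustration, not Atrato or Xiotech figures; only the three- and five-year lifetimes come from the text.

```python
def expected_failures(drive_count: int, afr_percent: int, years: int) -> int:
    """Expected drive failures over the warranted life, rounded up (integer math)."""
    return (drive_count * afr_percent * years + 99) // 100  # ceiling of n * AFR * years


def sparing_allowance_gb(drive_count: int, drive_capacity_gb: int,
                         afr_percent: int, years: int) -> int:
    """Capacity to reserve so every expected failure can be rebuilt in place."""
    return expected_failures(drive_count, afr_percent, years) * drive_capacity_gb


# Hypothetical enclosure: 40 drives of 1,000 GB each, 3% AFR.
for years in (3, 5):  # warranted lives cited for Atrato and Xiotech
    failures = expected_failures(40, 3, years)
    spare_gb = sparing_allowance_gb(40, 1000, 3, years)
    print(f"{years}-year life: {failures} drives ({spare_gb} GB) of sparing")
# 3-year life: 4 drives (4000 GB) of sparing
# 5-year life: 6 drives (6000 GB) of sparing
```

Because failures are random, a real design would reserve a margin above this expected value; the sketch only shows why a longer warranted life directly raises the capacity that must be set aside.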
This was first published in June 2009