Self-healing storage explained
That's right. Nothing. Drive manufacturers report that some 70% of the drives they get returned from makers of disk arrays have nothing wrong with them. Why? Because heat and vibration can cause intermittent errors in storage arrays, and the only remedy that array manufacturers have for these intermittent errors is to fail the drive.
Why does this matter to you, a storage administrator? The drive is under warranty, so why should you care? Here are three reasons why you should.
A storage system that itself could automatically resolve erroneous disk drive failures would save everyone time and money, and eliminate the introduction of unneeded risk into the storage environment. What can be done to make a system self-healing?
Modifying the drive enclosure
Excessive drive vibration is caused by the way today's external arrays are put together. Drives are mounted on drive sleds for easy access and removal, then tightly packed into a single drive bay. As a result, the drives are all mounted the same way, the disks are all spinning in the same direction, and the heads are all seeking in the same direction. This produces excessive harmonic vibrations, which lead to enough read/write errors for the array to presume a drive failure. These "failed" drives often work properly once they are sent back to the drive manufacturer.
Vibration can cause the drive that is vibrating too much to fail. It can also cause neighboring drives to skip on reads or writes, so the external controller designates them as failed. This second issue is of real concern because it can cause a double drive failure: the vibrating drive first causes a drive in an adjacent slot to be failed, and then fails itself. A double drive failure on a RAID 5 system requires that data be restored from another source, such as tape. No rebuild is possible at that point.
Drive makers can minimize vibration by rigidly packing the components so there is less movement from the spinning drives, and by designing the individual drive bay or housing so that it has the same rigidity throughout. In hot-swap systems, the drive bay is often looser in the front than in the back, which amplifies vibrations for the front half of the drives.
The only way manufacturers can significantly reduce drive vibration is to redesign the way their drive shelves are packed, and this takes two steps. First, the drives must counter-rotate, meaning they are installed front to back in alternating positions throughout the array shelf. Doing so naturally dampens vibration and reduces or eliminates enclosure torque. Two companies that counter-mount their drives are Xiotech and Copan Systems.
The second step is to build a better drive shelf and drive sled system that provides more consistent rigidity so the drives cannot vibrate. The combination of these two techniques can reduce vibration significantly.
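To see why a double drive failure on RAID 5 leaves nothing to rebuild from, it helps to look at how single parity works. The following is a minimal sketch, not any vendor's implementation: it assumes a hypothetical four-drive set holding one stripe of three data blocks plus one XOR parity block, and the names `xor_blocks` and `rebuild_block` are illustrative only.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_block(stripe, failed_indexes):
    """Recompute the block of a single failed drive from the survivors.

    With single parity, any one missing block equals the XOR of all the
    others. Two missing blocks cannot be separated from one parity block.
    """
    failed = set(failed_indexes)
    if len(failed) != 1:
        raise ValueError("single parity can rebuild exactly one failed drive")
    survivors = [blk for i, blk in enumerate(stripe) if i not in failed]
    return xor_blocks(survivors)


# Hypothetical stripe: three data blocks plus their parity block.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)
stripe = data + [parity]

# One "failed" drive (index 1): its block can be recomputed.
recovered = rebuild_block(stripe, [1])

# Two "failed" drives (a vibrating drive and its neighbor): no rebuild.
try:
    rebuild_block(stripe, [1, 2])
    double_failure_rebuilt = True
except ValueError:
    double_failure_rebuilt = False
```

In this sketch the second call fails by design, which mirrors the situation described above: once two drives in the same RAID 5 set are marked failed, the data has to come from another source.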
Minimize heat buildup
The easiest step in creating a self-healing array is to power-cycle the drive (akin to rebooting a desktop workstation), which usually fixes the problem. In a self-healing drive system, the first attempt to repair a drive that is showing signs of failure is to automatically reset or power-cycle it in a manner that has little or no impact on normal operations. The key is to perform the whole process within the application time-out thresholds, using cache to manage I/Os during the recovery. Once the drive comes back online, it is tested to see whether it is operating normally. If so, it is returned to service. All of this can happen without user intervention. While most array systems and controllers cannot do this, companies like Xiotech are leading the charge.
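The recovery sequence described above (hold I/O in cache, power-cycle, test, return to service within the application time-out) can be sketched roughly as follows. This is an illustration under stated assumptions, not how Xiotech or any other vendor implements it: `SimulatedDrive`, `attempt_recovery`, and the 30-second time-out are all hypothetical, and a real controller would talk to drive firmware rather than to a Python object.

```python
import time
from collections import deque


class SimulatedDrive:
    """Hypothetical stand-in for a drive; a power cycle clears transient faults only."""

    def __init__(self, transient_fault=False, hard_fault=False):
        self.transient_fault = transient_fault
        self.hard_fault = hard_fault
        self.blocks = {}

    def power_cycle(self):
        # A reset clears intermittent (heat or vibration induced) errors,
        # but a genuine hardware fault survives it.
        self.transient_fault = False

    def self_test(self):
        return not (self.transient_fault or self.hard_fault)

    def write(self, lba, data):
        self.blocks[lba] = data


def attempt_recovery(drive, pending_writes, app_timeout_s=30.0, clock=time.monotonic):
    """Try to heal a suspect drive before declaring it failed.

    Returns "in_service" if the drive passes its test within the assumed
    application time-out, otherwise "failed".
    """
    deadline = clock() + app_timeout_s
    cache = deque(pending_writes)  # I/O held in cache while the drive is offline

    drive.power_cycle()
    if drive.self_test() and clock() <= deadline:
        while cache:  # replay the cached writes, oldest first
            lba, data = cache.popleft()
            drive.write(lba, data)
        return "in_service"
    return "failed"


# A drive with an intermittent error comes back after the reset.
flaky = SimulatedDrive(transient_fault=True)
flaky_result = attempt_recovery(flaky, [(0, b"x"), (1, b"y")])

# A drive with a real hardware fault is still failed after the reset.
broken = SimulatedDrive(hard_fault=True)
broken_result = attempt_recovery(broken, [(0, b"x")])
```

The design point the sketch tries to capture is ordering: the drive is only failed after a reset and a test have been tried, rather than on the first burst of errors.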
Process of remanufacturing
A drive enclosure that reduces heat and vibration, combined with drive remanufacturing capabilities, should eliminate most drive failures. But drive failure can still occur in even the most drive-friendly environments. If a drive does eventually fail, the next logical step is to fail smart. The three aspects of failing smart include:
About the author: George Crump is founder of Storage Switzerland, an analyst firm focused on the virtualization and storage marketplaces. It provides strategic consulting and analysis to storage users, suppliers, and integrators. An industry veteran of more than 25 years, Crump has held engineering and executive management positions at various IT industry manufacturers and integrators. Prior to Storage Switzerland, he was CTO at one of the nation's largest integrators.