This article can also be found in the Premium Editorial Download "Storage magazine: Choosing the best disaster recovery planning tool."
New data protection schemes
Storage vendors are finally beginning to understand that it's not about protecting disks but protecting information, and their data protection schemes are evolving to reflect this. There are some novel approaches in the market to the problems produced by large, slow drives. Some technologies reduce the overall number of rebuilds a system performs. Some have shifted to information-based protection schemes in which, rather than mirroring a disk, they mirror information (files, chunks or objects). Some even do a little of each. So how does this impact rebuild times? When you think in terms of rebuilding information rather than a single disk, you can put the power of the system architecture to work, leveraging the massive parallelism of multidisk architectures.
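A back-of-the-envelope model shows why rebuilding information in parallel beats rebuilding a single disk. This is a minimal sketch, not any vendor's math; the function names, throughput figure and drive sizes are illustrative assumptions:

```python
# Hypothetical model: a conventional rebuild rewrites the whole failed
# disk onto one spare, bottlenecked by that spare's write speed; an
# information-based rebuild rewrites only the live data, fanned out in
# parallel across the surviving drives. All numbers are illustrative.

def conventional_rebuild_hours(disk_tb, write_mb_s=100):
    """Whole disk to a single spare, limited by one drive's write rate."""
    return disk_tb * 1_000_000 / write_mb_s / 3600

def distributed_rebuild_hours(used_tb, surviving_drives, write_mb_s=100):
    """Only the stored information, written to many drives at once."""
    return used_tb * 1_000_000 / (write_mb_s * surviving_drives) / 3600

# A half-full 2 TB drive in a 60-drive system:
print(round(conventional_rebuild_hours(2.0), 1))     # hours for the whole disk
print(round(distributed_rebuild_hours(1.0, 59), 2))  # hours for the data only
```

Under these assumed numbers the distributed rebuild finishes in minutes rather than hours, which matches the article's point: it's the information being rebuilt, not the drive.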
There are several technologies in the market today that reduce the overall number of drive failures, and thus the number of rebuilds required. In some instances, vendors take unresponsive drives offline to diagnose problems and return them to service if no trouble is found. This approach can eliminate the need for a full rebuild: while the drive is offline, the system journals all writes that would have gone to it while attempting to recover the drive. After a successful recovery, only the data captured in the journal needs to be rebuilt, not the entire disk.
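The journaling technique can be sketched in a few lines. This is a toy illustration of the idea described above; the class and method names are hypothetical, not any vendor's API:

```python
# Sketch of journaled writes during drive diagnosis: while the drive is
# offline, writes destined for it are captured in a journal; if the drive
# is diagnosed healthy, only the journaled blocks are replayed instead of
# rebuilding the entire disk. All names here are made up for illustration.

class DriveJournal:
    def __init__(self):
        self.online = True
        self.blocks = {}    # block number -> data actually on the drive
        self.journal = {}   # writes captured while the drive was offline

    def take_offline(self):
        self.online = False

    def write(self, block, data):
        if self.online:
            self.blocks[block] = data
        else:
            self.journal[block] = data  # divert rather than fail the I/O

    def recover(self):
        """Drive returns to service: replay only the journal."""
        self.online = True
        replayed = len(self.journal)
        self.blocks.update(self.journal)
        self.journal.clear()
        return replayed

d = DriveJournal()
d.write(0, "a")
d.take_offline()
d.write(1, "b")
d.write(2, "c")
print(d.recover())  # only the two journaled blocks are rebuilt
```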
Some vendors have a two-pronged approach
The second prong of this approach speeds rebuilds by putting the grid architecture to work. Most grid-based architectures have capacity or storage nodes and separate processor nodes; typically, all processor nodes can access all capacity nodes. When data is written, it's broken into a number of fragments, which are then distributed across as many storage nodes as the system contains. Using a default of nine data fragments and three parity fragments (the exact number of parity fragments is user configurable), each of 12 storage nodes would get one fragment. If there are four storage nodes (the minimum configuration), each node gets three fragments. In the event of a drive failure, the data from that drive is rebuilt, just as in conventional hardware RAID. But unlike conventional RAID, data isn't rebuilt to a single drive; it's redistributed across the storage nodes, leveraging any available storage capacity. If an entire storage node fails, the data from its drives is rebuilt across the remaining storage nodes. We've seen this type of technology implemented for both parity-protected and mirrored data. Because it's data that's being protected rather than disk drives, and because the full power of the grid is brought to bear, rebuilds happen in a fraction of the time a conventional drive rebuild would take. It's the information that's being rebuilt, not the exact drive layout.
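The fragment distribution above can be sketched as a simple round-robin deal. This is an illustrative model of the layout described, not a real placement algorithm; the function name and round-robin policy are assumptions:

```python
# Sketch of the fragment layout described above: each written object is
# split into 9 data + 3 parity fragments (parity count configurable) and
# dealt round-robin across the storage nodes. Names are illustrative.

def distribute_fragments(num_nodes, data_frags=9, parity_frags=3):
    """Return {node: fragment count} for one written object."""
    total = data_frags + parity_frags
    counts = {n: 0 for n in range(num_nodes)}
    for i in range(total):
        counts[i % num_nodes] += 1
    return counts

print(distribute_fragments(12))  # 12 nodes: one fragment each
print(distribute_fragments(4))   # 4-node minimum: three fragments each
```

With 12 fragments and 12 nodes every node holds exactly one, so losing any single node costs one fragment per object; with four nodes each holds three, matching the article's minimum configuration.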
This was first published in January 2009