This article can also be found in the Premium Editorial Download "Storage magazine: Server virtualization strategies for storage managers."
There’s been a lot of talk lately about how data deduplication is moving from backup to primary storage. Dedupe’s great for trimming primary data stores, but there are other technologies that can do the job.
Standard in many backup and archival products, data reduction is now becoming more prevalent in primary storage. The main drivers are measurable cost savings, ranging from buying fewer disks and paying lower annual support fees to lowering the operational expenses of storage management. Data reduction may also have a pleasant impact on storage performance: when inactive data no longer occupies valuable high-performance storage, overall storage and application performance may get a welcome boost.
In a typical enterprise, according to Storage Networking Industry Association (SNIA) research, 80% of files stored on primary storage haven’t been accessed in the last 30 days; the same report asserts that inactive data grows at more than four times the rate of active data. With these facts in mind, it’s no surprise that data reduction techniques have been making their way into primary storage.
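To see why the inactive share of primary storage keeps climbing, here is a small Python projection. It is a sketch under stated assumptions, not SNIA's method. It treats the 80% file figure as an 80% share of capacity, reads "four times the rate" as a four-times-higher annual percentage growth rate, and uses a hypothetical 10% annual growth rate for active data.

```python
def inactive_share(years: int, inactive_start: float = 80.0,
                   active_start: float = 20.0, active_growth: float = 0.10,
                   growth_ratio: float = 4.0) -> float:
    """Fraction of primary capacity holding inactive data after `years` years.

    Assumptions (not from the SNIA report): capacity starts 80/20
    inactive/active, active data grows 10% per year (hypothetical), and
    inactive data grows at `growth_ratio` times that percentage rate.
    """
    inactive = inactive_start * (1 + growth_ratio * active_growth) ** years
    active = active_start * (1 + active_growth) ** years
    return inactive / (inactive + active)


if __name__ == "__main__":
    for year in range(4):
        print(f"year {year}: {inactive_share(year):.1%} of capacity is inactive")
```

Different starting shares or growth rates change the numbers, but as long as inactive data grows faster than active data, its share of primary capacity rises every year.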
But unlike backup and archive systems, primary storage systems can’t tolerate even a small impact on performance and reliability, the two most relevant attributes of primary storage. As a result, data reduction techniques for primary storage differ from those used for backup and archiving, and so does their relevance. Data reduction options for primary storage include:
- Choosing the right RAID level
- Thin provisioning
- Efficient clones
- Automated storage tiering
Choosing the right RAID level
Putting “choosing the appropriate RAID level” at the top of a list of data reduction techniques may seem strange at first, but unlike the other approaches, it’s available on every storage system, and it greatly affects disk requirements, performance and reliability. If it weren’t for its reliability shortcoming, RAID 0 (block-level striping across all disks without parity or mirroring) would be the most cost-efficient and best-performing option, but because losing a single drive takes down the whole RAID group, it’s a no-go in the data center. RAID 1 (mirroring without parity or striping) and RAID 10 (mirrored drives in a striped set), on the other hand, combine good performance and high reliability, but they require twice the disk capacity and are therefore the antithesis of data reduction.
RAID 5 (block-level striping with distributed parity), which requires only one additional drive, has been the best compromise in recent years. But as disks grew larger and rebuild times longer, the risk of a second drive failing while the RAID group is rebuilt after a first failure has risen to an uncomfortable, if not unacceptable, level. As a result, storage vendors have been implementing RAID 6, which extends RAID 5 with a second parity block (and the capacity of one more drive), enabling it to withstand two concurrent drive failures without data loss. It does, however, carry a performance penalty that varies by implementation, so RAID 6 and a RAID 6 performance benchmark should be on anyone’s evaluation list when shopping for a new storage system.
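The capacity trade-offs between RAID levels come down to simple arithmetic. The Python sketch below computes usable capacity and guaranteed drive-failure tolerance for each level, plus a rough estimate of the chance that a second drive fails during a RAID 5 rebuild. It is illustrative only: the 8 x 2 TB configuration and the 1,000,000-hour MTBF are hypothetical values; the model ignores hot spares, metadata and vendor-specific layouts such as NetApp's RAID-DP; and the rebuild-risk estimate assumes independent, exponentially distributed drive failures, ignoring unrecoverable read errors and correlated failures.

```python
import math


def raid_layout(level: str, drives: int) -> tuple[int, int]:
    """Return (data_drives, guaranteed_failures_tolerated) for identical drives.

    Simplified model: no hot spares, metadata or vendor-specific layouts.
    """
    if level == "RAID 0" and drives >= 2:
        return drives, 0            # striping only: one failed drive loses the group
    if level == "RAID 1" and drives == 2:
        return 1, 1                 # two-way mirror: half the raw capacity
    if level == "RAID 10" and drives >= 4 and drives % 2 == 0:
        return drives // 2, 1       # worst case: both drives of one mirror pair fail
    if level == "RAID 5" and drives >= 3:
        return drives - 1, 1        # one drive's worth of distributed parity
    if level == "RAID 6" and drives >= 4:
        return drives - 2, 2        # two drives' worth of parity
    raise ValueError(f"unsupported configuration: {level} with {drives} drives")


def usable_tb(level: str, drives: int, drive_tb: float) -> float:
    """Usable capacity in TB, ignoring formatting and metadata overhead."""
    data_drives, _ = raid_layout(level, drives)
    return data_drives * drive_tb


def second_failure_risk(surviving_drives: int, rebuild_hours: float,
                        mtbf_hours: float) -> float:
    """Probability that at least one surviving drive fails during a rebuild.

    Assumes independent, exponentially distributed failures (constant rate
    1/mtbf_hours per drive); ignores unrecoverable read errors.
    """
    expected_failures = surviving_drives * rebuild_hours / mtbf_hours
    return -math.expm1(-expected_failures)   # 1 - exp(-x), accurate for small x


if __name__ == "__main__":
    DRIVE_TB = 2.0              # hypothetical drive size
    for level, n in [("RAID 0", 8), ("RAID 1", 2), ("RAID 10", 8),
                     ("RAID 5", 8), ("RAID 6", 8)]:
        usable = usable_tb(level, n, DRIVE_TB)
        _, tolerated = raid_layout(level, n)
        print(f"{level:7} {n} drives: {usable:4.1f} TB usable "
              f"({usable / (n * DRIVE_TB):.1%}), survives {tolerated} failure(s)")

    # Longer rebuilds (bigger drives) widen the window for a second failure.
    MTBF_HOURS = 1_000_000      # hypothetical per-drive MTBF
    for rebuild_hours in (6, 24, 72):
        risk = second_failure_risk(7, rebuild_hours, MTBF_HOURS)
        print(f"8-drive RAID 5, {rebuild_hours:2} h rebuild: {risk:.4%} risk")
```

Under this simple model, the risk grows roughly linearly with rebuild time, which is the effect larger drives have on RAID 5. With RAID 6, the same second failure is survivable.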
“Unlike most of our competitors, we can do RAID-DP [NetApp’s implementation of RAID 6] with only 5% overhead,” claimed Larry Freeman, senior storage technologist at NetApp.
This was first published in June 2011