Data deduplication and classification
Some CAS products use data deduplication, which breaks files apart, analyzes them at the block level and stores identical blocks only once to minimize the amount of data stored. HP's StorageWorks RISS and Permabit's Permeon Compliance Store include this capability in their software, but users must turn it on.
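The following is a minimal sketch of how block-level deduplication works in principle. It assumes fixed-size blocks and SHA-256 fingerprints purely for illustration; the block size, hashing scheme and store layout are not how RISS or Permeon actually implement the feature.

```python
import hashlib

BLOCK_SIZE = 4096  # illustrative fixed block size; real products vary


def dedupe_store(data: bytes, block_store: dict) -> list:
    """Split data into blocks, store each unique block once, return the file's recipe."""
    recipe = []
    for offset in range(0, len(data), BLOCK_SIZE):
        block = data[offset:offset + BLOCK_SIZE]
        digest = hashlib.sha256(block).hexdigest()
        if digest not in block_store:   # identical blocks are stored only once
            block_store[digest] = block
        recipe.append(digest)           # the file is kept as an ordered list of block hashes
    return recipe


def rebuild(recipe: list, block_store: dict) -> bytes:
    """Reassemble the original data from its block recipe."""
    return b"".join(block_store[digest] for digest in recipe)
```

Storing only the hash list for each additional copy of a block is what produces the multi-fold capacity savings vendors cite, at the cost of the hashing and lookup work done on every write.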
NetApp introduced ASIS last March and EMC has announced a partnership with Avamar Technologies Inc. to provide similar functionality for Centera. HP says users will experience a three- to five-fold reduction in total storage using deduplication, but the technology will introduce some performance overhead. NetApp estimates that its filers will experience a 1% to 3% performance hit when ASIS is turned on.
CAS products classify data in several ways, mostly using metadata databases. As files are stored in RAIN architectures, metadata is extracted based on policies provided by the vendor and user. NetApp's filers index files after they're stored, although users can use any data classification engine to index, classify and tag data. NetApp's IS1200 appliance uses Kazeon Systems Inc.'s algorithms to deliver this functionality.
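Below is a minimal sketch of policy-driven metadata extraction at ingest time. The policy rules, field names and retention periods are hypothetical; real classification engines such as Kazeon's inspect file content as well as filesystem attributes.

```python
import os
from datetime import datetime

# Hypothetical policy: map a data class to simple file-extension rules.
POLICY = {
    "financial": {"extensions": {".xls", ".csv"}, "retention_years": 7},
    "email":     {"extensions": {".msg", ".eml"}, "retention_years": 3},
}


def extract_metadata(path: str) -> dict:
    """Pull basic metadata when a file is ingested and tag it per policy."""
    stat = os.stat(path)
    ext = os.path.splitext(path)[1].lower()
    record = {
        "path": path,
        "size": stat.st_size,
        "modified": datetime.fromtimestamp(stat.st_mtime).isoformat(),
        "class": "unclassified",
        "retention_years": 0,
    }
    for cls, rule in POLICY.items():
        if ext in rule["extensions"]:
            record["class"] = cls
            record["retention_years"] = rule["retention_years"]
            break
    return record
```

Records like these populate the metadata database that later searches, retention checks and tiering decisions run against.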
IBM's DR550 classifies data based on policies set previously with its TSM software. TSM then places the data on the correct tier of storage, moves the data to other tiers of storage when appropriate, and deletes the file at the end of its retention period. For this scenario to work, those policies must be defined in TSM before the data is stored.
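Here is a minimal sketch of the kind of policy-driven lifecycle decision described above. The tier names, age thresholds and function are illustrative assumptions, not TSM's actual policy engine or API.

```python
from datetime import date, timedelta

# Hypothetical lifecycle rules: each entry is (maximum age, target tier).
TIER_RULES = [
    (timedelta(days=90),   "primary"),   # newest data stays on fast disk
    (timedelta(days=365),  "nearline"),  # older data moves to cheaper disk
    (timedelta(days=2555), "tape"),      # roughly seven years of retention
]


def lifecycle_action(stored_on: date, today: date) -> str:
    """Return the tier a file belongs on, or 'delete' once retention has expired."""
    age = today - stored_on
    for limit, tier in TIER_RULES:
        if age <= limit:
            return tier
    return "delete"
```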
A problem with all data classification approaches is the need to re-index data if requirements change. Depending on the size of the data store, re-indexing can be a performance-intensive exercise.
This was first published in June 2006