A NEW CATEGORY of data-classification and management tools is emerging, exemplified by companies like Abrevity, Index Engines, Kazeon Systems and Njini. Often marketed as tools to assist with information control and security, they may also have implications for your storage and backup operations.
Likened to information lifecycle management's (ILM's) "missing brain" by Brad O'Neill, senior analyst and consultant at the Taneja Group, Hopkinton, MA, information classification management (ICM) tools help storage administrators determine the content of their data so they can effectively decide whether it can be moved to a lower tier of storage. So far, most of the customer traction around ICM has been for tiering and ILM purposes, says O'Neill.
After doing a data classification with Network Appliance's (NetApp's) Information Server 1200, which the company OEMs from Kazeon, most customers find that between 25% and 50% of the data doesn't need to be on their high-end tiers, says Manish Goel, NetApp's VP and general manager, data protection and retention solutions.
So far, NetApp has stopped short of automating the data movement between tiers, citing customer reluctance. "There's not a whole lot of interest in moving data dynamically between expensive and cheap storage," says Goel. "It brings back scary memories of HSM [hierarchical storage management]."
More often than not, data classification not only finds files to move onto a lower tier of disk, but also unearths files that can be deleted outright. That's what happened recently at Virgin Mobile UK. Using Njini's NjiniIAM to perform a data classification, Virgin was surprised to find that 22% of its files stored on two NetApp 940 filers had duplicates. "We were conscious that there may have been some file duplication ... but on the whole, we were expecting a duplication of less than 10%," says Keith Bennett, lead infrastructure architect at Virgin.
Purchasing the Njini software was a simple decision. The way Bennett figures it, the cost of the software is easily offset by reusing the capacity freed up from deleting duplicate files.
By definition, de-duplicating a set of identical files reclaims at least 50% of the space those copies occupy, and possibly more, points out Eric Madison, director of marketing at Abrevity. "It depends on how many duplicates a file has."
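The arithmetic behind Madison's point can be sketched in a few lines of Python. This is only an illustration, not how Abrevity's or Njini's products work: it assumes duplicates are detected by comparing SHA-256 digests of file contents, and the function names and sample file names are hypothetical. If a file exists in n identical copies and one is kept, the fraction of space reclaimed for that group is (n - 1) / n, which is 50% for a single duplicate and rises as the number of copies grows.

```python
import hashlib
from collections import defaultdict


def find_duplicates(files):
    """Group file names by the SHA-256 digest of their content.

    `files` is a dict mapping a (hypothetical) file name to its bytes.
    Only groups containing more than one file are returned.
    """
    groups = defaultdict(list)
    for name, content in files.items():
        digest = hashlib.sha256(content).hexdigest()
        groups[digest].append(name)
    return {d: sorted(names) for d, names in groups.items() if len(names) > 1}


def reclaimable_fraction(copies):
    """Fraction of space freed by keeping one of `copies` identical files."""
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return (copies - 1) / copies


if __name__ == "__main__":
    # Made-up sample data: three identical files and one unique file.
    sample = {
        "report.doc": b"quarterly numbers",
        "report_copy.doc": b"quarterly numbers",
        "report_final.doc": b"quarterly numbers",
        "notes.txt": b"meeting notes",
    }
    for names in find_duplicates(sample).values():
        print(names, f"{reclaimable_fraction(len(names)):.0%} reclaimable")
```

Note that the percentage applies only to the space taken up by each group of duplicated files; the saving across an entire filer depends on how much of its data is duplicated in the first place.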
Finding redundant data "is one of the pieces that falls out of doing a data classification," says O'Neill. "It's an interesting selling point for ICM," he adds, although in his opinion, "not the primary driver."
One novel use for data-classification software comes by way of NetApp, which last month announced SnapSearch and Recovery. Using the Kazeon software, SnapSearch indexes and classifies the contents of Snap-based backup archives. That way, end users can easily search the contents of their backups, and retrieve lost or archived data themselves.
The thinking behind SnapSearch and Recovery is simple enough. "What's the point of doing a backup if you can't do a restore?" asks Goel. (See the related feature "Finding data".)