You can add data deduplication to your backup operations with an appliance, VTL or target array -- but backup application vendors offer software-only alternatives.
Backup redesign and data deduplication have remained hot in the storage market this year, even in the midst of our economic woes. But just offering data dedupe as an add-in feature isn't enough anymore; vendors are also coming to market with ever-finer differentiations as the technology becomes ubiquitous in backup operations.
In 2009, six companies -- Acronis Inc., Barracuda Networks Inc., CA, CommVault Systems Inc., IBM Corp. and Symantec Corp. -- have either added or expanded data deduplication in their backup software. CommVault threw down the gauntlet first with the announcement of data deduplication for Version 8 of its Simpana backup software in January. Simpana 8 also introduced a unique ability to dedupe data to tape.
CommVault's approach looks to cut down on the number of tapes it will take to restore an individual file by having users specify a predefined retention window. If that window is 30 days, the index with reference data for that 30-day period is stored on each tape so data can be restored from just one tape. At the end of the retention window, a new index is created.
CA's ARCserve Backup Version 12.5 offers users the option of "re-inflating" data before it's stored on tape or copying deduplicated full backups directly to tape. Incremental deduped backups like CommVault's aren't supported by ARCserve -- only a full backup, which must first be restored to the media server, can be copied to tape in its deduped form.
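To make "re-inflating" concrete, here is a minimal, generic sketch of deduplicating a byte stream and rebuilding it afterward. It isn't CA's or CommVault's implementation: the fixed 4-byte chunk size, the SHA-256 fingerprints and the `dedupe` and `rehydrate` names are all assumptions chosen for illustration.

```python
import hashlib

def dedupe(data: bytes, chunk_size: int = 4):
    """Split data into fixed-size chunks and keep one copy of each unique chunk.

    Returns (recipe, store): the recipe is the ordered list of fingerprints
    needed to rebuild the data; the store maps fingerprint -> chunk bytes.
    """
    store = {}
    recipe = []
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)  # store each unique chunk only once
        recipe.append(digest)
    return recipe, store

def rehydrate(recipe, store) -> bytes:
    """Rebuild ("re-inflate") the original data by following the recipe."""
    return b"".join(store[digest] for digest in recipe)

original = b"AAAABBBBAAAA"
recipe, store = dedupe(original)
restored = rehydrate(recipe, store)
```

In this toy case the recipe has three entries but the store holds only two chunks. The rebuild step is the work a restore must do whenever data sits on tape in deduped form.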
"We don't think that during a [tape] recovery of an incremental data set is the time to be rehydrating data," said Don Kleinschnitz, senior vice president of engineering at CA.
|Four routes to data dedupe|
Data deduplication can generally be implemented in a backup system in four different ways:
1. In-line appliance (e.g., Data Domain Inc.'s DDX product line, IBM Corp.'s Diligent ProtecTier)
2. Target backup disk array (e.g., ExaGrid Systems Inc.'s EX series, NEC Corp. of America's Hydrastor)
3. Virtual tape library (e.g., Sepaton Inc.'s S2100 series with DeltaStor, Quantum Corp.'s DXi-Series)
4. Backup application software (see "Backup apps with dedupe")
Zahid Ilkal, CommVault's senior product manager, countered that most users don't use tape for operational restores. "In the extremely rare situation that we need to restore an individual file from tape, a longer recovery time is expected and tolerated by end users and is a minor tradeoff to make," he said. Users can also use a preview function in Simpana to look at files on tape without having to fully restore them. If a user is doing a full disaster recovery (DR) restore of all data, Ilkal said, "Our recovery technique is optimized in this case to restore all data from the front to back in the tape set without jumping across tapes for DR."
Acronis, Barracuda, IBM and Symantec don't currently offer dedupe on writes to tape.
|Backup apps with dedupe|
To date, six backup application vendors offer data deduplication options for their software suites: Acronis, Barracuda Networks, CA, CommVault, IBM and Symantec.
Another emerging trend is a movement toward deduplicating data at the source, meaning the client server that hosts the application. CA ARCserve, CommVault Simpana and Symantec Veritas NetBackup PureDisk currently offer data deduplication at the backup server level, which reduces network traffic between the backup server and the backup target but not between the client and the backup server. PureDisk has been able to deduplicate data at the source since before Symantec picked up the deduping IP with its 2005 acquisition of Data Center Technologies, but Symantec has announced only that it will deduplicate data from the source once PureDisk melds with its NetBackup and Backup Exec apps over the next six months.
Acronis users have the option of deduplicating data from either the source or the backup server level. Barracuda's recent integration of BitLeap data dedupe IP (acquired in November 2008) with Yosemite's backup software app agents adds application-aware dedupe at the source level. IBM remains the outlier in this regard -- its dedupe is offered post-process, at the backup target.
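The traffic difference between the two placements can be sketched in a few lines. This is a simplified model, not any vendor's protocol: the fixed 4-byte chunks, SHA-256 fingerprints, the `ChunkStore` class and both backup functions are hypothetical, and the small cost of exchanging fingerprints is ignored in the byte counts.

```python
import hashlib

CHUNK_SIZE = 4  # unrealistically small, for illustration only

def chunks(data: bytes, size: int = CHUNK_SIZE):
    return [data[i:i + size] for i in range(0, len(data), size)]

def fingerprint(chunk: bytes) -> str:
    return hashlib.sha256(chunk).hexdigest()

class ChunkStore:
    """Hypothetical backup-side store holding one copy of each unique chunk."""
    def __init__(self):
        self.by_hash = {}

    def missing(self, hashes):
        return {h for h in hashes if h not in self.by_hash}

    def put(self, chunk: bytes):
        self.by_hash.setdefault(fingerprint(chunk), chunk)

def backup_server_side(data: bytes, store: ChunkStore) -> int:
    """Client ships every byte; dedupe happens after the data has crossed the wire."""
    for chunk in chunks(data):
        store.put(chunk)
    return len(data)  # payload bytes sent from client to backup server

def backup_source_side(data: bytes, store: ChunkStore) -> int:
    """Client fingerprints locally and ships only chunks the store lacks."""
    parts = chunks(data)
    hashes = [fingerprint(c) for c in parts]
    needed = store.missing(hashes)
    sent = 0
    for chunk, digest in zip(parts, hashes):
        if digest in needed:
            store.put(chunk)
            needed.discard(digest)  # don't resend a repeat within this backup
            sent += len(chunk)
    return sent

data = b"AAAABBBBAAAACCCC"
server_store, source_store = ChunkStore(), ChunkStore()
sent_server_side = backup_server_side(data, server_store)
sent_source_first = backup_source_side(data, source_store)
sent_source_repeat = backup_source_side(data, source_store)
```

Both stores end up holding the same three unique chunks. The only difference is how many payload bytes cross the link between client and backup server, which is the saving that source-level dedupe targets.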
Whether and how much vendors are charging for adding dedupe to existing backup software is another differentiator for some offerings; Acronis, CommVault and Symantec charge for the feature. CA and IBM customers get dedupe free of charge, while Barracuda claims Yosemite's $1,500 unlimited server backup license keeps its offering competitive.
Lauren Whitehouse, an analyst at Milford, Mass.-based Enterprise Strategy Group (ESG), said global deduplication is the next frontier for software and target device dedupe vendors alike. "Everybody's low on the maturity curve here," Whitehouse said. For CommVault, dedupe is global only within the same policy group (though CommVault argues this is sufficient for most customers), while dedupe is global only among system files for CA; application data is deduped separately within each backup server.
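A rough way to see why dedupe scope matters: the sketch below counts how many chunks get stored when the fingerprint index is shared across everything versus kept separately per group. The group names, chunk contents and the `stored_chunks` function are illustrative assumptions, not a model of how CommVault or CA organize their indexes.

```python
import hashlib
from collections import defaultdict

def stored_chunks(backups, scope: str) -> int:
    """Count chunks written to disk.

    backups: list of (group_name, list_of_chunk_bytes)
    scope:   "global" shares one fingerprint index across all groups;
             "group" keeps a separate index for each group.
    """
    seen = defaultdict(set)
    stored = 0
    for group, chunk_list in backups:
        index_key = "all" if scope == "global" else group
        for chunk in chunk_list:
            digest = hashlib.sha256(chunk).hexdigest()
            if digest not in seen[index_key]:
                seen[index_key].add(digest)
                stored += 1
    return stored

# Two hypothetical groups that both contain the same operating system chunk.
backups = [
    ("mail", [b"os-files", b"mailbox"]),
    ("db", [b"os-files", b"table"]),
]
per_group_count = stored_chunks(backups, "group")
global_count = stored_chunks(backups, "global")
```

With per-group scope the shared chunk is stored twice, for four chunks in total. A global index stores it once, for three.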
Arun Taneja, founder and consulting analyst at Hopkinton, Mass.-based Taneja Group, said these differentiators among backup products will ultimately be temporary, as data deduplication moves out of backup toward primary storage and application hosts. In five years, he predicted, "the benefits of deduplication on the primary storage side will flow right through to the back end -- there won't be this microscopic focus on the backup and archiving world."
BIO: Beth Pariseau is a senior news writer at SearchStorage.com.