Data deduplication has the potential to radically change the economics of storage. By storing only unique backup data and maintaining metadata about the duplicates, the volume of data actually stored can be reduced by orders of magnitude, yielding huge cost savings. Early deployments are pushing the boundaries of these products, however, and not all of them are living up to the hype.
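The mechanism behind that claim can be sketched in a few lines. The following is a toy, hypothetical Python illustration of content-addressed deduplication (not any vendor's implementation): data is split into chunks, each chunk is identified by a hash, only previously unseen chunks are stored, and per-file metadata records the chunk references needed to reassemble the original.

```python
import hashlib

class DedupStore:
    """Toy content-addressed store: keeps one copy of each unique
    chunk, plus per-file metadata referencing chunks by hash."""

    def __init__(self):
        self.chunks = {}  # sha256 hex digest -> chunk bytes
        self.files = {}   # filename -> ordered list of chunk digests

    def put(self, name, data, chunk_size=4):
        digests = []
        for i in range(0, len(data), chunk_size):
            chunk = data[i:i + chunk_size]
            h = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(h, chunk)  # store only if new
            digests.append(h)
        self.files[name] = digests

    def get(self, name):
        # Reassemble the file from its chunk references.
        return b"".join(self.chunks[h] for h in self.files[name])

    def stored_bytes(self):
        return sum(len(c) for c in self.chunks.values())

store = DedupStore()
store.put("mon.bak", b"AAAABBBBCCCC")
store.put("tue.bak", b"AAAABBBBDDDD")  # shares two chunks with Monday
assert store.get("tue.bak") == b"AAAABBBBDDDD"
print(store.stored_bytes())  # 16 bytes held for 24 logical bytes
```

Real products chunk at block granularity, handle hash collisions, and persist the index to disk, but the economics come from the same `setdefault`-style step: a duplicate chunk costs only a metadata entry, not a second copy.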
By using EMC's Avamar deduplication software, chip giant Qualcomm has cut its backup window from two hours to 15 minutes. That's impressive, but with approximately 50 servers and 6TB to 7TB of stored data, the company's storage environment is fairly small, and Qualcomm isn't convinced the technology will scale the way it would like.
"It works fine for 50 to 100 servers, but for 1,000 servers it needs to be more scalable," says Paul Ferraro, storage manager at Qualcomm. The company backs up 5,000 servers, for which Ferraro says Avamar's point-and-click GUI management interface isn't appropriate. "The management is for a midsize shop in the way it's administered," he says. "We need a command line to be able to build it into our whole environment." Furthermore, Ferraro says he was told that the technology isn't designed to keep 10 years' worth of data. "That's a problem, as we'll have to find new ways to archive off to tape," he says.
Another user concerned about dedupe's scalability is Todd Rolfson, storage services engineer at the University of Minnesota, who just ordered three Data Domain DD560s to replace three DD460s. "We knew we would bury them quickly in terms of the capacity limits," says Rolfson. The DD560 boxes are 15TB apiece vs. 5TB for the DD460. Data Domain claims 120MB/sec throughput, which Rolfson says he'll be watching closely. "I'm concerned about pinch points for performance," he says. "But Data Domain is looking at multiheaded systems that would have more NICs [network interface cards] and processors to scale performance separately to capacity."
Friends Provident, a large financial services company in the U.K., is also pushing the scalability limits of deduplication technology. Martin Bruce, the company's lead storage consultant, has deployed Diligent Technologies' ProtecTier software. Unlike most other deduplication products, ProtecTier performs its deduplication factoring inline, as the backup runs. But because Friends Provident performs daily incremental backups of approximately 8TB to 10TB rather than full backups, it isn't taxing the dedupe engine too heavily. "We're not seeing the backup streams running any slower ... ProtecTier is keeping up with all the clients," says Bruce.
Bruce says he's seeing approximately a 9:1 deduplication ratio and speeds of approximately 200MB/sec, vs. the 25:1 deduplication and 220MB/sec throughput Diligent claims in its marketing material. Still, Friends Provident is storing approximately 45TB of data in 5.8TB of actual disk space and is more than happy with the results. Bruce expects Diligent to add clustering to the product, and he plans to deploy that capability as soon as it's released. "That will double the throughput to ensure the scalability," he says.
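For readers weighing such claims, the arithmetic is simple: the deduplication ratio is logical data divided by physical disk consumed, and the space savings follow directly. (Quoted ratios can vary depending on whether the logical side counts cumulative backup streams or current data, which is one reason vendor and user figures differ.) A quick illustrative calculation, using the Friends Provident figures from above:

```python
def dedupe_stats(logical_tb, physical_tb):
    """Return (deduplication ratio, fractional space savings)."""
    ratio = logical_tb / physical_tb
    savings = 1 - physical_tb / logical_tb
    return ratio, savings

# 45TB of data held in 5.8TB of actual disk space
ratio, savings = dedupe_stats(45.0, 5.8)
print(f"{ratio:.1f}:1 ratio, {savings:.0%} space saved")
```

Even at ratios well below the 25:1 in the marketing material, the physical footprint shrinks by the large majority, which is why users like Bruce report being happy despite the gap.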
The bottom line is that duplicate files, along with the space and bandwidth they consume, are a huge headache for storage administrators. But users say that with proper engineering, a little shuffling of resources and some capital expenditure, deduplication promises big savings on time, space and money. "The alternative is to keep fighting with the old model," says Bernie Robichau, network administrator and security officer at the South Carolina Department of Parks, Recreation and Tourism, a Data Domain customer.