Arrays based on ATA drives are pretty cheap, but storing backups on them can eat up capacity at a furious rate...
Don't be surprised when you start seeing disk-based backup products that advertise capacities such as 23.4TB. That's the figure being promoted by Palo Alto, CA, startup Data Domain for its DD200 disk-based "recovery storage appliance."
Read the fine print, though, and you'll find that those 23.4TB are virtual. In actuality, the DD200 holds 16 disks in a standard 4U-high package, provides 1.2TB of usable physical capacity, and lists for $58,000. According to Brian Biles, Data Domain co-founder and vice president of marketing, a single DD200 should be enough to hold up to five months of backups for a 1TB to 1.5TB backup data set.
How does Data Domain turn 1.2TB into 23.4TB? It combines standard tape compression with a data reduction technique it calls Global Compression Redundancy Pooling. Together, these two types of compression can reduce capacity requirements by a factor of 20.
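The advertised numbers imply a reduction ratio of 23.4TB / 1.2TB = 19.5:1, which rounds to the 20-fold figure. The article doesn't describe how Global Compression Redundancy Pooling works internally, so the Python sketch below is only a generic illustration of a two-stage approach, under assumptions of our own: data is cut into fixed 4KB chunks, each chunk is identified by its SHA-256 digest, a chunk already in the store is not stored again, and each new chunk is then compressed with zlib as a stand-in for tape-style compression. The function name `reduce_backup` and the chunk size are hypothetical.

```python
import hashlib
import zlib

CHUNK_SIZE = 4096  # assumed fixed chunk size; not a Data Domain parameter


def reduce_backup(data: bytes, store: dict) -> tuple[int, int]:
    """Store `data` using dedup followed by compression (hypothetical sketch).

    Returns (logical_bytes, physical_bytes_added) for this backup.
    """
    added = 0
    for i in range(0, len(data), CHUNK_SIZE):
        chunk = data[i:i + CHUNK_SIZE]
        key = hashlib.sha256(chunk).hexdigest()
        if key not in store:               # stage 1: skip chunks already stored
            packed = zlib.compress(chunk)  # stage 2: compress only new chunks
            store[key] = packed
            added += len(packed)
    return len(data), added


if __name__ == "__main__":
    store = {}
    backup = b"quarterly report " * 10_000  # toy, highly redundant data set
    total_logical = total_physical = 0
    for _ in range(5):                       # five identical full backups
        logical, physical = reduce_backup(backup, store)
        total_logical += logical
        total_physical += physical
    print(f"effective reduction: {total_logical / total_physical:.0f}:1")
```

Because repeated backups mostly consist of chunks that are already stored, the compression stage only ever sees new data, so the two reductions roughly multiply rather than add.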
Another startup, Irvine, CA-based Avamar, uses a technique similar to Data Domain's Global Compression in its Axion disk-based backup solution. Called commonality filtering, it relies on a client-side agent that identifies sequences of data that are likely to be repeated. Sequences, rather than entire files, are sent over the network and stored in Axion's content-addressed storage (CAS) object store; subsequent instances of a sequence are stored as pointers.
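Commonality filtering can be sketched roughly as follows. The article doesn't say how Avamar's agent decides where one sequence ends and the next begins, so this Python example assumes content-defined chunking with a gear-style rolling hash, a common textbook approach in which boundaries depend on the bytes themselves, so an insertion shifts only nearby boundaries. `CASStore`, `split_sequences`, `backup_file` and `restore_file` are hypothetical names, and the dictionary inside `CASStore` stands in for Axion's CAS object store. The client hashes each sequence, asks the store which hashes it lacks, sends only those, and keeps the file as a list of hashes, which serve as the pointers.

```python
import hashlib
import random

# Gear table: 256 pseudo-random 64-bit values (fixed seed for repeatability).
_rng = random.Random(42)
GEAR = [_rng.getrandbits(64) for _ in range(256)]
MASK = (1 << 11) - 1          # boundary when low 11 bits are zero (~2KB average)
MIN_LEN, MAX_LEN = 256, 8192  # assumed bounds on sequence length


def split_sequences(data: bytes) -> list[bytes]:
    """Cut data at content-defined boundaries using a gear rolling hash."""
    seqs, start, h = [], 0, 0
    for i, byte in enumerate(data):
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFFFFFFFFFF
        length = i - start + 1
        if (length >= MIN_LEN and (h & MASK) == 0) or length >= MAX_LEN:
            seqs.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        seqs.append(data[start:])
    return seqs


class CASStore:
    """Hypothetical server-side store: objects addressed by SHA-256 digest."""

    def __init__(self) -> None:
        self.objects: dict[str, bytes] = {}

    def missing(self, keys: list[str]) -> set[str]:
        return {k for k in keys if k not in self.objects}

    def put(self, key: str, seq: bytes) -> None:
        self.objects[key] = seq

    def get(self, key: str) -> bytes:
        return self.objects[key]


def backup_file(data: bytes, server: CASStore) -> tuple[list[str], int]:
    """Client agent: send only unseen sequences; return (recipe, bytes_sent)."""
    seqs = split_sequences(data)
    keys = [hashlib.sha256(s).hexdigest() for s in seqs]
    needed = server.missing(keys)
    sent = 0
    for key, seq in zip(keys, seqs):
        if key in needed:
            server.put(key, seq)
            needed.discard(key)  # a sequence repeated within the file goes once
            sent += len(seq)
    return keys, sent            # the recipe is just a list of pointers


def restore_file(recipe: list[str], server: CASStore) -> bytes:
    return b"".join(server.get(k) for k in recipe)
```

In this sketch, backing up the same file twice sends nothing the second time, and inserting a few bytes in the middle of a file causes only the sequences around the edit to be resent.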
Over time, therefore, the Axion system learns more and more sequences and becomes more and more efficient, says CEO Kevin Daly, achieving data reduction ratios as high as 400:1 in some extreme instances. Axion does well with typical office documents but achieves smaller gains with scientific and image data, Daly says.
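Daly's point that the system gets more efficient over time follows from how a shared sequence store behaves: each new backup adds only the sequences not seen before, while the logical amount of data protected keeps growing. The Python sketch below illustrates this under simplifying assumptions that are ours, not Avamar's: fixed 4KB sequences, a 1MB "office-like" data set in which one 4KB region changes each night, and an "image-like" data set modeled as entirely fresh random bytes every night, standing in for data with little repetition. The generator names are hypothetical, and the sketch makes no attempt to reproduce the 400:1 figure.

```python
import hashlib
import os

CHUNK = 4096  # assumed fixed sequence size, for simplicity


def nightly_ratios(backups):
    """Yield the cumulative logical-to-stored ratio after each backup."""
    seen = set()
    logical = stored = 0
    for data in backups:
        for i in range(0, len(data), CHUNK):
            chunk = data[i:i + CHUNK]
            key = hashlib.sha256(chunk).digest()
            if key not in seen:      # only never-seen sequences consume space
                seen.add(key)
                stored += len(chunk)
            logical += len(chunk)
        yield logical / stored


def office_like(nights, size=1 << 20):
    """Hypothetical workload: one 4KB region of a 1MB data set edited nightly."""
    doc = bytearray(os.urandom(size))
    for night in range(nights):
        off = (night * CHUNK) % size
        doc[off:off + CHUNK] = os.urandom(CHUNK)
        yield bytes(doc)


def image_like(nights, size=1 << 20):
    """Hypothetical workload: entirely new, non-repeating content nightly."""
    for _ in range(nights):
        yield os.urandom(size)


if __name__ == "__main__":
    for label, gen in (("office-like", office_like), ("image-like", image_like)):
        ratios = list(nightly_ratios(gen(30)))
        print(f"{label}: night 1 {ratios[0]:.1f}:1, night 30 {ratios[-1]:.1f}:1")
```

For the office-like case the ratio after N nights works out to 256N / (255 + N), so it climbs steadily as the store accumulates sequences, while the image-like ratio stays at 1:1 because no sequence is ever seen twice.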
Some storage managers may be wary of these backup techniques, preferring the relative simplicity of tape. But Data Domain's Biles says daily consistency checks make those fears moot. "You're much more likely to be able to retrieve your data from a system like ours than you are from a tape."
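Biles doesn't describe what Data Domain's daily consistency checks involve. One simple way such a check can work in a content-addressed design, offered here only as an assumption, is to re-read every stored object and confirm that its bytes still hash to the address it is filed under, then confirm that every pointer in a backup's recipe resolves to an intact object. `scrub` and `verify_recipe` are hypothetical names in this Python sketch.

```python
import hashlib


def scrub(store: dict[str, bytes]) -> list[str]:
    """Return keys whose stored bytes no longer match their SHA-256 address."""
    return [key for key, blob in store.items()
            if hashlib.sha256(blob).hexdigest() != key]


def verify_recipe(recipe: list[str], store: dict[str, bytes]) -> list[str]:
    """Return pointers in a backup recipe that are missing or corrupted."""
    corrupted = set(scrub(store))
    return [key for key in recipe if key not in store or key in corrupted]


if __name__ == "__main__":
    good = b"payroll spreadsheet contents"
    store = {hashlib.sha256(good).hexdigest(): good}
    stale_key = hashlib.sha256(b"original bytes").hexdigest()
    store[stale_key] = b"bit-rotted bytes"  # simulate silent corruption
    print("corrupted objects:", scrub(store))
```

A check like this reads only data already on disk, so it can be rerun against the whole store as often as the schedule allows.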