Until recently, schemes to improve storage efficiency were not applied to primary storage. Primary storage was considered sacrosanct. No one wanted to mess with something so important, and besides, there was the much larger target of swollen backup files in secondary storage.
However, times have changed. Better and more efficient management of secondary data (often now deduplicated) has put primary storage in the crosshairs for several vendors and their customers. While the methods vary considerably, the goals are still much the same -- reduce the size of primary storage as much as possible through compression and deduplication techniques. This will not only reduce the costs of primary storage, but it will also help to proportionally reduce downstream needs for backup space.
How to best accomplish primary storage optimization
According to Larry Freeman, senior marketing manager for storage efficiency at NetApp, before embarking on new technology to enhance primary storage efficiency, you should make sure you are applying thin provisioning as much as possible. This technology has already demonstrated its ability to dramatically reduce overall primary storage requirements.
What's more, it does not require any special manipulation of the primary data itself. That said, Freeman also sees a place for data deduplication in primary storage, particularly in virtualized environments where each virtual machine can carry its own redundant copies of the same files. He adds a caution, however: "Even though the resource loads may be light, in high-performance applications deduplication is an extra burden that can hurt performance."
Data that is refreshed frequently will not benefit significantly from data deduplication strategies. "We do see opportunities in unstructured data that is typically stored inefficiently," Freeman says. "And we expect to see more deduplication in SharePoint and Exchange Server applications and even lightly used databases."
In keeping with the style of deduplication favored by NetApp, Freeman stresses that data deduplication can be accomplished best, with the least impact on performance, when it is done in the background rather than inline. "You can take advantage of periods of low activity like nights and weekends," he says.
Freeman also recommends proceeding slowly. "Some organizations try to apply this right away to every volume and then they wonder why the system is slow," he says. "We recommend starting slowly, then watch, look and learn." It also makes sense, he notes, to start with volumes you believe have lots of duplicate data and where performance isn't as much of a concern.
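One way to act on that advice is to measure how much duplicate data a volume holds before turning deduplication on. The following is a minimal sketch, not any vendor's tool: the function name `estimate_duplicate_fraction`, the 4 KB block size and the use of SHA-256 fingerprints are all assumptions chosen for illustration, and real products may chunk and fingerprint data quite differently.

```python
import hashlib
import os


def estimate_duplicate_fraction(root, block_size=4096):
    """Return the fraction of fixed-size blocks under `root` that repeat an earlier block.

    Hypothetical helper: block size and hash choice are illustrative assumptions.
    """
    seen = set()      # fingerprints of blocks encountered so far
    total = 0         # number of blocks read
    duplicates = 0    # blocks whose fingerprint was already seen
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as handle:
                    while True:
                        block = handle.read(block_size)
                        if not block:
                            break
                        total += 1
                        digest = hashlib.sha256(block).digest()
                        if digest in seen:
                            duplicates += 1
                        else:
                            seen.add(digest)
            except OSError:
                continue  # skip files that cannot be read
    return duplicates / total if total else 0.0
```

Pointing the function at the mount point of a candidate volume gives a rough number for comparing volumes; a volume that scores near zero is unlikely to repay the performance cost Freeman describes.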
A place for inline and post-processing primary storage
John Matze, vice president of business development at Hifn, takes a different view. When it comes to primary storage, he says, there is a place for both inline and post-processing approaches. Matze recommends understanding your data and how it is used. That will help you select the best optimization methods.
Post-processing can cause problems, such as impacting backup windows. With post-processing, you must depend on the operating system to provide a capable caching environment. "It takes time to do post-processing," says Matze, "but if you have a good caching environment you won't feel it as much." Post-processing also leaves you dependent on "pointers" to reconstruct the data, which can be endangered if there is a system crash.
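The dependence on pointers can be illustrated with a toy block store. This is a sketch under stated assumptions rather than a description of how Hifn or any other vendor implements deduplication: the functions `deduplicate` and `reconstruct`, the 4 KB block size and the in-memory dictionary are all hypothetical.

```python
import hashlib


def deduplicate(data, block_size=4096):
    """Split data into fixed-size blocks; return (unique block store, pointer list)."""
    store = {}      # fingerprint -> block bytes; each unique block is kept once
    pointers = []   # one fingerprint per original block, in original order
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        digest = hashlib.sha256(block).hexdigest()
        store.setdefault(digest, block)
        pointers.append(digest)
    return store, pointers


def reconstruct(store, pointers):
    """Rebuild the original bytes by following the pointers in order."""
    return b"".join(store[digest] for digest in pointers)


# Five blocks of input, only two of them distinct.
image = b"A" * 4096 * 3 + b"B" * 4096 + b"A" * 4096
store, pointers = deduplicate(image)
print(len(pointers), "blocks referenced,", len(store), "blocks stored")
```

Only the pointer list records how the stored blocks fit together, so if it is lost or damaged the data cannot be rebuilt even though every unique block is still present.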
On the other hand, says Matze, inline optimization "gets your data clean" from the start. This can have benefits in storing and handling data thereafter.
Finally, a storage administrator should have realistic expectations. "Deduplicating backup data can produce tremendous efficiencies," he says. "But there are, in fact, fewer instances of duplicated data in primary storage compared with secondary storage."
How to optimize primary storage
Peter Smails, vice president of worldwide marketing at Storwize, lists four requirements for successful primary storage optimization.
- If it is to be worth your effort, you must be able to provide a high average data reduction.
- You must be able to minimize your impact on performance as much as possible. Ideally you would actually create a performance benefit but at a minimum you should be transparent to the users in terms of performance.
- You shouldn't require behavior changes on the part of users. The performance enhancements should happen without requiring extra actions.
- When you are dealing with primary storage, you should aim to run business-critical applications without impacting their availability, regardless of what you are doing with compressing or deduping data.
Primary storage and compression
One of the less recognized facts of primary storage optimization is that "many of the file types that are driving growth are already compressed," says Carter George, vice president of products at Ocarina Networks. As examples, he cites most of the document types available through Microsoft Office 2007, as well as Adobe PDF files. "When you try to further compress files like these," he notes, "you usually end up making a file that is larger."
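George's point about recompression is easy to check on a small scale. The sketch below uses Python's `zlib` module as a stand-in for a format's built-in compression; the sample text, vocabulary and compression level are assumptions made for illustration, not a model of any particular Office or PDF file.

```python
import random
import zlib

# Build a deterministic, text-like sample (assumed data, purely illustrative).
rng = random.Random(0)
vocabulary = [
    "storage", "primary", "backup", "volume", "snapshot", "block", "file", "data",
    "archive", "server", "capacity", "pointer", "inline", "cache", "window", "vendor",
]
raw = " ".join(rng.choice(vocabulary) for _ in range(50_000)).encode("ascii")

once = zlib.compress(raw, 9)    # first pass: stands in for the format's own compression
twice = zlib.compress(once, 9)  # second pass: what a storage optimizer would attempt

print("raw bytes:             ", len(raw))
print("compressed once:       ", len(once))
print("compressed a second time:", len(twice))
```

The first pass shrinks the sample substantially, while the second pass has almost nothing left to remove and still has to add its own framing bytes, which is why the result tends to come out no smaller and often slightly larger.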
Furthermore, the native compression scheme means that data deduplication efforts may not be able to spot files that are essentially the same. "When Microsoft compresses these files," George says, "the output is randomized so you can't recognize when two files are 99% identical."
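The effect on duplicate detection can be sketched in the same way. Here two versions of a document differ by about 1%, and a simple fixed-block fingerprint comparison is run before and after compression. Everything in the example is an assumption for illustration: `zlib` stands in for the file format's compressor, and `chunk_digests` and `shared_fraction` are hypothetical helpers, not part of any deduplication product.

```python
import hashlib
import random
import zlib

CHUNK = 4096  # assumed fixed block size


def chunk_digests(data, size=CHUNK):
    """Return the set of SHA-256 fingerprints of the fixed-size blocks in data."""
    return {hashlib.sha256(data[i:i + size]).digest() for i in range(0, len(data), size)}


def shared_fraction(a, b):
    """Fraction of block fingerprints that the two byte strings have in common."""
    da, db = chunk_digests(a), chunk_digests(b)
    return len(da & db) / max(len(da), len(db))


rng = random.Random(42)
vocabulary = [
    "storage", "primary", "backup", "volume", "snapshot", "block", "file", "data",
    "archive", "server", "capacity", "pointer", "inline", "cache", "window", "vendor",
]
original = " ".join(rng.choice(vocabulary) for _ in range(80_000)).encode("ascii")
# Same length as the original, with only the first 4,000 bytes rewritten.
revised = b"0123456789" * 400 + original[4000:]

raw_overlap = shared_fraction(original, revised)
compressed_overlap = shared_fraction(zlib.compress(original), zlib.compress(revised))

print(f"shared blocks, uncompressed: {raw_overlap:.1%}")
print(f"shared blocks, compressed:   {compressed_overlap:.1%}")
```

In uncompressed form nearly every block matches, so a deduplication engine would store the second version almost for free. Once each version has been compressed separately, the early change alters the byte stream that follows it, and the block-level overlap largely disappears.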
Users also need to think about risk. "Enterprise customers really don't like to think about a situation where the original document never even existed -- as can happen with inline compression/data deduplication," George says. "No code is completely bug free, so when your only copy of a file is compressed from the start, you could end up writing garbage to disk."
Some compliance requirements mandate that for certain kinds of documents you must be able to show that archival activities didn't change any aspect of the document. According to George, "That's a high bar, but you must be able to meet it."
On the other hand, out-of-band solutions offer the opportunity to back up or snapshot and then shrink the file later.
In short, primary storage optimization seems to offer significant potential for enhanced efficiencies. But how you optimize your primary storage depends on competing technology visions. For now, the best advice may be caveat emptor -- let the buyer beware.
About the author: Alan R. Earls is a Boston-area writer focusing on the intersection of technology and business.