His answers can be read below or downloaded as an MP3.
Reducing your data footprint
>>Why should you be aware of your data footprint?
>>How can you determine your data footprint?
>>How can you reduce your data footprint?
>>How can you address the power consumption of your storage equipment?
>>How can you evaluate a data footprint reduction solution?
>>How do reduction technologies differ for online and offline storage?
>>How can information lifecycle management reduce your data footprint?
Your data footprint is the sum of all the information you keep, both online and offline. It covers not only your file servers, email systems and databases, but also the backup copies of that data that you have on tape, removable disk or in the cloud.
So your footprint includes every copy of your data, not just what's online. What's referred to as the data footprint impact is the consequence of keeping this information for longer periods of time and of having to manage, secure and protect it.
Some of the techniques include tools for storage resource management (SRM), data classification and search. In other words, tools that give you insight into what information and resources you have, where that data is located, and how much of it is actually being used.
Then you need to get insight into how much of that information is changing: how much of it is static, where your backup copies are located, how effective your data protection tools are, what copies of data you have and what your retention policies are. Determining this really comes back to looking at what you currently have, how it's being used and its impact.
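As a rough illustration of that kind of insight, the Python sketch below walks a directory tree and totals bytes by file extension and by whether a file has changed recently. It is not an SRM tool; the function name `summarize_footprint` and the 180-day cutoff for "static" data are assumptions chosen for the example, and modification time is only a crude proxy for whether data is still in use.

```python
import os
import time
from collections import defaultdict

def summarize_footprint(root, static_after_days=180, now=None):
    """Total bytes under `root` by extension and by static/active status.

    A file counts as "static" if its modification time is older than
    `static_after_days` (an assumed threshold, not an industry rule).
    """
    now = time.time() if now is None else now
    cutoff = now - static_after_days * 86400
    by_ext = defaultdict(int)
    totals = {"active": 0, "static": 0}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            ext = os.path.splitext(name)[1].lower() or "(none)"
            by_ext[ext] += st.st_size
            status = "static" if st.st_mtime < cutoff else "active"
            totals[status] += st.st_size
    return dict(by_ext), totals
```

Calling `summarize_footprint("/some/share")` on a path of your choosing returns two dictionaries: bytes per extension, and bytes split into active and static. A large static share is a hint that archiving may be worth a look.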
There's one simple technique, which is deleting all of your data. It's a basic premise, but it's important to have data management processes in place. In addition to simply managing your data, things like archiving are important. There's this notion that archiving is only important for regulatory compliance, but for decades it's been used in highly optimized environments for reducing costs and complexity.
So archiving, compression, data deduplication, thin provisioning, space-saving snapshots and different RAID levels are all techniques you can use to minimize the impact of an expanding data footprint.
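To make one of those techniques concrete, here is a minimal sketch of data deduplication using fixed-size chunks and SHA-256 fingerprints. It is a simplification for illustration: commercial products often use variable-size chunking and their own indexing schemes, and the function name `dedupe_ratio` and the 4 KiB chunk size are assumptions.

```python
import hashlib

def dedupe_ratio(blobs, chunk_size=4096):
    """Return (logical_bytes, stored_bytes) after fixed-size chunk dedup.

    Each chunk is fingerprinted with SHA-256; a chunk whose fingerprint
    has already been seen is counted as logical data but not stored again.
    """
    seen = set()
    logical = 0
    stored = 0
    for blob in blobs:
        for i in range(0, len(blob), chunk_size):
            chunk = blob[i:i + chunk_size]
            logical += len(chunk)
            digest = hashlib.sha256(chunk).digest()
            if digest not in seen:
                seen.add(digest)
                stored += len(chunk)
    return logical, stored
```

Feeding it several backup images that are mostly identical, for example successive nightly copies of the same file, shows why repeated backups tend to deduplicate well: the logical total grows with every copy while the stored total grows only with the chunks that changed.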
It certainly comes back to archiving. By archiving, you reduce the amount of data that you have in your systems by moving the static data that you've discovered and classified off your expensive primary storage and onto technology that consumes less power. Dedupe also gets a lot of attention for backup, where it lets you put more data on the same storage, which shows up when you start to measure on a capacity-per-watt basis.
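The capacity-per-watt measure is simple arithmetic, sketched below in Python. All of the figures are hypothetical and chosen only to show the calculation; the function names and the idea of multiplying usable capacity by a reduction ratio to get "effective" capacity are assumptions for this example, not vendor metrics.

```python
def capacity_per_watt(capacity_tb, watts):
    """Terabytes of capacity delivered per watt of power drawn."""
    return capacity_tb / watts

def effective_capacity_tb(usable_tb, reduction_ratio):
    """Logical data that fits on `usable_tb` at a given reduction ratio.

    A ratio of 5.0 means 5 TB of logical data occupies 1 TB on disk.
    """
    return usable_tb * reduction_ratio

# Hypothetical systems, for illustration only.
primary = capacity_per_watt(capacity_tb=50, watts=2500)
archive = capacity_per_watt(capacity_tb=200, watts=2000)
deduped_backup = capacity_per_watt(
    capacity_tb=effective_capacity_tb(usable_tb=100, reduction_ratio=5.0),
    watts=2000,
)
```

With these made-up numbers, the archive tier delivers five times the capacity per watt of primary storage, and the deduplicated backup target delivers more again because the same power draw holds more logical data.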
There are also some other things that come into play. Intelligent power management techniques, such as second-generation massive array of idle disks (MAID), power devices down when they're not in use. But you need to do so in a way that doesn't compromise the quality of service.
Take a step back and determine if you're looking for an online, offline or near-line solution. Is this for your database, email system or file system, or is this for online active archives or for backups?
Having that in mind determines what types of tools you'll be looking for. If you're looking to do this for backup, then you'll be looking at tools for deduplication, whether it's immediate or post-processing, source or target-based. If you're looking to reduce the impact on your online running databases, then you'll be looking at things like database archiving or compression, where within the database software, you can turn compression on.
Generally speaking, there is a difference, but the line is becoming blurred because some tools can work online, offline or in a near-line mode. Some tools will work online for files but can't work with block data. The big differentiator is that with a lot of tools for reducing data footprints, you're trading time for space.
In other words, if you can take time to reduce the data, you can achieve space savings. When you're looking at online data, the primary premise for performance is time. So it's a balancing act -- you trade time for space. If you need time, that's performance. If you need space, that's capacity reduction. That's where the significant differences lie in the tools for working with primary databases vs. a file system.
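The time-for-space trade can be seen with any general-purpose compressor. The sketch below uses Python's standard `zlib` module to compress the same data at several levels and record the resulting size and elapsed time. The function name `compress_tradeoff` is an assumption, and the timings depend entirely on the machine and the data, so no particular numbers are implied.

```python
import time
import zlib

def compress_tradeoff(data, levels=(1, 6, 9)):
    """Compress `data` at each zlib level; return (level, size, seconds)."""
    results = []
    for level in levels:
        start = time.perf_counter()
        compressed = zlib.compress(data, level)
        elapsed = time.perf_counter() - start
        results.append((level, len(compressed), elapsed))
    return results

if __name__ == "__main__":
    sample = b"static archive record; " * 50_000
    for level, size, seconds in compress_tradeoff(sample):
        print(f"level={level} size={size} bytes time={seconds:.4f}s")
```

Higher levels spend more CPU time looking for redundancy and typically produce smaller output. That fits offline or near-line data, where time is available, better than latency-sensitive online data.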
It depends on how you view information lifecycle management. If you view ILM as a product -- as in archiving, shelving or hierarchical storage management (HSM) -- then that's a data footprint technique. But if you think of it as more of a technique for managing your information and aligning the right tiers instead of just simply deleting everything, then ILM is part of the overall umbrella of data management. If you think of it in terms of a specific product or technique, then it's just one of the tools in the toolbox.
This was first published in November 2009