It's time to come clean. You've been handling a lot of garbage lately--in fact, you've been increasing the amount of garbage in the environment. And it's an issue that strikes close to home because it's your storage environment.
Every company is coping with almost out-of-control data growth, which puts a strain on primary storage resources, but it's usually most profoundly felt in the backup process. Whether you're using disk in your backups or not (and you should be), it's likely taking longer and longer to back up your company's data. And the longer it takes, the more likely it becomes that there's little time to do any restoration testing. If you're not confident you can restore, your company's data may not be as protected as it needs to be.
Along with those contracts, spreadsheets, research reports and so on, there's a vast amount of crud building up in your primary storage systems. This detritus includes the usual suspects: old business files that haven't been useful since Jimmy Carter was president, test data from that 1998 database conversion project, as well as personal files like the MyResume.doc file that's in every user share, photos of last year's Dollywood vacation, Kanye West's latest MP3 download and YouTube videos of assorted fraternity pranks. No offense to Dolly and Kanye, but that stuff is junk and it's clogging up your system and making backup a lot harder--and riskier--than it should be.
Compression and deduplication applications do a great job of squeezing the air out of your data and ensuring that you back up a file only once. But as long as the garbage is still mixed in with the good, the best you'll end up with is skinnier, more unique garbage.
The diagnosis is simple: You have to get the garbage out of your system and your backup process. But the remedies aren't so simple.
You could start charging your business groups, with fees based on how much data they back up. That might be a deterrent and get the business units to police themselves, but if they're flush, they'll probably just shell out the extra bucks and your backup problem will be as big as ever.
The key, then, is to identify the garbage. Data classification products can help, but there's still a fair amount of manual work involved in using these tools effectively. Typically, end users will have to participate, which can heighten awareness of the garbage problem. Some products actually require end users to provide some data classification information each time they create, download or copy a new file. Users will be less apt to stick a YouTube video in their share if they're forced to enter metadata about the file, which will also make it patently clear that they own that file. Coupled with some corporate rules about the proper use of IT resources, that setup could become an even more effective deterrent.
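If you don't have a classification product on hand, even a crude scan can give you a first look at where the garbage lives. The sketch below is only an illustration, not how any particular product works: it walks a share and flags files by extension or by how long ago they were last modified. The extension list, the five-year cutoff and the function name are all assumptions made up for this example, and a last-modified date is only a rough stand-in for "hasn't been useful."

```python
import os
import time

# Hypothetical policy values for illustration; set these to match your own rules.
JUNK_EXTENSIONS = {".mp3", ".mp4", ".avi", ".mov"}
STALE_AFTER_DAYS = 5 * 365


def find_cleanup_candidates(root, junk_extensions=JUNK_EXTENSIONS,
                            stale_after_days=STALE_AFTER_DAYS, now=None):
    """Return a list of (path, size_bytes, reason) for files that look like junk."""
    now = time.time() if now is None else now
    cutoff = now - stale_after_days * 86400  # seconds in a day
    candidates = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.stat(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            extension = os.path.splitext(name)[1].lower()
            if extension in junk_extensions:
                candidates.append((path, info.st_size, "file type"))
            elif info.st_mtime < cutoff:
                candidates.append((path, info.st_size, "not modified recently"))
    return candidates
```

The function only reports candidates; it deletes nothing. Treat the output as a list of questions to put to the files' owners, not as a verdict.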
There are other steps you can take. You can use your server operating systems or management software to restrict the types of files users can drop into individual volumes or directories. That way, you don't necessarily have to ban all types of files or force users to identify them individually, but you'll make users take an extra step when they want to save something they shouldn't, which may change their minds. Even if that doesn't dissuade them from saving the junk, you'll have a good idea where to look when the garbage begins to accumulate again. You can even let users know that those volumes won't be backed up regularly (or even at all).
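Real file screening happens in the operating system or your management software, and every product configures it differently. Still, the logic behind a per-volume rule is simple enough to sketch. In the example below, the volume paths, the allowed extensions and the `check_save` function are all hypothetical, invented only to show the idea of one tightly screened share alongside an anything-goes volume that you've told users won't be backed up.

```python
from pathlib import PurePosixPath

# Hypothetical per-volume rules: volume -> set of extensions allowed there.
# None means any file type is accepted (for example, a scratch volume
# that users have been told is not backed up).
VOLUME_RULES = {
    "/shares/finance": {".xls", ".xlsx", ".pdf", ".doc"},
    "/shares/scratch": None,
}


def check_save(path, rules=VOLUME_RULES):
    """Return (allowed, volume) for a file a user wants to save.

    Paths outside every listed volume are not screened, so they come
    back as (True, None).
    """
    target = PurePosixPath(path)
    for volume, allowed in rules.items():
        if PurePosixPath(volume) in target.parents:
            if allowed is None or target.suffix.lower() in allowed:
                return True, volume
            return False, volume
    return True, None
```

With these rules, `check_save("/shares/finance/prank.mp4")` is refused while `check_save("/shares/scratch/prank.mp4")` goes through, which is exactly the extra step that tells you where to look when the junk piles up again.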
And there's always the "all MP3 files will be deleted on Friday" approach. But a solid company-wide data destruction policy, built in concert with legal and business units, may be the safest way to prune data stores.