Better start thinking about your data growth in deadly terms.
Many of the problems we face in our attempt to manage a data center are a direct result of data growth. Data growth is constant, and it sometimes seems intent on destroying everything in its path. Unaddressed data growth will wreak havoc on your file systems, disks, systems, networks, protection plans, processes, and your life. If you're like a lot of people, you might try to stay ahead of this never-ending cycle of growth by buying more of whatever is going to break next.
I think it's time we address the cause and not the symptoms. There's new data generated all the time, but most of it is generated by our own processes. We have data sprawl, replicas, copies of copies, backup copies of copies, and backups of replicas of copies of copies. We don't have a capacity problem; we have a science problem.
There's a process in biology called mitosis in which one cell divides to produce two genetically identical cells. Left unchecked in the right environment, those cells will split again and again. Soon, the petri dish that stored a microscopic quantity of stuff is overflowing all over the table. If a scientist acted like an IT guy, they would address this issue by pouring (migrating) the contents of the petri dish into bigger and bigger containers before they overflowed.
Originally, this science made sense. Scientists needed a bunch of exact replicas of a single cell to perform different tests or experiments on them. In IT the same holds true; we need a bunch of replicas of data to run different applications against them. We use these replicas to run tests, populate data warehouses, create backup and disaster recovery copies, and send copies to other users. The difference is that scientists know up front how many replicas they want/need and plan for it. But IT processes seldom have the pre-planning that exists in science labs. And that, my friends, is a huge part of our problem. When scientists are done with their experiments, they get rid of the replicas. Our answer to the challenge is to buy a bigger petri dish from our sales rep.
We know that Data Domain proved empirically that killing replicated data in the backup process is a very good thing. There are now a thousand dedupe stories to be told, and they all share one theme: killing data when it's no longer useful is a good thing.
So if killing off replicas at the end of the data lifecycle is good, killing them sooner would be even better. That's the next frontier. If you get rid of replicas as soon as they're no longer valuable (and before they have a chance to cause problems), you eliminate the downstream problems of unchecked replication. Killing, compacting, deduplicating, eliminating, or compressing replicated data as close to the point of conception as feasible will yield the greatest possible benefits downstream. It's only logical.
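The mechanics behind dedupe are simple in principle: fingerprint each chunk of data and store identical chunks only once, keeping a "recipe" of fingerprints for each logical copy. Here's a minimal sketch (my own illustration, not any vendor's design) using fixed-size chunks and SHA-256; production systems typically use variable-size chunking and far more sophisticated stores:

```python
import hashlib
import os

CHUNK_SIZE = 4096  # fixed-size chunking for simplicity


def dedupe_store(data: bytes, store: dict) -> list:
    """Split data into chunks, keep each unique chunk once, return a recipe of hashes."""
    recipe = []
    for i in range(0, len(data), CHUNK_SIZE):
        chunk = data[i:i + CHUNK_SIZE]
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)  # an identical chunk is stored only once
        recipe.append(digest)
    return recipe


def restore(recipe: list, store: dict) -> bytes:
    """Rebuild a logical copy from its recipe of chunk fingerprints."""
    return b"".join(store[digest] for digest in recipe)


# Ten "backup copies of copies" of the same 1 MB file...
store = {}
original = os.urandom(1_000_000)
recipes = [dedupe_store(original, store) for _ in range(10)]

logical = 10 * len(original)                    # what ten full copies would occupy
physical = sum(len(c) for c in store.values())  # what the dedupe store actually holds
print(f"logical {logical} bytes -> physical {physical} bytes")
```

Ten logical copies, one physical copy's worth of storage: that 10:1 collapse is the whole value proposition, and the closer to the point of creation you apply it, the less garbage flows downstream.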
How will you do this? First, you'll have to address process and strategy requirements; i.e., actually know how many copies you need and for how long, as well as have an actual plan on how to deal with them. Second, you'll have to leverage technology that can wipe out copies before they take over. These multiple copies are like the cockroaches of IT. Eventually cockroaches win and you have to move out.
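The process side doesn't need to be elaborate. Even a simple table of copy types, counts, and lifetimes, plus a reaper that enforces it, puts you ahead of most shops. A hypothetical sketch (the copy types, retention periods, and names below are made up for illustration):

```python
from datetime import date, timedelta

# Hypothetical retention plan: how long each kind of copy is allowed to live.
RETENTION = {
    "backup":    timedelta(days=30),
    "test":      timedelta(days=7),
    "warehouse": timedelta(days=90),
}


def expired(copies: list, today: date) -> list:
    """Return the copies whose planned lifetime has elapsed -- the ones to kill."""
    return [c for c in copies
            if today - c["created"] > RETENTION[c["kind"]]]


copies = [
    {"name": "db-backup-jan", "kind": "backup",    "created": date(2011, 1, 1)},
    {"name": "qa-clone",      "kind": "test",      "created": date(2011, 1, 25)},
    {"name": "dw-extract",    "kind": "warehouse", "created": date(2011, 1, 10)},
]

doomed = expired(copies, date(2011, 2, 1))
print([c["name"] for c in doomed])  # only the month-old backup has outlived its plan
```

The point isn't the code; it's that every copy gets a planned death date the moment it's born, instead of living forever by default.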
Dedupe in the backup target market has created more than $2 billion in value (and growing), so imagine what value will be generated by moving that function closer to the point of creation for all of the different data types we generate. We'd be green (less data is as green as it gets), rich (we wouldn't need to buy anything new for a while), calm (fewer things to manage means fewer things to break), and we might actually be able to take a few minutes to think about how we can add strategic value to our organizations, as opposed to running around in a hazmat suit all day dumping out petri dishes.