Better start thinking about your data growth in deadly terms.
Many of the problems we face in our attempt to manage a data center are a direct result of data growth. Data growth is constant, and it sometimes seems intent on destroying everything in its path. Unaddressed data growth will wreak havoc on your file system, disk, system, network, protection plans, processes and life. If you're like a lot of people, you might try to stay ahead of this never-ending cycle of growth by buying more of whatever is going to break next.
I think it's time we address the cause and not the symptoms. There's new data generated all the time, but most of it is generated by our own processes. We have data sprawl, replicas, copies of copies, backup copies of copies, and backups of replicas of copies of copies. We don't have a capacity problem; we have a science problem.
There's a process in biology called mitosis in which one cell divides to produce two genetically identical cells. Left unchecked in the right environment, those cells will split again and again. Soon, the petri dish that stored a microscopic quantity of stuff is overflowing all over the table. If a scientist acted like an IT guy, they would address this issue by pouring (migrating) the contents of the petri dish into bigger and bigger containers before they overflowed.
Originally, this science made sense. Scientists needed a bunch of exact replicas of a single
We know that Data Domain proved empirically that killing replicate data in the backup process is a very good thing. There are now a thousand dedupe stories to be told, and they all share one theme: Killing data when it's no longer useful is a good thing.
So if killing off replicas at the end of the data lifecycle is good, killing them sooner would be even better. That's the next frontier. If you get rid of replicas as soon as they're no longer valuable (and before they have a chance to cause problems), you eliminate problems associated with biological replication. Killing, compacting, deduplicating, eliminating or compressing replicate data as close to the point of conception as feasible will yield the greatest possible benefits downstream. It's only logical.
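To make "deduplicating replicate data as close to the point of conception as feasible" a little more concrete, here is a minimal sketch of content-hash deduplication applied at write time. It assumes fixed 4 KB chunks and SHA-256 fingerprints, and it is a generic illustration only, not Data Domain's method or any product's API; `DedupeStore` and `CHUNK_SIZE` are hypothetical names. Each write splits the data into chunks, fingerprints each chunk, and stores only chunks it has not seen before, so a copy (or a copy of a copy) costs little more than a list of fingerprints.

```python
import hashlib

# Hypothetical fixed chunk size chosen for illustration only.
CHUNK_SIZE = 4096


class DedupeStore:
    """Toy content-addressed store: each unique chunk is kept once, keyed by its SHA-256 digest."""

    def __init__(self, chunk_size: int = CHUNK_SIZE) -> None:
        self.chunk_size = chunk_size
        self.chunks: dict[str, bytes] = {}       # digest -> chunk bytes, stored once
        self.recipes: dict[str, list[str]] = {}  # file name -> ordered digests to rebuild it

    def write(self, name: str, data: bytes) -> None:
        recipe = []
        for start in range(0, len(data), self.chunk_size):
            chunk = data[start:start + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            # Store the chunk only if this content has never been seen before.
            self.chunks.setdefault(digest, chunk)
            recipe.append(digest)
        self.recipes[name] = recipe

    def read(self, name: str) -> bytes:
        # Reassemble the file from its recipe of chunk digests.
        return b"".join(self.chunks[d] for d in self.recipes[name])

    def logical_bytes(self) -> int:
        # What users think they are storing: every file counted in full.
        return sum(len(self.chunks[d]) for recipe in self.recipes.values() for d in recipe)

    def physical_bytes(self) -> int:
        # What actually sits on disk: each unique chunk counted once.
        return sum(len(c) for c in self.chunks.values())


if __name__ == "__main__":
    # 40,960 bytes made of 10 distinct 4 KB chunks.
    original = b"".join(i.to_bytes(4, "big") for i in range(10_240))
    store = DedupeStore()
    store.write("report.dat", original)
    store.write("report_copy.dat", original)          # a copy
    store.write("report_copy_of_copy.dat", original)  # a copy of a copy
    assert store.read("report_copy_of_copy.dat") == original
    print(store.logical_bytes(), store.physical_bytes())  # 122880 40960
```

Three logical copies occupy the physical space of one. Real systems typically use variable-size (content-defined) chunking, so that an insertion near the start of a file doesn't shift every later chunk boundary; fixed-size chunks just keep the sketch short.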
How will you do this? First, you'll have to address process and strategy requirements; i.e., actually know how many copies you need and for how long, and have an actual plan for dealing with them. Second, you'll have to leverage technology that can wipe out copies before they take over. These multiple copies are like the cockroaches of IT. Eventually cockroaches win and you have to move out.
Dedupe in the backup target market has created more than $2 billion in value (and growing), so imagine what value will be generated by moving that function closer to the point of creation for all of the different data types we generate. We'd be green (less data is as green as it gets), rich (we wouldn't need to buy anything new for a while), calm (fewer things to manage means fewer things to break) and might actually be able to take a few minutes to think about how we can add strategic value to our organization, as opposed to running around in a hazmat suit all day dumping out petri dishes.
This was first published in January 2009