Data deduplication is the poster child of 2008. Everyone is rushing to add this capability to just about everything that could possibly ever sit on a network. I thought I saw an ad for a cable tester with de-dupe built in! On the face of it, de-dupe looks like the savior it’s made out to be (except in very isolated instances where it actually inflates the size of stored data, but that’s another subject for another time).
But take a deeper look with my paranoid, curmudgeon-y, semi-lawyer-esque hat on.
De-dupe technology has been likened to “zip” on the fly (no pun intended), which is where I have a couple of problems while wearing my pseudo-legal hat. The first is the act of compression. Way back in the olden days of computing there was a product appropriately named Stacker; its purpose in life was to allow you to fit more on the ridiculously expensive devices we had in our computers called “hard drives”. Microsoft, not content with Stac backing out of a licensing deal, created DoubleSpace (got sued and lost), then DriveSpace (DOS 6.22).
Via the use of a TSR (even the acronym is dusty), these products would intercept all calls destined for your hard drive and compress the data before it got there. Sound familiar? Those disk compression tools had their run. I used them, but they presented problems with memory management (this was back when, as the story goes, Bill Gates decided no one would ever need more than 640KB), amongst other things. This presented a phenomenally large problem when I would load up one of my favorite games at the time, Falcon 3.0 from Spectrum Holobyte. Falcon fans know what sorts of contortions one had to endure to get enough lower memory to run Falcon, but I digress.
So I would try to get around having Stacker or DoubleSpace turned on all the time. That didn’t work out well for me, and I spent quite a bit of time compressing and re-compressing my hard drive, enabling and disabling Stacker and DoubleSpace and setting up various non-compressed partitions.
While I don’t see that specific instance as an issue now per se, I do have that (bad) experience, and because of it I have a problem with something sitting inline with my data, compressing it with a proprietary algorithm that I can’t undo if/when the device decides it doesn’t like me anymore. Jumping back 16 years, it wasn’t that hard to format and reinstall DOS, which took up only a small part of my (then gigantic) 160MB ESDI hard drive, to get around the problems I had. But today, when we are talking about multiple terabytes and such, I want to be sure that I can get to my data unfettered when I need it.
The reason I am paranoid about getting access to my data when I need it: compliance and legal situations. Which brings me to my second point. How will de-dupe stand up in court? Is it even an issue? Is compression so well understood and accepted that it wouldn’t even be a problem? Even as paranoid as I am I would have to say … maybe.
Compression has been around for a very long time. We are used to it, we accept it, and we accept some of its shortcomings (ever try to recover a corrupted zip file?) and its limitations, but will that stand up in court? In today’s digital world there are quite a few things being decided in our court systems that may not necessarily make sense. Are we sure our legislators understand the difference between zip (lossless) and JPEG (lossy) compression? How does the act of compressing affect the validity of the data? Does it affect the metadata or envelope information? The answers to these questions, while second nature for us technology folks, may not be so second nature for the people deciding court cases. Because compressing and decompressing data is a physical change to the data itself, I can imagine a lawyer trying to invalidate data based on that fact.
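For what it’s worth, the technical half of that argument is easy to demonstrate. Below is a minimal Python sketch, purely my own illustration and not anything a de-dupe vendor ships, that compresses some made-up data with zlib (the same DEFLATE family that zip typically uses), restores it, and compares SHA-256 digests taken before and after. The sample text and all the names in it are assumptions for the sake of the example.

```python
import hashlib
import zlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical "business record"; any bytes would do.
original = b"Quarterly report: revenue up 4%, costs flat.\n" * 1000

# Fingerprint the data before anything touches it.
digest_before = sha256_hex(original)

# Lossless compression (DEFLATE via zlib, level 9), then restore.
compressed = zlib.compress(original, 9)
restored = zlib.decompress(compressed)

# Fingerprint the restored copy and compare.
digest_after = sha256_hex(restored)

print("original bytes:  ", len(original))
print("compressed bytes:", len(compressed))
print("digests match:   ", digest_before == digest_after)
```

Matching digests show that the restored bytes are identical to the originals, which is exactly what “lossless” promises and what a JPEG-style lossy codec does not. Whether a court finds that persuasive is a separate question, and the sketch says nothing about how any particular de-dupe appliance stores or rebuilds data.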
I hope that doesn’t turn out to be the case. The de-dupe products currently on the market have some astounding technology and performance. They also return quite a bit to the bottom line when used as prescribed, and the solid, quantifiable return on investment they represent does, for most, outweigh any risks.