
Data deduplication: no lifeguard on duty?

In a conversation today with ArxScan, a new SRM vendor, CEO Mark Fitzsimmons mentioned a use case for the startup's product that had me raising my eyebrows: basically, keeping data deduplication systems honest.

According to Fitzsimmons, a large pharma company wanted the ArxScan product to migrate data identified as redundant by its data deduplication system to another repository and present it for review through a centralized GUI, so that the customer could sign off on which data was to be deleted.

“So you’re replacing an automated process in the data center with a manual one?” was the confused reaction from one of my editors on the conference call.

“Well, we’re working on automating it,” was the answer. “But the customer found dedupe applications weren’t working so well, and wanted a chance to look at the data before it’s deleted.”

I’ve heard of some paranoia at the high end of the market about data deduplication systems, particularly when it comes to virtual tape libraries or large companies in sensitive industries like, well, pharmaceuticals. One question I’ve heard brought up more than once by high-end users is about backing up the deduplication index on tape, the better to be able to recover data from disk drives should the deduplicating array fail. But breaking apart the process for better supervision? That’s a new one for me.
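To see why users worry about backing up the deduplication index, it helps to remember what the index does: deduplicated blocks are stored only once, and the index is the only map back from a file to its blocks. The following is a minimal, hypothetical sketch of block-level dedupe (not any vendor's implementation); the class and method names are illustrative:

```python
import hashlib

class DedupStore:
    """Toy block-level deduplication store.

    Unique blocks are kept exactly once, keyed by their hash; each
    'file' is just an ordered list of block hashes. If the index is
    lost, the surviving blocks can no longer be reassembled into
    files -- which is why users want the index backed up to tape.
    """

    def __init__(self, block_size=4):
        self.block_size = block_size
        self.index = {}   # block hash -> unique block bytes
        self.files = {}   # filename -> ordered list of block hashes

    def write(self, name, data):
        hashes = []
        for i in range(0, len(data), self.block_size):
            block = data[i:i + self.block_size]
            h = hashlib.sha256(block).hexdigest()
            self.index.setdefault(h, block)  # store each block only once
            hashes.append(h)
        self.files[name] = hashes

    def read(self, name):
        # Reconstruction depends entirely on the index surviving.
        return b"".join(self.index[h] for h in self.files[name])

store = DedupStore()
store.write("trial_a.dat", b"ABCDABCD")   # two identical blocks
store.write("trial_b.dat", b"ABCDWXYZ")   # shares one block with trial_a
print(len(store.index))                   # 2 unique blocks stored, not 4
print(store.read("trial_a.dat"))          # b'ABCDABCD'
```

The point of the sketch is the asymmetry: the unique blocks and the index are both required for recovery, and the index is the smaller, more fragile half, which is why high-end users ask about protecting it separately.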

Anyone else heard of anything like this? Or is the customer going overboard?


1 comment


In my opinion the customer is not going overboard; I've had similar concerns. I've posed questions to de-dupe resellers about the validity of deduplicated data and whether it would stand up in court. I haven't gotten an answer that I understand, or, better stated, the answers I've gotten haven't been a plain "Yes" or "No".

In industries where a regulatory agency can simply shut your doors, costing you billions in R&D, if it decides someone has tampered with test results or data collection methods, I can completely understand why pharma would be paranoid. I don't know that breaking up the process will help me sleep better at night, but in the scheme of things, is the extra savings in disk worth the risks that block-level data manipulation poses?

I may sound like a naysayer, but I actually like de-dupe. I think it's a great idea whose time in the spotlight has come, and I'm a proponent, just not of the way it seems to be pitched as a panacea for storage growth. Not every technology is meant for every application, and sometimes saving money isn't the primary item on the project request form.