On average, how many copies of data get made between production and test/dev? Probably more than you think, according to Mike Matchett.
"There's an idea that in a given organization there's up to 14 copies of the same data sets," explained Matchett in his presentation "Attack of the Killer Capacity" at TechTarget's Storage Decisions conference in New York. The Taneja Group analyst explained the different methods of copy data management, and what exactly is taking up so much space.
"You take a single file and you email it out to 300 of your closest associates, and there's 300 copies of that file out there sometimes," said Matchett, demonstrating how quickly extraneous copies of data can be made. Copy data management is a key component of reducing the number of copies that could be taking up vital space in your storage.
When speaking about copy data management, Matchett discussed Actifio, the vendor that first popularized the term. It was used in reference to the company's process of capturing production data, keeping a golden copy and spawning virtual copies when necessary.
"So there's only one copy, that effective copy on disk," explained Matchett. "And all of the downstream users of that copy are really getting pointers to the virtual copy." This incremental, continuous data protection of the primary copy is a valuable method of reducing the number of copies.
RAID and erasure coding were other copy data management methods discussed in the session. Erasure coding, said Matchett, is a bit like a combination of RAID and replication. "It's really a type of RAID where you make copies but instead of doing slices in a parity bit you actually have different equations." Erasure coding can also be helpful with distributed object storage, since it saves capacity and increases data protection.
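The "different equations" point can be illustrated with the simplest possible case: a single XOR parity fragment, as in RAID 5. This is a minimal sketch, not a production erasure code; real schemes such as Reed-Solomon solve several independent equations over the fragments so they can survive the loss of more than one.

```python
from functools import reduce

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def encode(chunks):
    """Append one parity fragment: the XOR of all data fragments."""
    return chunks + [reduce(xor_bytes, chunks)]

def recover(fragments, lost_index):
    """XOR of all surviving fragments reconstructs the missing one."""
    survivors = [f for i, f in enumerate(fragments) if i != lost_index]
    return reduce(xor_bytes, survivors)

data_chunks = [b"AAAA", b"BBBB", b"CCCC"]   # equal-size data fragments
fragments = encode(data_chunks)             # 3 data + 1 parity fragment

lost = fragments[1]                         # simulate losing one fragment
fragments[1] = b"\x00" * 4
fragments[1] = recover(fragments, 1)
print(fragments[1])                         # b"BBBB"
```

Note the capacity argument: protecting three fragments here costs one extra fragment (33% overhead), where full replication of the same data would cost 100% or more.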