An e-discovery processing firm is moving away from tiered storage and compressing more data onto its primary clustered NAS arrays using a primary storage data reduction appliance from Ocarina Networks.
LDiscovery Legal and Technology Consulting, headquartered in McLean, Va., with offices in Philadelphia, Chicago, New York and London, collects and culls data for clients prior to legal review by attorneys. In its five-year history, the company has amassed more than 250 TB of data on BlueArc Corp. Mercury 100 and Titan 2100 primary storage arrays, and more than 1 PB of data on Aberdeen LLC Stirling X888 storage servers filled with 2 TB SATA disks and archival tape, according to Brian Wolfinger, vice president of electronic discovery and forensic services. LDiscovery has a total of 40 employees.
The Aberdeen servers offer "a great price per TB for large-format storage boxes," Wolfinger said, "but 2 TB SATA disks are ponderous for access speeds compared to BlueArc."
Unpredictable client access patterns prompted the firm to rethink its tiered architecture. Clients requesting terabytes of data rolled off to tape three years earlier would face a time lag -- translating into billable hours -- before they could retrieve it.
Wolfinger said he was skeptical when he first heard about Ocarina Networks from his BlueArc sales rep. "If I could get a 50% reduction in data on data older than 30 days and it wouldn't be impactful to processing throughput, it would be a win for me," he said. Testing and production deployment this year have yielded a 70% reduction rate, meaning that for every 100 TB of logical data, only about 30 TB is physically stored.
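The arithmetic behind that figure is straightforward; as a back-of-envelope sketch (the 70% rate is from the article, the rest is simple multiplication):

```python
# A 70% data reduction rate means only 30% of logical capacity
# actually lands on disk. Figures below are from the article.
logical_tb = 100        # logical data under management
reduction = 0.70        # reported 70% reduction rate

saved_tb = logical_tb * reduction      # capacity reclaimed
physical_tb = logical_tb - saved_tb    # what is physically stored

print(physical_tb)  # 30.0 TB stored for every 100 TB of logical data
```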
The Ocarina appliance compresses data as it moves to lower tiers of storage while preserving file links in the original BlueArc file system. Wolfinger said the Ocarina/BlueArc Mercury system installed in the Virginia office sends compressed data more than 30 days old to a separate partition of SAS disks, which he considers the ideal middle ground: better performance than SATA at lower cost than Fibre Channel (FC).
"Our oldest BlueArc file system -- just one 100 terabyte file system, [which was] pretty much packed solid when we first deployed Ocarina -- has seen a 70 TB savings," Wolfinger said. "That probably translates into about a $200,000 investment or more in additional spindles had we needed to add 70 TB of available space to another file system to make space available for ongoing work."
As a computer forensics expert, Wolfinger brushed off fears that compression and deduplication constitute an alteration of data that could make it inadmissible in court. "I think it's kind of a manufactured 'Ooh, but what if…' kind of scenario," he said. "We consider those things but write them off when we're talking about post-processing data that's being migrated.
"I wouldn't recommend using Ocarina to dehydrate data that doesn't exist in some other form — then it's a one-way function with nothing to validate against, putting all eggs in the Ocarina basket," Wolfinger continued. "That's just bad methodology from a chain-of-custody perspective, but I don't think any reputable company in our arena would do that on live data they had no other means of getting back to."
NetApp Inc. deduplicates primary data in the file system, and Ocarina and Permabit Technology Corp. have developed embedded deduplication software for OEMs that will come to market later this year. These products handle data reduction natively in the file system instead of requiring a separate appliance.
Wolfinger said he wouldn't consider deploying dedupe natively, however, partly because the maintenance of stub files on the BlueArc array is key in his environment. Tighter integration into the file system might require the Ocarina Reader software client to decompress data on its way back out of the system.
"Our end processors have a very detail-oriented, focused job — I don't want them to have to think about another piece of software to have to interact with," Wolfinger said. Putting Ocarina inline as part of the primary file system is also not an option in his business. "I don't want any process that I can't robustly validate monkeying with my data until I've done what I need to with it," he added.