
E-discovery firm pushes out tiered storage with primary storage data reduction

E-discovery data processing firm LDiscovery moves away from tiered storage by compressing more data onto its primary BlueArc NAS array with Ocarina deduplication.

An e-discovery processing firm is moving away from tiered storage and compressing more data onto its primary clustered NAS arrays using a primary storage data reduction appliance from Ocarina Networks.

LDiscovery Legal and Technology Consulting, headquartered in McLean, Va., with offices in Philadelphia, Chicago, New York and London, collects and culls data for clients prior to legal review by attorneys. In its five-year history, the 40-employee company has amassed more than 250 TB of data on BlueArc Corp. Mercury 100 and Titan 2100 primary storage arrays, and more than 1 PB of data on Aberdeen LLC Stirling X888 storage servers filled with 2 TB SATA disks and archival tape, according to Brian Wolfinger, vice president of electronic discovery and forensic services.

The Aberdeen servers offer "a great price per TB for large format storage boxes," Wolfinger said, "but 2 TB SATA disks are ponderous for access speeds compared to BlueArc."

But data deletion isn't an option at this point. "We have huge retention times because the litigation cycle can take so long — we have data we performed collection on in the first month of the company's existence in May 2005 that we're still handling and occasionally being called to testify about," he said.

This unpredictable access pattern made the firm begin to rethink the tiered architecture. Clients requesting terabytes of data rolled off to tape three years ago would have to contend with a time lag -- translating into billable hours -- to retrieve their data.

Wolfinger said he was skeptical when he heard about Ocarina Networks via his BlueArc sales rep. "If I could get a 50% reduction in data on data older than 30 days and it wouldn't be impactful to processing throughput, it would be a win for me," he said. Testing and production deployment this year have yielded a 70% data reduction, meaning that for every 100 TB of logical data, only about 30 TB is physically stored.

The BlueArc arrays preserve file links in the original BlueArc file system and compress the data through an Ocarina appliance while moving it to lower tiers of storage. Wolfinger said the Ocarina/BlueArc Mercury system installed in the Virginia office sends compressed data more than 30 days old to a separate partition of SAS disks, which he considers an ideal balance: better performance than SATA at a lower cost than Fibre Channel (FC).
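The age-based rule Wolfinger describes (data more than 30 days old becomes a candidate for compression and migration) can be illustrated with a minimal sketch. This is not Ocarina's or BlueArc's actual policy engine; the function name `files_older_than`, the use of last-modified time as the age signal, and the directory walk are all assumptions made for illustration.

```python
import os
import time

SECONDS_PER_DAY = 86_400


def files_older_than(root, min_age_days=30, now=None):
    """Return sorted paths under `root` last modified at least `min_age_days` ago.

    Hypothetical helper: a real appliance would apply its own policy,
    which may use access time or other metadata instead of mtime.
    """
    now = time.time() if now is None else now
    cutoff = now - min_age_days * SECONDS_PER_DAY
    selected = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # A file qualifies when its modification time is at or before the cutoff.
            if os.stat(path).st_mtime <= cutoff:
                selected.append(path)
    return sorted(selected)
```

Calling `files_older_than("/some/share")` would list the candidates for a 30-day policy; what happens to them next (compression, stubbing, moving to SAS) is outside the scope of this sketch.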

"Our oldest BlueArc file system -- just one 100 terabyte file system, [which was] pretty much packed solid when we first deployed Ocarina -- has seen a 70 TB savings," Wolfinger said. "That probably translates into about a $200,000 investment or more in additional spindles had we needed to add 70 TB of available space to another file system to make space available for ongoing work."
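The figures Wolfinger cites can be checked with some back-of-the-envelope arithmetic. The sketch below uses only numbers from the article (a 100 TB file system, a 70% reduction, and his rough estimate of about $200,000 in avoided disk purchases); the function names are hypothetical, and the per-terabyte figure is an implied floor rather than a quoted price, since he said "$200,000 investment or more."

```python
def reduction_summary(logical_tb, reduction_fraction):
    """Return (physical_tb, saved_tb) for a given fractional data reduction."""
    physical_tb = logical_tb * (1 - reduction_fraction)
    saved_tb = logical_tb - physical_tb
    return physical_tb, saved_tb


def implied_cost_per_tb(avoided_spend_usd, saved_tb):
    """Avoided spend divided by capacity saved, in dollars per TB."""
    return avoided_spend_usd / saved_tb


physical, saved = reduction_summary(logical_tb=100, reduction_fraction=0.70)
per_tb = implied_cost_per_tb(avoided_spend_usd=200_000, saved_tb=saved)

print(f"Physically stored: about {physical:.0f} TB")
print(f"Capacity saved: about {saved:.0f} TB")
print(f"Implied avoided cost: about ${per_tb:,.0f} per TB")
```

On those inputs, the 70 TB saved works out to roughly $2,857 per terabyte of primary capacity that did not have to be purchased.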

As a computer forensics expert, Wolfinger brushed off fears that compression and deduplication constitute an alteration of data that may make it inadmissible in court. "I think it's kind of a manufactured 'Ooh, but what if…' kind of scenario," he said. "We consider those things but write them off when we're talking about post-processing data that's being migrated.

"I wouldn't recommend using Ocarina to dehydrate data that doesn't exist in some other form — then it's a one-way function with nothing to validate against, putting all eggs in the Ocarina basket," Wolfinger continued. "That's just bad methodology from a chain-of-custody perspective, but I don't think any reputable company in our arena would do that on live data they had no other means of getting back to."

NetApp Inc. deduplicates primary data in the file system, and Ocarina and Permabit Technology Corp. have developed embedded deduplication software for OEMs that will come to market later this year. These products handle data reduction natively in the file system instead of requiring a separate appliance.

Wolfinger said he wouldn't consider deploying dedupe natively, however, partly because the maintenance of stub files on the BlueArc array is key in his environment. Tighter integration into the file system might require the Ocarina Reader software client to decompress data on its way back out of the system.

"Our end processors have a very detail-oriented, focused job — I don't want them to have to think about another piece of software to have to interact with," Wolfinger said. Putting Ocarina inline as part of the primary file system is also not an option in his business. "I don't want any process that I can't robustly validate monkeying with my data until I've done what I need to with it," he added.
