Nick Eglevsky, litigation support manager at New York City-based Kelley Drye & Warren LLP, said his firm has been working with Digital Reef Inc.'s Virtual Governance Warehouse data classification software, delivered as a service, for approximately a year to quickly classify and analyze data.
Digital Reef claims its data classification and search algorithms are more advanced than previous generations of data classification products. The product contains a "similarity engine" that can identify overall contextual similarity of files in a repository rather than simply matching keywords. Digital Reef came out of stealth last May with the intention of marketing its product for unstructured data management. The vendor has since added litigation support and e-discovery features, following a path similar to predecessors such as Kazeon Systems, now a part of EMC Corp.
Eglevsky said it was performance that first attracted his firm to Digital Reef's service. The data sets Kelley Drye works with are not that big -- "to us, a large data set is anything over 40 gigabytes," Eglevsky said -- but they need to be analyzed, classified and culled within two or three business days, and sometimes as soon as overnight.
The goal for Kelley Drye is to eliminate irrelevant data before the data set is sent to an attorney for formal legal review because culling irrelevant data through an attorney's billable hours is expensive for both the firm and its clients.
"We don't want the review team going through documents that aren't relevant," Eglevsky said. "So we're not paying attorneys to review fantasy football and news alert emails." These considerations are growing more critical as corporate data everywhere increases in size, and e-discovery gains prominence as a standard IT practice, particularly in the regulatory climate following last year's economic recession.
Digital Reef has many established competitors; among the best known are Clearwell Systems Inc., Driven Inc. and Merrill Corp. But Eglevsky said that, in his experience with other service providers, processing data sets in the hundreds of gigabytes usually takes at least a week, and the results may still include irrelevant data. Digital Reef's document clustering, which groups similar documents together so relevancy decisions can be made on a whole group at once, helps cut down on irrelevant data making it to the review stage, Eglevsky said, and processing usually takes a matter of days at most.
Rather than deploying the software onsite, Kelley Drye sends its information either through a secure SSL connection over the Web or on an encrypted USB drive to Digital Reef's data center, where it is processed and the results are sent back to Kelley Drye's offices for review.
"We prefer the SaaS model — we considered bringing the software in-house, but we don't get enough really large data sets to warrant that expense," said Eglevsky. "If we had terabytes of data it might make sense to bring it on site."
Eglevsky said that as long as Digital Reef's technology works, he doesn't have qualms about working with a relative newcomer to the industry. Version 3 of Digital Reef's software added a more streamlined GUI and new data management features, such as the ability to list emails by sender domain. "So you can get all emails from Amazon.com and bulk-tag them as not relevant," he said.
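As a rough illustration of the bulk-tagging idea Eglevsky describes, the sketch below groups messages by the domain of their From address and tags one group in a single pass. It is not Digital Reef's implementation or API: the message dictionaries, the `sender_domain`, `group_by_domain` and `bulk_tag` helpers, and the sample addresses are all hypothetical, and only Python's standard library is used.

```python
from collections import defaultdict
from email.utils import parseaddr


def sender_domain(from_header):
    """Return the lowercased domain of a From header, or '' if none is found."""
    _, address = parseaddr(from_header)
    if "@" not in address:
        return ""
    return address.rpartition("@")[2].lower()


def group_by_domain(messages):
    """Map each sender domain to the list of messages sent from it."""
    groups = defaultdict(list)
    for msg in messages:
        groups[sender_domain(msg["from"])].append(msg)
    return groups


def bulk_tag(messages, tag):
    """Add the same tag to every message in the list, in place."""
    for msg in messages:
        msg.setdefault("tags", set()).add(tag)


# Hypothetical sample data; a real collection would come from a mail export.
messages = [
    {"id": 1, "from": "Amazon Orders <order-update@amazon.com>"},
    {"id": 2, "from": "opposing.counsel@example.org"},
    {"id": 3, "from": "Amazon Shipping <ship-confirm@AMAZON.COM>"},
]

groups = group_by_domain(messages)
bulk_tag(groups["amazon.com"], "not relevant")

for msg in messages:
    print(msg["id"], sorted(msg.get("tags", [])))
```

Grouping first and tagging second means a reviewer makes one decision per sender domain rather than one per message.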
Conceptual search is another draw of Digital Reef's service. "If we have a series of documents relevant to a specific issue, we can upload them and say, 'Go find things conceptually similar to what I'm supplying,'" Eglevsky said.
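The article does not say how Digital Reef's similarity engine works internally. To give a feel for the "find more like these" workflow Eglevsky describes, the sketch below ranks a small collection against example documents using TF-IDF weighting and cosine similarity, a common textbook technique that is swapped in here purely for illustration. It is word-based rather than truly conceptual, and the function names and sample texts are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(texts):
    """Build one sparse TF-IDF vector (a dict of term -> weight) per text."""
    token_lists = [tokenize(t) for t in texts]
    n = len(token_lists)
    # Document frequency: in how many texts does each term appear?
    df = Counter(term for tokens in token_lists for term in set(tokens))
    vectors = []
    for tokens in token_lists:
        tf = Counter(tokens)
        vectors.append({
            term: count * (math.log((1 + n) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_by_similarity(seeds, corpus):
    """Return (score, corpus_index) pairs, most similar to any seed first."""
    if not seeds:
        raise ValueError("at least one seed document is required")
    vectors = tfidf_vectors(seeds + corpus)
    seed_vecs, doc_vecs = vectors[:len(seeds)], vectors[len(seeds):]
    scores = [
        (max(cosine(seed, doc) for seed in seed_vecs), index)
        for index, doc in enumerate(doc_vecs)
    ]
    return sorted(scores, reverse=True)


# Hypothetical example: one seed document known to be relevant to the issue.
seeds = ["The supplier breached the delivery contract in March."]
corpus = [
    "Fantasy football draft picks for this weekend.",
    "Our supplier missed the contract delivery deadline again.",
    "Breaking news alert: markets close higher.",
]

ranked = rank_by_similarity(seeds, corpus)
for score, index in ranked:
    print(f"{score:.2f}  {corpus[index]}")
```

In this toy example the supplier email ranks first because it shares words with the seed, while the other two messages share none and score zero. A purely word-based measure like this would miss a relevant document that uses entirely different vocabulary, which is the gap a conceptual engine is meant to close.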
Eglevsky said he would like Digital Reef to beef up its keyword search in combination with the document-grouping concept search. "Right now, the most common keywords show up for like documents, but I'd also like to see documents that are similar but don't have the same exact keyword in them," he said.
David Butler, Digital Reef's vice president of marketing, responded in an emailed statement to SearchStorage.com: "We do support conceptual search through our integrated clustering and searching capability today. The next step is to support conceptual topics [that] will suggest topics to add into keyword searches/queries. This is on our roadmap for later this year."