For scalability, Digital Reef has broken its software into three layers, each of which can run on clusters of hosts, according to Brian Giuffrida, Digital Reef's vice president of marketing and business development. The access tier can be tunneled through a firewall if necessary; the service tier acts as a job router for requests that come into the system; and the analytics tier processes data. Within the analytics tier, the software also checks large jobs to see whether they need to be load-balanced or restarted.
"It scales to the needs of a true enterprise data store," Giuffrida said.
Giuffrida claims Digital Reef's data classification and search algorithms are more advanced than those in previous generations of classification products. The product contains a "similarity engine" that can identify the overall contextual similarity of files in a repository rather than simply matching keywords.
"So an oil-exploration company investigating a failure mechanism on a particular offshore oil rig could find solutions to the problem written on other rigs where the components are named differently," Giuffrida said.
The similarity engine also allows the identification of duplicate or near-duplicate files without depending on specific keywords.
Giuffrida said the software could be
"We want to show customers what they don't know they don't know about their file repositories," he said.
The software moves data only if users or policies require it to do so. It doesn't perform automated migration, although it can initiate data migration processes for litigation holds if necessary. Giuffrida said Digital Reef plans to take this engine a step further in releases later this year, with automated tiered storage features and classification of multimedia files on the roadmap.
Arun Taneja, founder and consulting analyst at Hopkinton, Mass.-based Taneja Group, said the technology looks promising on paper, but he has seen this movie before.
Taneja said scalability issues often come to light only after enterprise customers expose a new product like this to real-world environments.
Despite a flurry of products over the past few years, data classification software hasn't caught on with customers, and the weak economy will likely make it harder for new entrants to break through. Another challenge is that data classification doesn't always fit neatly into one IT group at large enterprises.
"What I've heard again and again is that customers want this functionality, but they can't figure out what budget line to take it out of," Taneja said. "SRM [storage resource management] tools have had a similar problem."
Taneja said the convergence of file management, and even file virtualization, with data classification could help Digital Reef if it adds automated storage migration features.
However, Tony Asaro, senior consultant and founder at INI Group LLC, who has been consulting with Digital Reef, said he advised the company against automatic migration. "Moving data and keeping track of those associations seems like a whole other business," Asaro said. "Look at the data mover market—who's left, F5 [Networks Inc.]? There's a lot more differentiation in enterprise search than data movement."
But, he added, "They're getting pressure from customers for it."