BOSTON – Emerging vendors looking to get noticed by the storage industry at this week's Business Development Networking Event (BD Event) share a focus on data archiving for long-term retention and regulatory compliance.
E-discovery archivers Tarmin Technologies Ltd. and Nayatek LLC, database archiver Clearpace Software Ltd. and healthcare archiving specialist BridgeHead Software Ltd. all focus on software-only delivery, and most use a grid or scale-out design. Some of these vendors also share approaches such as using alternatives to SQL databases for indexing, supporting niche data types and integrating with the cloud.
Venture capitalists speaking at a Monday panel at the BD Event said software-only delivery may become all but mandatory for storage companies looking to get funded going forward. Software-only systems can reduce capital expenditures as a startup ramps up product development. "We're not looking for people building a faster disk drive at this point," said Ash Ashutosh, a partner at venture capital (VC) firm Greylock Partners.
Three of the data archiving companies offer grid or scale-out architectures. Nayatek, which was founded in 2006 and completed a Series A funding round last December, has just come out of stealth mode with a beta product called Datosphere. CEO Marc Olson said Datosphere will use a software appliance to add indexing, compression and encryption to back-end storage devices. Nayatek uses a RAIN (redundant array of independent nodes) architecture to scale out and provide redundancy. It archives email, Simple Mail Transfer Protocol (SMTP) traffic, instant messaging and unified communications with the goal of being a "data neutral" archive. The vendor is beta testing file archiving and plans to add SharePoint support, Olson said.
Tarmin Technologies, which is beta testing Version 1.5 of its GridBank product, uses a service-oriented architecture (SOA) to manage the multiple processes the grid performs, from policy management and security to cloud migrations. The services layer coordinates communication between the different modules. Eric Herzog, Tarmin's vice president of marketing and sales, said that in addition to letting customers pick and choose which features to deploy, SOA will keep the product reliable. "It makes it hard for any new module we add to cause a bug in other parts of the software," he said.
Clearpace offers the option of being deployed against a cluster file system for scale-out; it breaks up database values into segments it can store as files.
Avoiding SQL-based indexing for scalability
Nayatek and Tarmin Technologies both claim that the SQL database, used by many older data archiving products for metadata management and indexing, doesn't scale because weighing down the database with too much data can slow performance. Datosphere uses a flat file for metadata, which can be appended indefinitely. The file will also record which node each piece of data is stored on to keep searches fast. Tarmin uses a proprietary database.
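The flat-file approach described above can be illustrated with a minimal Python sketch. It is not Nayatek's implementation, whose file format has not been disclosed; the class name `FlatFileIndex`, the JSON-lines layout and the node names are all assumptions made for illustration. The sketch shows only the two properties the vendor describes: records are appended rather than updated in place, and each record notes which node holds the data so a lookup does not have to query every node.

```python
import json
import os
import tempfile


class FlatFileIndex:
    """Hypothetical append-only metadata log: one JSON record per line."""

    def __init__(self, path):
        self.path = path
        self.locations = {}  # object id -> node name, rebuilt from the file
        if os.path.exists(path):
            with open(path, "r", encoding="utf-8") as f:
                for line in f:
                    record = json.loads(line)
                    self.locations[record["id"]] = record["node"]

    def add(self, object_id, node, **metadata):
        # Append a new record; existing lines are never rewritten.
        record = {"id": object_id, "node": node, "meta": metadata}
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")
        self.locations[object_id] = node

    def locate(self, object_id):
        # Return the node holding the object, or None if it is unknown.
        return self.locations.get(object_id)


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "metadata.log")
        index = FlatFileIndex(path)
        index.add("msg-001", "node-a", kind="email")
        index.add("msg-002", "node-b", kind="im")
        # Reopen the file to show the mapping survives a restart.
        reopened = FlatFileIndex(path)
        return reopened.locate("msg-002"), reopened.locate("missing")
```

A design like this trades query flexibility for cheap writes: appends never lock or rebalance a table, but anything beyond a lookup by ID needs a separate search structure.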
Analysts say SQL does have a scalability ceiling, but there is no fixed number; it depends on the environment, according to Brian Babineau, a senior analyst at Milford, Mass.-based Enterprise Strategy Group (ESG). Hitting that ceiling isn't inevitable, he said. "There are plenty of archiving solutions successfully implemented in big companies with SQL behind them," he said.
Cloud integration becoming a must-have
Emerging data archiving vendors are looking to latch onto the cloud, as are established vendors. Tarmin revealed this week that GridBank can write to Amazon's Simple Storage Service (S3), with support for other services planned. Last week, Clearpace added a similar option to send data to a cloud target for offsite data storage.
BridgeHead CEO Tony Cotterill said his company will also integrate with the cloud in the next two months. But Cotterill said he doesn't expect much interest yet from the privacy-sensitive healthcare vertical.
"[Cloud] adoption is going to be slower," Cotterill said. "There are barriers to putting data out in what is essentially the public domain. We can complement it with encryption and digital signatures, but the cloud has work to do."
Added ESG's Babineau: "That hesitancy is a hump some companies are just going to have to get over. I don't see it being a long-term impediment."
Tackling healthcare, structured data
The new generation of data archiving providers is looking to fill in gaps for more specialized types of data, such as structured databases and medical images.
BridgeHead execs said they decided to take the leap into the healthcare vertical because developing the specialized integration medical providers require, such as support for Digital Imaging and Communications in Medicine (DICOM) and Picture Archiving and Communication System (PACS), is a full-time job. There's also a big opportunity with billions of dollars pledged for digitizing healthcare records in the American Recovery and Reinvestment Act, aka the economic stimulus package.
Clearpace CEO John Bantleman said his startup is strictly focused on archiving structured data. Bantleman claimed his company's Database Management System Services (DBMS2) can compress database data by a factor of 20 to 60, and can read it back directly from the archive without needing to decompress it.
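To put the claimed ratios in concrete terms, the short Python sketch below works through the arithmetic. The function names and the 1,000 GB starting figure are hypothetical, chosen only for illustration; the 20 and 60 ratios are the vendor's claims, not measured results.

```python
def archived_size_gb(original_gb, ratio):
    """Size after compression at the given ratio (20 means 20:1)."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return original_gb / ratio


def space_saved_pct(ratio):
    """Percentage of the original footprint eliminated at this ratio."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return (1 - 1 / ratio) * 100


# A hypothetical 1,000 GB database at the low and high ends of the claim.
low_end = archived_size_gb(1000, 20)   # 50.0 GB
high_end = archived_size_gb(1000, 60)  # about 16.7 GB
```

At 20:1 the archive would occupy 5% of the original space, and at 60:1 under 2%, so most of the savings are already captured at the low end of the claimed range.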
Clearpace isn't alone in the database archiving market, said John Webster, principal IT advisor at Nashua, N.H.-based Illuminata Inc. He cited Solix Technologies Inc. and IBM's Princeton Softech products as competitors "who have been doing a version of this for a while." He added that the market has not taken off as he once expected. "You'd think there'd be a bigger market opportunity than there is, but so far there hasn't been," he said. "For some reason, it just hasn't become hot."