Until recently, most grid storage systems have been deployed in specialized high-performance computing environments such as oil and gas exploration. But new grid storage systems specifically designed for low-cost, general-purpose data protection from Cleversafe (an open-source community that's creating software for dispersed data storage) and NEC are making their way out of research labs and into corporate beta-testing environments.
Computer scientists say the Internet will soon have the same impact on data storage that it's had on computing and communications technologies. Cleversafe, started by Chris Gladwin, its president and CEO, has developed a way to store documents and files over the Internet in slices of encrypted data that can be reassembled only by the computers that originally created the files. If one or more nodes fail, any majority of the nodes can put the data back together; for example, in an 11-node system, any six nodes can recover all the data, says Gladwin.
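The "any six of 11" behavior Gladwin describes can be illustrated with a small threshold code. The sketch below is not Cleversafe's algorithm, which the article does not detail; it is a minimal example of the general idea, assuming polynomial interpolation over the prime field of size 257 and leaving out the encryption step entirely. The names `majority_threshold`, `disperse` and `recover` are hypothetical.

```python
import random

PRIME = 257  # smallest prime above 255, so every byte value is a field element


def majority_threshold(n: int) -> int:
    """Smallest number of nodes that forms a majority of n."""
    return n // 2 + 1


def _interpolate(points, x, p=PRIME):
    """Evaluate at x the unique polynomial through `points` (Lagrange form)."""
    total = 0
    for xi, yi in points:
        num, den = 1, 1
        for xj, _ in points:
            if xj != xi:
                num = num * (x - xj) % p
                den = den * (xi - xj) % p
        # pow(den, -1, p) is the modular inverse (Python 3.8+)
        total = (total + yi * num * pow(den, -1, p)) % p
    return total


def disperse(data: bytes, n: int, k: int) -> dict:
    """Split data into n slices such that any k of them can rebuild it.

    Each group of k bytes defines a polynomial of degree k-1 through the
    points (1, b1) ... (k, bk); slice x stores that polynomial's value at x.
    """
    padded = list(data) + [0] * (-len(data) % k)  # pad to a multiple of k
    slices = {x: [] for x in range(1, n + 1)}
    for i in range(0, len(padded), k):
        points = list(enumerate(padded[i:i + k], start=1))
        for x in slices:
            slices[x].append(_interpolate(points, x))
    return slices


def recover(available: dict, k: int, length: int) -> bytes:
    """Rebuild the original bytes from any k surviving slices."""
    if len(available) < k:
        raise ValueError("not enough slices to recover the data")
    chosen = list(available.items())[:k]
    out = bytearray()
    for g in range(len(chosen[0][1])):
        points = [(x, ys[g]) for x, ys in chosen]
        out.extend(_interpolate(points, x) for x in range(1, k + 1))
    return bytes(out[:length])


# Illustration: an 11-node grid in which only six randomly chosen nodes survive.
n = 11
k = majority_threshold(n)
message = b"grid storage"
slices = disperse(message, n, k)
survivors = dict(random.sample(sorted(slices.items()), k))
assert recover(survivors, k, len(message)) == message
```

In this sketch each slice holds roughly one sixth of the data, so the full 11-node grid stores a little under twice the original size while tolerating the loss of any five nodes.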
Cleversafe consists of two parts: Cleversafe.org, an open-source project that's developing the code that enables the dispersed grid storage architecture; and Cleversafe LLC, which plans to commercialize the project by licensing grid node sites. Cleversafe operates a multiterabyte research grid that users can tap into to explore the storage grid software. The research grid is for testing only, however, and data is flushed periodically.
NEC's Hydrastor storage platform is much closer to becoming a real product. Initially designed for second-tier storage (but with a roadmap to add primary and archival storage in a year or two), it consists of a grid of two types of storage nodes: Accelerator Nodes (ANs) to scale performance and Storage Nodes (SNs) to provide disk capacity. The system can theoretically scale to thousands of petabytes (PB) of capacity in 2.5TB increments. A 140TB system will cost approximately $100,000.
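For a sense of scale, the figures quoted above can be turned into some quick arithmetic. The sketch below assumes decimal units (1PB = 1,000TB) and uses only the numbers in the article; the per-terabyte figure is derived from the single 140TB price point and is not a claim that NEC prices linearly.

```python
import math

INCREMENT_TB = 2.5           # capacity increment quoted in the article
QUOTED_CAPACITY_TB = 140     # size of the priced configuration
QUOTED_PRICE_USD = 100_000   # approximate price of that configuration
TB_PER_PB = 1_000            # assumption: decimal units


def increments_needed(capacity_tb: float) -> int:
    """Number of 2.5TB increments required to reach at least capacity_tb."""
    return math.ceil(capacity_tb / INCREMENT_TB)


def implied_cost_per_tb() -> float:
    """Dollars per terabyte implied by the one quoted configuration."""
    return QUOTED_PRICE_USD / QUOTED_CAPACITY_TB


print(increments_needed(QUOTED_CAPACITY_TB))  # increments in the 140TB system
print(increments_needed(TB_PER_PB))           # increments in one petabyte
print(round(implied_cost_per_tb(), 2))        # implied dollars per terabyte
```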
What NEC calls Distributed Resilient Data (DRD) technology protects against up to three disk or node failures, three times as many as the single disk failure RAID-5 can tolerate, says Karen Dutch, NEC's general manager for advanced storage products. Other data management services migrate or replicate data between nodes, and constantly changing files can be protected through continuous data protection. DataRedux, NEC's data-reduction software, eliminates repetitive data segments.
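Eliminating repetitive data segments is commonly done by fingerprinting each segment and storing only one copy per fingerprint. The article does not describe how DataRedux works internally, so the sketch below is only a generic illustration; the fixed 4,096-byte segment size, the use of SHA-256 and the function names `deduplicate` and `rebuild` are all assumptions.

```python
import hashlib


def deduplicate(data: bytes, segment_size: int = 4096):
    """Split data into fixed-size segments and keep one copy of each.

    Returns (store, recipe): `store` maps a segment's SHA-256 digest to its
    bytes, and `recipe` lists the digests in the order needed to rebuild.
    """
    store = {}
    recipe = []
    for i in range(0, len(data), segment_size):
        segment = data[i:i + segment_size]
        digest = hashlib.sha256(segment).hexdigest()
        store.setdefault(digest, segment)  # a repeated segment is not stored again
        recipe.append(digest)
    return store, recipe


def rebuild(store: dict, recipe: list) -> bytes:
    """Reassemble the original byte stream from the stored segments."""
    return b"".join(store[digest] for digest in recipe)


# Illustration: four segments, three of them identical.
backup = b"A" * 4096 * 3 + b"B" * 4096
store, recipe = deduplicate(backup)
assert rebuild(store, recipe) == backup
print(len(recipe), "segments referenced,", len(store), "segments stored")
```

Backup streams tend to repeat the same segments from one run to the next, which is why this kind of reduction suits secondary storage.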
If the primary Hydrastor site fails, data can be continuously accessed through the nodes in another location without disrupting the application. When the primary site is brought back online, the data is automatically rebuilt and load-balanced to that site from the other nodes, eliminating the need for complex failover and failback scripts.
Michael Thomas, storage architect at the Federal Reserve System, has been beta testing Hydrastor and hopes to move it into his production environment for secondary storage. "We have so much data distributed across multiple locations, backup is a significant undertaking ... we face tremendous pressure to complete backups within our backup window while confronting compounded rates of data growth. Yet we can't keep hiring more people to manage our increasingly complex environment," says Thomas. "A grid storage solution that is massively scalable, easy to manage and cost effective is just what we need."
Steve Duplessie, founder and senior analyst at Enterprise Strategy Group, Milford, MA, agrees. "To ever think that the IT infrastructure can truly deliver on-demand, predictable services in any sort of reasonable real-time way, the physical storage limitations of the 'box' will have to cease to exist," says Duplessie. "Only when knob-twiddling specialists are no longer required, and scale in any dimension happens dynamically and autonomously, can we even start to talk about IT as a real business partner."