Clustered storage systems may finally come of age now that major vendors are putting more muscle behind their new offerings and, in the process, giving even the most conservative IT organizations enough comfort to test the waters.
Corporate IT managers are often skittish about entrusting their data to less-established companies, and a raft of startups was largely responsible for jump-starting the high-scaling clustered storage market.
But the vendor landscape is undergoing a dramatic transformation.
Hewlett-Packard (HP) Co.'s StorageWorks 9100 Extreme Data Storage System (ExDS9100) is the latest example. The ExDS9100 appliance, released in November, incorporates the clustered file system HP acquired through its 2007 purchase of PolyServe Inc. and focuses on high-capacity, low-cost, bulk storage.
Another major player, IBM Corp., moved more quickly following its acquisition of Tel Aviv, Israel-based XIV Ltd. a year ago. In August, IBM launched the block-based XIV Storage System, which boosted the capacity and performance of the prior XIV release, known as Nextra. XIV's competition includes products from 3PAR and yet another pair of acquisitions: Dell Inc.'s EqualLogic (January) and HP's LeftHand Networks Inc. (November).
Storage giant NetApp also sells a file-level clustered storage appliance, but it takes a different technology approach and typically competes in different circles. NetApp's Data Ontap GX System gets key clustering functionality through its operating system, rather than a clustered file system, and the feature-rich product is designed to scale both capacity and performance. The company's 2004 acquisition of Spinnaker Networks Inc. led to the maiden GX release, although it took more than two years to produce.
Roger Wilson, technical services manager at Mercy Medical Center in Des Moines, Iowa, likely wouldn't have considered buying a storage system from a small vendor because the data he supports, in some cases, can literally mean the life or death of a patient. But last year, with IBM standing behind XIV, Mercy bought two XIVs as it outgrew the pair of mirrored IBM DS6800s that store its digital images.
"I want to know that the vendor I buy my equipment from is going to be there for the long term and appreciate my world," says Wilson. "I know IBM is going to be in business next month, next year, five years from now."
Mercy Medical Center settled on the new XIV Storage System after becoming convinced the clustered system would offer the same speed and robustness at a lower cost. The product's purported ability to rebuild a 1 TB disk drive in 30 minutes or less didn't hurt either.
Clustered storage systems initially gained traction in high-performance computing scenarios that needed to scale performance and capacity. Users included research labs, universities, electronic and mechanical design firms, and the oil and gas industry.
Web 2.0 and other data-intensive applications are driving new interest, from consumer sites that store photos, video and member profiles, to telecommunications providers offering rich services, and broadcast and new media companies that create, distribute and store digital content.
"Now that we're seeing the bigger [storage] vendors get into the mix, we're likely to see a much bigger shift to clustered storage because of the cost benefits and the ease of operation," predicts Andrew Reichman, a senior analyst at Cambridge, Mass.-based Forrester Research Inc.
But clustered storage isn't the optimal solution to every problem. Terri McClure, an analyst at Milford, Mass.-based Enterprise Strategy Group, notes the overhead associated with intra-cluster communication. Many clustered systems are better at large sequential operations than I/O-intensive ones, although some perform well and others are being improved, she says.
Like NetApp's GX, HP's ExDS9100 is a file-based storage appliance, but the similarities end there.
The HP system, for instance, features a true clustered file system to enable the servers to connect to the storage disks. The blades and storage disks also communicate internally via SAS. Yet another point of distinction is the fully symmetric architecture. Every node has access to all data at all times, according to Michael Callahan, chief technologist for enterprise NAS within HP's StorageWorks group.
"We wanted to address a broad collection of different kinds of workloads," says Callahan. "That's significantly more complicated from an engineering point of view, but it's required if you want to be able to support, with good performance, workloads that involved a lot of metadata operations, like creating lots of files, renaming them [and] updating them."
HP's ExDS9100 is optimized for high-capacity environments that need hundreds of terabytes of space delivered at low cost. Data-intensive applications from consumer websites, telecommunications companies and digital content distributors are prime targets.
The ExDS9100, available in two rack units, can scale to 16 blades, or nodes, with up to 820 TB of raw space and 640 TB usable. The minimum configuration is four blades and three storage blocks holding 246 TB raw and 192 TB usable.
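The capacity figures above are consistent with one another, which a quick calculation can confirm. The sketch below assumes that capacity grows linearly with the number of storage blocks, something the article does not state; the per-block figures and the implied maximum block count are derived here, not quoted.

```python
# Figures quoted in the article for the HP ExDS9100.
MIN_BLOCKS = 3
MIN_RAW_TB, MIN_USABLE_TB = 246, 192
MAX_RAW_TB, MAX_USABLE_TB = 820, 640

# Assumption (not from the article): capacity scales linearly per storage block.
raw_per_block = MIN_RAW_TB / MIN_BLOCKS
usable_per_block = MIN_USABLE_TB / MIN_BLOCKS

# Derived values: how many blocks the maximum configuration would imply,
# and what share of raw capacity ends up usable.
implied_max_blocks = MAX_RAW_TB / raw_per_block
usable_ratio = MAX_USABLE_TB / MAX_RAW_TB

print(f"Raw per block:      {raw_per_block:.0f} TB")
print(f"Usable per block:   {usable_per_block:.0f} TB")
print(f"Implied max blocks: {implied_max_blocks:.0f}")
print(f"Usable share:       {usable_ratio:.1%}")
```

Under that assumption, the minimum and maximum configurations show the same usable share of raw capacity.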
"Traditionally, HP has been very strong in block but not so strong in file, and we're making a lot of investments to be much more of an aggressive competitor in this space," says Callahan.
The standard ExDS9100 configuration supports NFS and HTTP by default and can be customized to support CIFS, FTP and other protocols. Operating system support is limited to Linux.
The ExDS9100 typically competes against smaller vendors, such as Isilon Systems Inc. and the software-only Ibrix Inc., and even, on occasion, EMC Corp.'s Atmos cloud storage system, according to Callahan. But it rarely runs into NetApp's GX, although Callahan says that could change in the future.
In the meantime, Callahan is amused to see areas of specialization developing among the file-level clustered storage systems. "It's gone from something that nobody used to something that's actually used in different ways for a lot of different things, with different systems that have different characteristics."
IBM's XIV Storage System
IBM claims its main competition in block-level clustered storage comes from traditional non-clustered, high-end storage systems from major players such as EMC and Hitachi Data Systems.
"We're really targeting the tier 1 storage system," says Orli Gan, a senior product manager for XIV. "What we're bringing to this market are all the advantages of having clustered storage -- the ability to scale, the ability to get the right type of performance, and the reliability that you gain out of this architecture."
XIV distinguishes itself from two of the block-level clustered storage vendors -- Dell's EqualLogic and HP's LeftHand Networks -- with its support for both Fibre Channel and iSCSI. Another competitor, 3PAR, does this as well.
Within the XIV architecture, data enters the system through one of the Fibre Channel or iSCSI ports and then goes to an interface module that determines, based on internal mapping, where the primary and secondary copies of the data need to reside. The data is then sent via Gigabit Ethernet switches to two different storage modules.
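The write path described above can be illustrated with a toy placement function. This is a sketch only: the article does not describe XIV's actual mapping algorithm, so the hash-based scheme, the `place_copies` name and the module numbering are all hypothetical. The one property taken from the text is that each piece of data gets a primary and a secondary copy on two different storage modules.

```python
import hashlib
from typing import Tuple


def place_copies(block_id: str, num_modules: int) -> Tuple[int, int]:
    """Return (primary, secondary) module indices for a block.

    Hypothetical mapping for illustration; not IBM's algorithm.
    The two indices are always different.
    """
    if num_modules < 2:
        raise ValueError("mirroring needs at least two storage modules")
    digest = hashlib.sha256(block_id.encode()).digest()
    primary = int.from_bytes(digest[:8], "big") % num_modules
    # An offset between 1 and num_modules - 1 guarantees that the
    # secondary copy lands on a different module than the primary.
    offset = 1 + int.from_bytes(digest[8:16], "big") % (num_modules - 1)
    secondary = (primary + offset) % num_modules
    return primary, secondary


if __name__ == "__main__":
    for name in ("block-0001", "block-0002", "block-0003"):
        primary, secondary = place_copies(name, num_modules=15)
        print(f"{name}: primary on module {primary}, secondary on module {secondary}")
```

Because the mapping is a pure function of the block identifier, any interface module could compute the same placement without consulting the others, which is one plausible reason to use a deterministic scheme in a design like this.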
The current product scales from six nodes to 15 nodes, but a future edition will allow customers to grow to whatever scale they need, says Gan. Each module, or node, is a standard Linux server with 12 SATA disk drives and its own CPU, memory and cache unit, she says.
One knock is XIV's raw vs. usable capacity, which is currently 180 TB vs. 79 TB. But one early user, Burzin Engineer, vice president of infrastructure technology at Los Angeles-based Shopzilla Inc., isn't troubled. "They don't make you pay for raw capacity," he notes. A more bothersome limitation, he says, is that product upgrades can't be done while the system is running.
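The gap between raw and usable capacity is easier to see as arithmetic. The sketch below assumes 1 TB drives, which is inferred from the 180 TB raw figure and the 15-module, 12-disk configuration and is not stated outright. It attributes half of raw capacity to the primary and secondary copies described earlier and labels the remainder simply as other overhead, since the article does not break it down.

```python
# Figures quoted in the article for the XIV Storage System.
MODULES = 15
DISKS_PER_MODULE = 12
RAW_TB = 180
USABLE_TB = 79

# Assumption (inferred, not stated): each drive holds 1 TB.
DRIVE_TB = 1

raw_from_config = MODULES * DISKS_PER_MODULE * DRIVE_TB

# Keeping two copies of all data halves the space available for unique data.
after_mirroring = RAW_TB / 2

# Whatever remains between the mirrored figure and the quoted usable figure;
# the article does not say what it consists of.
other_overhead = after_mirroring - USABLE_TB

usable_fraction = USABLE_TB / RAW_TB

print(f"Raw from configuration: {raw_from_config} TB")
print(f"After mirroring:        {after_mirroring:.0f} TB")
print(f"Other overhead:         {other_overhead:.0f} TB")
print(f"Usable share of raw:    {usable_fraction:.0%}")
```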
Greg Schulz, founder and senior analyst at StorageIO Group in Stillwater, Minn., assesses XIV as follows: "A big vendor with a system that has low capacity per footprint, performance yet to be determined, but an interesting architecture.
"As a product in the market today," he adds, "it's a solution looking for a problem. But it's an interesting work in progress."
NetApp Data Ontap GX
Although the use of clustered storage systems remains low in comparison to traditional scale-up systems, the uptake spiked rapidly during the last year, according to Bharat Badrinath, director of solutions marketing for engineering applications and file services at NetApp.
NetApp bundles software, servers and disks into an appliance to make it easier for customers to buy, install and manage its GX System. Standard software features include FlexVol for creating and managing virtual volumes, double-parity RAID protection and snapshots.
The company shipped a maintenance update for GX in December, following a feature set upgrade last June. But a more significant change is due this year, when NetApp expects to complete the long-pledged convergence of its Data Ontap GX clustered operating system and its Data Ontap 7G base operating system.
Under the current scenario, software running on one operating system works differently than it does on the other; as a result, procedures such as management, backup, mirroring and expansion are different, according to Jeff Tabor, product manager for GX.
Tabor says customers will benefit from a single storage architecture for all of their applications in the form of reduced management costs and added flexibility to repurpose a storage system initially bought for one application, such as Microsoft Exchange Server, to work with another application, such as a large media archive suited to clustered storage.
NetApp's clustering approach relies on its GX operating system, which layers a global namespace and additional clustering functionality on top of the company's enhanced Write Anywhere File Layout (WAFL) file system, according to Tabor.
"WAFL alone is not a clustered file system," he says.
Still, Tabor claims WAFL, one of NetApp's "crown jewels," is responsible for the product's strong performance in sequential reads and writes as well as random I/O operations.
The scale-out GX architecture creates a cluster of multiple nodes, or controllers, and through the global namespace, allows an administrator to view the storage behind it as a single pool. Performance and capacity scale as additional nodes and disk drives are added.
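The idea of a global namespace can be illustrated with a small lookup sketch. It is not NetApp's implementation: the mount table, the node names and the `owning_node` function are hypothetical, and the longest-prefix rule is just one simple way to map a single directory tree onto storage served by several nodes.

```python
from typing import Dict

# Hypothetical layout: volumes hosted on different nodes, each attached
# at a path inside one shared directory tree.
MOUNTS: Dict[str, str] = {
    "/": "node1",
    "/projects": "node2",
    "/projects/video": "node3",
    "/home": "node4",
}


def owning_node(path: str, mounts: Dict[str, str] = MOUNTS) -> str:
    """Return the node serving `path`, using the longest matching mount point.

    The mount table is expected to contain an entry for "/".
    """
    best = "/"
    for mount in mounts:
        is_match = mount == "/" or path == mount or path.startswith(mount + "/")
        if is_match and len(mount) > len(best):
            best = mount
    return mounts[best]


if __name__ == "__main__":
    for p in ("/projects/video/clip.mov", "/projects/readme.txt", "/tmp/scratch"):
        print(f"{p} is served by {owning_node(p)}")
```

From the administrator's or client's side there is only one tree; which node holds a given path is a detail of the table, so adding a node amounts to adding entries without changing the paths that users see.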
GX currently scales to 24 nodes, and the controllers support a wide range of disk types, including serial ATA (SATA), Fibre Channel (FC) ATA and, in the future, SAS. Plans also call for the 24-node limit to be expanded in the future, according to Tabor.