Clustered storage promises better performance, scalability and reliability, but it's not designed to fit the needs of every storage environment.
Clustered storage combines multiple arrays or controllers to increase their performance, capacity or reliability. In many cases, it's a cost-effective way to meet today's storage needs. But clustering isn't right for everyone.
Clustering has made headlines over the past year. For example, EMC Corp. now supports clustered storage for archiving and backup; Hewlett-Packard (HP) Co. bought PolyServe and its clustered file server; IBM Corp. recently purchased XIV Ltd., a privately held storage technology company based in Tel Aviv, Israel; and Sun Microsystems Inc. acquired the Lustre file system.
Before deciding whether or how to adopt clustered storage, storage managers should understand their business and data access requirements. That means asking the following questions:
- What requires the best performance: random or sequential I/O?
- Which is more important: reliability or speed?
- What storage protocols and topologies must be supported?
- How quickly and to what point in time is recovery required after a disaster or hardware failure?
While definitions vary, clustering generally refers to an architecture in which multiple resources (such as servers or storage arrays) work together to increase reliability, scalability, performance and capacity. Technically, clustering can be done at the level of the disk drive, as with RAID, in which multiple disk drives increase the scalability and reliability of the array. But the more common definition places clustering at the file server or file-system level (see "Cluster vs. Grid vs. Global Namespace," below).
Cluster vs. Grid vs. Global Namespace
Even within the storage community, it's often hard to nail down the differences among clusters, grids, and other concepts such as global namespaces and storage virtualization.
At the Los Alamos National Laboratory in New Mexico, Gary Grider, deputy division leader of the laboratory's HPC Systems Integration Group, refers to his complex of tens of thousands of processors as a cluster rather than a grid. In the high-performance computing (HPC) world, he says, a grid usually implies processors linked by a WAN.
Some industry observers say a grid implies higher numbers of commodity components that work together in a tighter linkage than in clusters. Others compare both concepts to a global namespace, which "is software that runs on different servers and allows them to run a shared directory," says Greg Schulz, founder and senior analyst at StorageIO Group, Stillwater, MN. "A global namespace is a virtualization view without physically aggregating or consolidating any of the underlying file systems or storage systems," he adds.
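To make Schulz's description concrete, here is a minimal sketch of the idea, not any vendor's implementation. It assumes a global namespace can be modeled as a lookup table that maps logical directory prefixes to the file server that actually holds the data. The class name, server names and export paths are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class GlobalNamespace:
    """Toy model: one logical directory tree, many independent file servers.

    Nothing is copied or consolidated; the namespace only records which
    server and export path sit behind each logical prefix.
    """
    mounts: dict[str, tuple[str, str]] = field(default_factory=dict)

    def add_mount(self, logical_prefix: str, server: str, export: str) -> None:
        # Register a backend file system under a logical directory.
        self.mounts[logical_prefix.rstrip("/")] = (server, export.rstrip("/"))

    def resolve(self, logical_path: str) -> tuple[str, str]:
        # Pick the longest registered prefix that contains the path.
        best = None
        for prefix in self.mounts:
            if logical_path == prefix or logical_path.startswith(prefix + "/"):
                if best is None or len(prefix) > len(best):
                    best = prefix
        if best is None:
            raise KeyError(f"no backend serves {logical_path}")
        server, export = self.mounts[best]
        # Swap the logical prefix for the server's own export path.
        return server, export + logical_path[len(best):]


# Hypothetical servers and exports, for illustration only.
ns = GlobalNamespace()
ns.add_mount("/corp/engineering", "filer-a", "/vol/eng")
ns.add_mount("/corp/finance", "filer-b", "/export/fin")
print(ns.resolve("/corp/engineering/specs/design.doc"))
```

Users see a single /corp tree, while each request is routed to whichever server owns that branch. A real product would also have to handle caching, permissions and failover, which this sketch leaves out.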