How is an enterprise to choose a clustering technique, if any? This series of articles compares three methods of storage clustering, including examples of where each particular method is effectively applied. This installment covers distributed clustering.
In contrast to the other two methods -- paired-failover pseudo-clustering and non-distributed, tightly coupled clustering -- a distributed clustering architecture provides "n" independent controllers in a "loosely coupled" communication system. In its most general sense, "loosely coupled" describes an approach to designing the interfaces between modules within a system so that interdependencies between modules or components are reduced. This lowers the risk that changes within one module will create unanticipated changes within other modules.
Distributed clustering delivers distinct architectural advantages over either paired-failover or non-distributed clustering techniques. A modular design enables flexible investment, deployment, and management of storage resources, while providing the only single storage architecture to seamlessly scale from edge-to-core. Within the distributed cluster, dimensional virtualization -- across capacity, performance, and location -- and intelligent control eliminate the complexity and static nature of either paired-failover or non-distributed clustering by abstracting physical complexity into a highly intuitive and dynamic management environment. For example, since physical resources are virtualized in a distributed cluster, administrators manage LUNs associated with servers or applications instead of specific physical arrays, RAID groups, and drives. Distributed clustering is inherently designed for maximum data integrity and minimal recovery time in the case of site failures.
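To make the virtualization point concrete, the following is a minimal sketch, not a description of any vendor's implementation. It assumes a shared pool built from slices of physical capacity and a single provisioning call; the names `Extent`, `VirtualLun`, `StoragePool`, and `provision` are hypothetical and chosen only for illustration. The administrator asks for a LUN by server and size, and the pool decides which arrays and RAID groups back it.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Extent:
    """A slice of physical capacity (hypothetical model, for illustration)."""
    array: str
    raid_group: str
    size_gb: int


@dataclass
class VirtualLun:
    """What the administrator manages: a name, an owning server, a size."""
    name: str
    server: str
    extents: list = field(default_factory=list)

    @property
    def size_gb(self) -> int:
        return sum(e.size_gb for e in self.extents)


class StoragePool:
    """Pools free extents from many arrays behind one provisioning call."""

    def __init__(self, extents):
        self._free = list(extents)

    @property
    def free_gb(self) -> int:
        return sum(e.size_gb for e in self._free)

    def provision(self, name: str, server: str, size_gb: int) -> VirtualLun:
        # Count how many free extents are needed to cover the request.
        total, count = 0, 0
        while total < size_gb and count < len(self._free):
            total += self._free[count].size_gb
            count += 1
        if total < size_gb:
            # Nothing has been taken from the pool at this point.
            raise RuntimeError("pool exhausted")
        chosen, self._free = self._free[:count], self._free[count:]
        return VirtualLun(name, server, chosen)


pool = StoragePool([
    Extent("array-a", "rg0", 100),
    Extent("array-b", "rg3", 100),
    Extent("array-a", "rg1", 100),
])
lun = pool.provision("db01-data", server="db01", size_gb=150)
```

In this sketch the caller never names an array or RAID group; the resulting LUN happens to span two arrays, and that placement stays an internal detail of the pool.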
Distributed clustering specifically seeks to increase flexibility in adding modules, replacing modules, and changing operations within individual modules, without disruption to the processes that are active within the cluster. For example, a distributed clustering storage system is characterized by the fact that each of the active components (in this case, storage controllers) is an independently operable entity. Said differently, each storage controller within a distributed cluster provides storage volume (LUN) access without requiring that another paired or partner controller exists.
Each controller performs its own independent storage function, but also incorporates communication channels to exchange messages with any other controllers that exist in the distributed cluster. Because servers have parallel, independent access to a shared storage pool, greater aggregate throughput between servers and storage is achievable than with the other techniques. Distributed clustering storage architectures also allow n-way implementations, whose inherent, extremely high resiliency is achieved simply by connecting additional controllers to the storage cluster.
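The behavior described above can be sketched in a few lines. This is an illustrative toy model under stated assumptions, not an actual controller design: the shared storage pool is a plain dictionary, inter-controller messaging is an in-memory inbox, and the names `Controller`, `Cluster`, `add_controller`, and `any_online` are hypothetical.

```python
class Controller:
    """An independently operable controller; it needs no partner to serve I/O."""

    def __init__(self, name, pool):
        self.name = name
        self.pool = pool      # shared storage pool: {lun: {block: data}}
        self.online = True
        self.peers = []       # other controllers in the cluster
        self.inbox = []       # messages received from peers

    def _check_online(self):
        if not self.online:
            raise ConnectionError(f"{self.name} is offline")

    def write(self, lun, block, data):
        self._check_online()
        self.pool.setdefault(lun, {})[block] = data
        # Loosely coupled: notify peers, but never wait on them.
        for peer in self.peers:
            if peer.online:
                peer.inbox.append(("updated", self.name, lun, block))

    def read(self, lun, block):
        self._check_online()
        return self.pool[lun][block]


class Cluster:
    """An n-way cluster: controllers join without disturbing existing ones."""

    def __init__(self):
        self.pool = {}
        self.controllers = []

    def add_controller(self, name):
        new = Controller(name, self.pool)
        for existing in self.controllers:
            existing.peers.append(new)
            new.peers.append(existing)
        self.controllers.append(new)
        return new

    def any_online(self):
        for controller in self.controllers:
            if controller.online:
                return controller
        raise ConnectionError("no controller online")


cluster = Cluster()
c0, c1, c2 = (cluster.add_controller(n) for n in ("c0", "c1", "c2"))
c0.write("lun7", 0, b"payload")

# Two of three controllers fail; the LUN remains reachable through the third.
c0.online = False
c1.online = False
survivor = cluster.any_online()
data = survivor.read("lun7", 0)

# A fourth controller joins while the cluster is running.
c3 = cluster.add_controller("c3")
```

Under these assumptions, access survives as long as any one controller remains online, and adding a controller touches only the peer lists rather than the data path.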
Best fit for distributed clustering
The best fit for distributed clustering is an environment consisting of tens or hundreds of servers that require dynamic yet simple access to very highly available storage volumes. This environment is characterized by its dynamic nature -- applications, servers, and operating systems that change rapidly and/or unpredictably over time -- as well as its constantly changing requirements for storage capacity and performance. This describes the large majority of business and industry environments found today, as well as those being planned for the future (e.g., blade servers and clustered applications). The key to this architecture is its dynamic simplicity and ease of provisioning, as opposed to the two other methods, which require a priori decision-making and difficult reprovisioning.
Distributed clustering is also the best architectural fit for optimal "stretch" clustering, the method of separating entire groups of cluster components -- servers, networking, and storage -- over distance.
Summary
These techniques represent the current "state of the art" in storage architecture. Given its range of applications and business requirements, an enterprise must determine the "best fit" to minimize cost and complexity while maximizing business value and efficiency. Distributed clustering offers the optimum levels of resiliency, responsiveness, and scalability in the widest range of environments. While specific fits for paired-failover and non-distributed clustering exist, these fits are inherently limited both from an architectural point of view and, most importantly, from the point of view of the business, which is trying to achieve maximum efficiency and value for its time and money.
About the author
Robert Peglar is the Chief Architect for XIOtech Corporation. He is responsible for storage architecture, healthcare technology and strategic direction. Robert is XIOtech's principal member of the SNIA, the IP Storage Forum and the Shared Solutions Forum.
This was first published in June 2004