This article can also be found in the Premium Editorial Download "Storage magazine: How to scale up with storage clusters."


Software vs. hardware clustering

Single file system
Some products, such as IBM Corp.'s SAN File System (SAN FS), provide global file system capabilities for aggregated storage systems. These applications typically run on an appliance or an intelligent switch, with client software on supported hosts, to deliver one of clustering's key requisites: the global file system.

SAN FS and similar products take a two-pronged approach: They virtualize the storage they sit in front of into a single file system and interface with the hosts' OSes to present that file system as if it were native to the hosts. In this manner, these systems improve capacity management by providing policy-based data migration across all connected storage. That enables more effective storage tiering, a basic step toward an information lifecycle management implementation. SAN FS works with a variety of Windows, Linux and Unix hosts, but requires an IBM storage system for its metadata store. It supports numerous back-end storage systems and can be used in conjunction with IBM's SAN Volume Controller (SVC) to support a range of storage arrays.
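The policy-based migration described above can be pictured with a minimal Python sketch. It does not reproduce SAN FS's policy syntax; the tier names, the FileRecord type and the 30- and 180-day thresholds are assumptions chosen purely for illustration.

```python
from dataclasses import dataclass

# Hypothetical tier names and age thresholds; not taken from any product.
TIER_RULES = [
    (30, "tier1-fast"),       # accessed within the last 30 days
    (180, "tier2-midrange"),  # accessed within the last 180 days
]
DEFAULT_TIER = "tier3-archive"


@dataclass
class FileRecord:
    path: str
    days_since_access: int
    tier: str  # tier the file currently lives on


def target_tier(days_since_access: int) -> str:
    """Return the tier a file belongs on under the age-based rules."""
    for max_age, tier in TIER_RULES:
        if days_since_access <= max_age:
            return tier
    return DEFAULT_TIER


def plan_migrations(files):
    """List (path, from_tier, to_tier) moves needed to satisfy the policy."""
    moves = []
    for f in files:
        wanted = target_tier(f.days_since_access)
        if wanted != f.tier:
            moves.append((f.path, f.tier, wanted))
    return moves


files = [
    FileRecord("/projects/q1/report.doc", 5, "tier1-fast"),
    FileRecord("/projects/2003/scan.tif", 400, "tier1-fast"),
    FileRecord("/home/ann/notes.txt", 90, "tier3-archive"),
]
for move in plan_migrations(files):
    print(move)
```

Because the hosts see one file system, a move between tiers in this sketch changes only where the data is stored, not the path a host uses to reach it.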

Examples of other clustered file system products include Ibrix Inc.'s Fusion, PolyServe Inc.'s Matrix Cluster, Red Hat Inc.'s Global File System (formerly Sistina GFS), SGI's InfiniteStorage Shared Filesystem CXFS and Veritas Software Corp.'s Cluster File System. These are all host-based apps that cluster servers and provide a single image of the available SAN-attached storage.

Clustered file systems are attractive because they can work with installed storage. On the other hand, hardware clustering systems require the purchase of new storage (see Software vs. hardware clustering, this page).

But virtualization and a global file system don't necessarily add up to a fully clustered storage system. Randy Kerns, a senior partner at Evaluator Group Inc., Greenwood Village, CO, describes SAN FS as "a meta data server approach to storage virtualization." It provides a key element, but it's only part of the clustering picture. "It's one way to provide a global namespace," notes Kerns, "but a global namespace and clustered storage are not necessarily connected."
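Kerns' distinction can be illustrated with a toy metadata server in Python. This is a sketch under stated assumptions, not how SAN FS is implemented: the MetadataServer class, its methods and the array and volume names are all hypothetical.

```python
class MetadataServer:
    """Toy metadata server: maps logical paths to back-end locations."""

    def __init__(self):
        self._locations = {}  # logical path -> (array name, volume name)

    def register(self, path, array, volume):
        """Record which array and volume hold the data for a path."""
        self._locations[path] = (array, volume)

    def lookup(self, path):
        """Return where the data lives; a host would then read it from there."""
        return self._locations[path]

    def arrays_in_use(self):
        """Back-end arrays behind the namespace, each still a separate device."""
        return sorted({array for array, _ in self._locations.values()})


mds = MetadataServer()
mds.register("/global/video/clip1.mov", "array-a", "vol7")
mds.register("/global/docs/plan.doc", "array-b", "vol2")

print(mds.lookup("/global/docs/plan.doc"))
print(mds.arrays_in_use())
```

The hosts see a single /global tree, yet the sketch still lists two distinct arrays behind it. That is the gap between a global namespace and clustered storage.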

Beyond the file system
A fully clustered storage system goes beyond what the servers and applications see; it provides the underpinnings and infrastructure of the storage system itself. Among available products, the best examples are those that have been built from the ground up to deliver clustered storage. These hardware-based systems address the scalability of physical resources, not just that of the file system. According to Kerns, these systems have an advantage over some of the software-only approaches to clustering. "You're going to put on another layer of software and yet you're still probably going to manage those devices independently," says Kerns.

Some examples of purpose-built clustered storage systems include EqualLogic Inc.'s PS Series, Isilon Systems Inc.'s IQ series arrays, LeftHand Networks Inc.'s SAN/iQ IP SAN and Xiotech Corp.'s Magnitude 3D (see Clustered storage system sampler).

While most midrange storage systems offer a modular approach to growing capacity, clustered systems take the concept a step further. Typically, in a non-clustered midrange array, a module (or expansion unit) is added to increase disk capacity; in some cases, another controller can be added to increase the horsepower of the array. For the most part, these modular midrange arrays can scale capacity, but not performance. "If you're just adding disk, but aren't doing anything about performance," says Kerns, "obviously you'll see some degradation."

In a clustered storage architecture, modules are typically packages that include not only additional disks, but a controller assembly with its own set of interfaces. Building out a clustered array also increases performance and connectivity. Because a full complement of processors, memory, ports and so forth is added with each set of new disks, the performance of a clustered storage system will often scale linearly as it expands. This is in stark contrast to non-clustered modular systems where performance is likely to suffer as disk expansion units are added.
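The contrast can be made concrete with a deliberately idealized Python model. The controller throughput and per-unit capacity below are assumed figures, not measurements of any product, and real clusters will not scale perfectly linearly.

```python
CONTROLLER_MBPS = 400  # assumed throughput of one controller
UNIT_CAPACITY_TB = 5   # assumed capacity of one expansion unit or module


def modular_array(units):
    """One controller shared by every expansion unit.

    Returns (capacity in TB, aggregate throughput in MB/s).
    """
    return units * UNIT_CAPACITY_TB, CONTROLLER_MBPS


def clustered_array(modules):
    """Each module brings its own controller, so throughput grows too.

    Returns (capacity in TB, aggregate throughput in MB/s).
    """
    return modules * UNIT_CAPACITY_TB, modules * CONTROLLER_MBPS


for n in (1, 2, 4, 8):
    m_cap, m_bw = modular_array(n)
    c_cap, c_bw = clustered_array(n)
    print(f"{n} units: modular {m_bw / m_cap:.1f} MB/s per TB, "
          f"clustered {c_bw / c_cap:.1f} MB/s per TB")
```

In this model the modular array's throughput per terabyte halves each time capacity doubles, while the clustered array's stays constant.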

When a module is added to the cluster, the other members of the cluster automatically recognize the new module. The cluster then reorganizes itself to accommodate the added capacity by re-striping data across all disks, sharing data management policies and balancing the workload among all members. Usually, cluster modules interconnect with each other using a Fibre Channel (FC) or Gigabit Ethernet (GbE) interface, although Isilon recently announced it will offer clustered storage systems that use InfiniBand connections, which are approximately 10 times faster than GbE.
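One way to picture the rebalancing step is the following Python sketch. It is a simplified illustration, not any vendor's re-striping algorithm: the rebalance function, the module names and the block numbering are hypothetical, and it ignores RAID protection and block sizes.

```python
def rebalance(layout, new_module):
    """Add an empty module, then move blocks from the fullest module to the
    emptiest one until block counts differ by at most one.

    layout maps a module name to the list of block IDs it holds.
    Returns the new layout and the number of blocks moved.
    """
    layout = {name: list(blocks) for name, blocks in layout.items()}
    layout[new_module] = []
    moved = 0
    while True:
        fullest = max(layout, key=lambda m: len(layout[m]))
        emptiest = min(layout, key=lambda m: len(layout[m]))
        if len(layout[fullest]) - len(layout[emptiest]) <= 1:
            return layout, moved
        layout[emptiest].append(layout[fullest].pop())
        moved += 1


before = {"module1": list(range(0, 6)), "module2": list(range(6, 12))}
after, moved = rebalance(before, "module3")
print({name: len(blocks) for name, blocks in after.items()}, "moved:", moved)
```

The sketch moves only as many blocks as are needed to even out the load, which keeps the background data movement small while the cluster stays online.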

Servers connected to the clustered array are unaffected. Typically, there's no need for client software on the host servers, and they can continue to access storage from the pool even as new capacity is added. Within the storage cluster, the specific controller that a host connects to is almost irrelevant, as cluster modules can hand off responsibility for those interfaces to one another to adjust to failures or varying loads and bandwidth requirements.

For cluster modules to interact effectively, their operating systems must be in constant communication. If a unit fails--or shows signs of an impending failure--its processing workload is picked up by other cluster modules and data is transferred from its disks to others, if necessary. This arrangement provides effective failover to ensure availability and, as more modules are added, data protection and availability increase as well.
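The workload handoff can be sketched in a few lines of Python. This is an assumption-laden illustration rather than a description of any product: the fail_over function and the host and module names are hypothetical, and it covers only host connections, not the movement of data between disks.

```python
def fail_over(assignments, failed):
    """Reassign the failed module's hosts to the least-loaded survivors.

    assignments maps a module name to the list of hosts it currently serves.
    Returns a new mapping that no longer contains the failed module.
    """
    survivors = {m: list(hosts) for m, hosts in assignments.items() if m != failed}
    if not survivors:
        raise RuntimeError("no surviving modules to take over the workload")
    for host in assignments.get(failed, []):
        target = min(survivors, key=lambda m: len(survivors[m]))
        survivors[target].append(host)
    return survivors


assignments = {
    "module1": ["host1", "host2"],
    "module2": ["host3"],
    "module3": ["host4", "host5"],
}
print(fail_over(assignments, "module1"))
```

Each orphaned host goes to whichever surviving module is serving the fewest hosts at that moment, so the more modules there are, the smaller the share each one absorbs.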

Most importantly, as modules are added to accommodate new requirements, administration remains constant. Even as the cluster grows, "I can administer it as a single system and don't have to change anything," says Kerns. "I don't have to administer another box."

Sports Illustrated in New York City opted for clustered storage to support its onsite digital photography operations. Phil Jache, deputy director of technology for the magazine, says its three Isilon IQ arrays have been air-shipped to the Olympics, Super Bowl and other major events. The Isilon systems cut two or three hours from the magazine's photo processing time. "It enabled us to do some things that just weren't possible [before]," says Jache.

AccessIT installed a Xiotech Magnitude 3D clustered storage system at its Managed Services Division in New York City and another at its Media Services headquarters in Los Angeles. Erik Levitt, president and COO of the Managed Services Division, says AccessIT installed one of the Xiotech boxes to support its IT services business, which supports clients in 35 countries from 10 data centers. The Los Angeles-based Xiotech system is used primarily for the distribution of digital films, such as I, Robot and Shark Tale, to nearly 30 movie theaters equipped with digital projection systems.

The Xiotech cluster in New York replaced a traditional monolithic SAN. Levitt says the price of the Magnitudes was a major selling point. "We're adding about 5TB at a clip, so scalability is extremely important to us," says Levitt.

This was first published in April 2005
