Clustered file systems run on storage servers, NAS gateways and hosts. Here's how to determine which clustered file-system architecture is best for your needs and storage environment.
Clustered file systems (CFS) offer a practical way to respond to big storage problems such as the proliferation of low-cost servers, application data growth and the need to deliver better application performance. A CFS pulls together and shares the excess storage capacity that's often available but hidden on storage networks. In doing so, a CFS increases storage utilization rates, delivers performance typically found only in high-end arrays and gives users an economical way to scale their architectures.
There are three ways to deploy a CFS: on storage servers, NAS gateways and hosts. Any server in the cluster can access any block of storage managed by the cluster. Most CFS also integrate the volume manager with the file system. This allows the CFS to break large files into blocks called extents, and to stripe those extents across different storage arrays to improve I/O performance.
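To make the extent idea concrete, here is a minimal Python sketch of how a CFS might break a file into extents and stripe them round-robin across storage arrays. The 1,024-byte extent size, the array names and the round-robin placement policy are illustrative assumptions; real products choose their own extent sizes and placement rules.

```python
def split_into_extents(data: bytes, extent_size: int) -> list[bytes]:
    """Break a file's bytes into consecutive extents (the last may be short)."""
    return [data[i:i + extent_size] for i in range(0, len(data), extent_size)]


def stripe(extents: list[bytes], arrays: list[str]) -> dict[str, list[int]]:
    """Assign extent i to array i mod N, so sequential I/O touches every array.

    Round-robin placement is an assumption made for illustration.
    """
    layout: dict[str, list[int]] = {name: [] for name in arrays}
    for index, _ in enumerate(extents):
        layout[arrays[index % len(arrays)]].append(index)
    return layout


if __name__ == "__main__":
    extents = split_into_extents(b"x" * 10_000, extent_size=1_024)
    print(len(extents))  # 10 (the last extent holds 784 bytes)
    print(stripe(extents, ["array-a", "array-b", "array-c"]))
    # {'array-a': [0, 3, 6, 9], 'array-b': [1, 4, 7], 'array-c': [2, 5, 8]}
```

Because consecutive extents land on different arrays, a large sequential read can pull from all three arrays at once instead of queuing on one.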
There are several key questions that need to be answered before selecting a CFS:
- Can the CFS make use of existing storage and network resources?
- How difficult is it to install and configure?
- How does the CFS manage data integrity?
- Can it scale performance and capacity linearly and independently?
- What problem is the CFS best suited to solve?
Clustered storage systems
Clustered storage systems are composed of bricks (servers preconfigured with set amounts of CPU, cache and storage) or blades. Each brick is loaded with the vendor's CFS software, which controls and shares the processing, memory and storage resources of all the bricks in the cluster; blades, by contrast, are managed by an external server that runs the CFS.
Isilon Systems Inc.'s IQ storage clusters use storage bricks and a CFS called OneFS, which combines four layers of storage management software (file system, volume management, data protection and high availability) into one logical file system. This integration lets OneFS configure storage on any of the up to 88 bricks a cluster supports and create volumes of up to 528TB. Isilon also lets users choose among bricks of different sizes, ranging from 1.9TB to 6TB raw. Although each brick holds only 12 serial ATA (SATA) disk drives, Isilon offers bricks with different-size drives, so users can select bricks that meet specific application performance requirements.
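The Isilon figures hang together arithmetically. The short Python check below assumes that the 528TB volume limit is simply 88 bricks times the largest 6TB brick, and that each brick's raw capacity is split evenly across its 12 drives; both are assumptions made for illustration, not Isilon specifications.

```python
MAX_BRICKS = 88
DRIVES_PER_BRICK = 12
BRICK_SIZES_TB = (1.9, 6.0)  # smallest and largest raw brick sizes quoted above


def max_raw_capacity_tb(bricks: int, brick_tb: float) -> float:
    """Raw capacity of a cluster built entirely from one brick size."""
    return bricks * brick_tb


def per_drive_gb(brick_tb: float, drives: int = DRIVES_PER_BRICK) -> float:
    """Approximate drive size, assuming capacity is split evenly across drives."""
    return brick_tb * 1000 / drives


print(max_raw_capacity_tb(MAX_BRICKS, 6.0))  # 528.0
for size in BRICK_SIZES_TB:
    print(size, round(per_drive_gb(size)))   # prints "1.9 158", then "6.0 500"
```

Under those assumptions, the 1.9TB and 6TB bricks work out to drives of roughly 160GB and 500GB, respectively.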
Terrascale Technologies Inc. also uses storage bricks, but places its TerraGrid Cluster File System (CFS) on the clients that access the bricks. Terrascale built TerraGrid CFS on the open-source XFS file system; it's a parallel file system that allows applications running in parallel to access the same files simultaneously. TerraGrid CFS scales to hundreds of nodes and lets any server read or write data on any node. However, TerraGrid CFS is available only for Linux servers. Windows servers and non-Linux Unix servers that need to access the storage pool must go through a Linux NAS gateway running TerraGrid CFS.
Panasas Inc.'s ActiveScale Storage Cluster is architected much like TerraGrid CFS, but it has some unique characteristics. Like Terrascale, Panasas places agents on all clients accessing its storage, directly supports only Linux servers and allows multiple clients to access back-end storage simultaneously. But Panasas builds its storage from StorageBlades, each holding two 400GB SATA drives. These drives are virtualized by Panasas DirectorBlades, which stripe data across the StorageBlades. The DirectorBlades cluster together to form one "virtual" NFS and CIFS server that can scale I/O to high performance levels.
But problems with the clustered storage systems architecture may surface as more bricks are added to the cluster. It's the responsibility of the CFS to manage each additional module's processor, cache and storage capacity. Failing to keep the cache coherent across the bricks can result in file corruption; however, keeping the cache coherent among all of the bricks generates a lot of chatter and degrades the overall performance of the cluster.
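Why does coherency chatter grow so quickly? In the simple model sketched below (an assumption, not any vendor's actual protocol), every write must invalidate the cached copy on every other brick, so each write costs n - 1 messages and total traffic grows roughly with the square of the cluster size.

```python
def invalidations_per_write(bricks: int) -> int:
    """Messages one write generates when every other brick must drop its cached copy."""
    return bricks - 1


def cluster_invalidations(bricks: int, writes_per_brick: int) -> int:
    """Total coherency messages when every brick issues the same number of writes."""
    return bricks * writes_per_brick * invalidations_per_write(bricks)


for n in (4, 8, 16, 32):
    print(n, cluster_invalidations(n, writes_per_brick=100))
```

With 100 writes per brick in this model, doubling the cluster from 16 to 32 bricks raises the message count from 24,000 to 99,200, roughly four times as much chatter for twice the hardware.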
Isilon deals with this issue by designating two or more of its bricks as "owning" bricks for each specific file. Keeping the cache consistent in only a few bricks eliminates much of the chatter among bricks. If the request for a file is received by a brick other than the owning brick, the CFS redirects the request to the owning brick. Once the owning brick receives the request, it directs the CFS to distribute the data writes evenly across all of the storage bricks instead of just the disk drives in the owning bricks.
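The owning-brick approach can be sketched in a few lines of Python. Isilon's actual method for choosing owners isn't described here, so hashing the file path to pick two consecutive bricks is a hypothetical stand-in. The point it illustrates is that any brick can compute a file's owners, redirect the request to them and confine cache updates to the owners.

```python
import hashlib


def owning_bricks(path: str, bricks: list[str], owners: int = 2) -> list[str]:
    """Hypothetical owner choice: hash the path, then take `owners` consecutive bricks."""
    digest = hashlib.sha256(path.encode()).digest()
    start = int.from_bytes(digest[:4], "big") % len(bricks)
    return [bricks[(start + i) % len(bricks)] for i in range(owners)]


def route(path: str, receiving_brick: str, bricks: list[str]) -> str:
    """Serve locally if the receiving brick owns the file; otherwise redirect."""
    owners = owning_bricks(path, bricks)
    return receiving_brick if receiving_brick in owners else owners[0]


def coherency_messages_per_write(owners: int = 2) -> int:
    """Only the other owning bricks' caches need updating, whatever the cluster size."""
    return owners - 1


bricks = [f"brick-{i}" for i in range(8)]
print(owning_bricks("/media/clip.mov", bricks))
print(route("/media/clip.mov", "brick-0", bricks))
```

The cost of this design shows up in `route`: every request that lands on a non-owner becomes an extra hop to the owner, which is the redirect load that grows as more servers access data concurrently.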
Isilon's approach meets most application requirements when just a few servers need to access large files sequentially, but this technique falters as the number of servers that need to access data concurrently on multiple bricks grows. In that scenario, the owning bricks wouldn't be able to expeditiously handle all of the redirects coming from the other bricks and performance would degrade.
To avoid this problem, Terrascale's TerraGrid CFS allows any server in the compute cluster to access any data block directly on any brick at any time. This approach eliminates the need for cache coherency among the bricks or for the CFS to add any meta data to the file because the file is locked while the server is directly accessing the blocks of the file.
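A minimal sketch of direct block access under a per-file lock follows. The BrickPool class, the block-map format and the use of Python threading locks are assumptions for illustration, not Terrascale's implementation. They show the idea that while one server holds a file's lock, no other server can touch that file's blocks, so there is no stale copy on another brick to reconcile.

```python
import threading


class BrickPool:
    """Hypothetical pool of bricks that servers access directly, block by block."""

    def __init__(self) -> None:
        self.blocks = {}                # (brick, physical_block) -> bytes
        self._file_locks = {}           # path -> threading.Lock
        self._guard = threading.Lock()  # protects the _file_locks table itself

    def _lock_for(self, path):
        with self._guard:
            if path not in self._file_locks:
                self._file_locks[path] = threading.Lock()
            return self._file_locks[path]

    def write(self, path, block_map, data):
        """block_map maps file block index -> (brick, physical_block)."""
        with self._lock_for(path):  # other servers wait until this write finishes
            for index, payload in data.items():
                self.blocks[block_map[index]] = payload

    def read(self, path, block_map, index):
        with self._lock_for(path):
            return self.blocks[block_map[index]]


pool = BrickPool()
block_map = {0: ("brick-1", 17), 1: ("brick-4", 3)}
writers = [
    threading.Thread(target=pool.write, args=("/data/f", block_map, {0: tag, 1: tag}))
    for tag in (b"A", b"B")
]
for t in writers:
    t.start()
for t in writers:
    t.join()
# Each write is applied as a unit, so both blocks come from the same server.
print(pool.read("/data/f", block_map, 0) == pool.read("/data/f", block_map, 1))  # True
```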
But none of these products overcomes the two main problems of clustered storage systems. First, although SATA drives are well suited to the sequential data access required by applications with large amounts of digital content (such as audio, video and graphics), their performance is significantly lower than that of Fibre Channel (FC) drives in environments with large amounts of random reads and writes. Second, these systems don't let users redeploy storage they already own. If you want to use installed storage in a cluster, consider CFS architectures that reside on NAS gateways or client servers.
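A common rule of thumb shows why random I/O hurts SATA drives: a drive's random IOPS is roughly 1 / (average seek time + half a rotation). The Python sketch below assumes a 7,200 rpm SATA drive with an 8.5 ms average seek and a 15,000 rpm FC drive with a 3.5 ms average seek. These are typical spec-sheet values chosen for illustration, and the formula ignores transfer time and command queuing.

```python
def random_iops(rpm: int, avg_seek_ms: float) -> float:
    """Approximate random IOPS for one drive: 1 / (seek + half a rotation)."""
    rotational_latency_ms = 60_000 / rpm / 2  # average wait is half a revolution
    return 1000 / (avg_seek_ms + rotational_latency_ms)


sata = random_iops(7_200, 8.5)   # assumed SATA drive
fc = random_iops(15_000, 3.5)    # assumed FC drive
print(round(sata), round(fc))    # 79 182
```

Under these assumptions, the FC drive delivers a bit more than twice the random IOPS of the SATA drive, while sequential workloads, which avoid most seeks, narrow the gap considerably.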
Clustered NAS gateways
Clustered NAS gateways are servers that sit in the data path between client servers and the storage arrays they access; together, the clustered gateways act as one logical server. A CFS ties the NAS gateway servers together so that each gateway can access storage anywhere in the cluster. This configuration allows the use of installed storage resources and offers more options for scaling storage capacity independently. How well each product does this depends largely on how the vendor has implemented its CFS to manage cache coherency among the nodes.
The CFS that runs on Exanet Inc.'s ExaStore NAS Gateway uses a control node to minimize the amount of communication that needs to occur among servers in the cluster. When a file is created, one node in the NAS gateway cluster assumes responsibility for that file and breaks it into 1MB chunks called extents. The owning node stores a small amount of meta data in each extent that indicates it's that file's controlling node. When a request to read that file occurs, the node receiving the request reads the file's meta data and determines which node is the control node for that file. The request is then redirected to the control node, which then coordinates the processing of that request.
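The control-node lookup might be modeled as follows. The Extent class and the function names are hypothetical. The sketch assumes, as described above, that every 1MB extent carries a small tag naming the file's control node, so any receiving node can find the control node by reading the file's meta data.

```python
from dataclasses import dataclass

EXTENT_SIZE = 1024 * 1024  # 1MB extents, per the description above


@dataclass
class Extent:
    control_node: str  # small piece of meta data naming the file's control node
    data: bytes


def create_file(data: bytes, creating_node: str) -> list[Extent]:
    """The node that creates the file becomes its control node and tags every extent."""
    chunks = [data[i:i + EXTENT_SIZE] for i in range(0, len(data), EXTENT_SIZE)] or [b""]
    return [Extent(creating_node, chunk) for chunk in chunks]


def route_read(receiving_node: str, extents: list[Extent]) -> tuple[str, bool]:
    """Return (coordinating node, whether the request had to be redirected)."""
    control = extents[0].control_node  # read the file's meta data
    return control, control != receiving_node


extents = create_file(b"x" * (2 * EXTENT_SIZE + 10), "node-a")
print(len(extents))                    # 3
print(route_read("node-c", extents))   # ('node-a', True)
```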
Exanet's architecture is similar to clustered storage systems that use a parallel file system. Because ExaStore stores each file in 1MB extents, the controlling node can enlist other nodes to read the extents in parallel, which speeds up the read. The other nodes send the extents they read to the controlling node, which reassembles them into the original file. The controlling node then sends the assembled file to the node that received the client request, and that node presents the file to the client.
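The parallel read and reassembly step can be sketched with Python's concurrent.futures module. Here read_extent is a hypothetical stand-in for a helper node fetching one extent from disk, and a thread pool stands in for the other nodes. Results are collected in submission order, so the file comes back in its original sequence even if the reads finish out of order.

```python
from concurrent.futures import ThreadPoolExecutor


def read_extent(node: str, extent_id: int, storage: dict[int, bytes]) -> bytes:
    """Hypothetical: `node` reads one extent from back-end storage."""
    return storage[extent_id]


def assemble_file(extent_ids: list[int], helper_nodes: list[str],
                  storage: dict[int, bytes]) -> bytes:
    """Control node fans extent reads out to helper nodes, then reassembles the file."""
    with ThreadPoolExecutor(max_workers=len(helper_nodes)) as pool:
        futures = [
            pool.submit(read_extent, helper_nodes[i % len(helper_nodes)], eid, storage)
            for i, eid in enumerate(extent_ids)
        ]
        # Joining in submission order preserves the original extent sequence.
        return b"".join(f.result() for f in futures)


storage = {0: b"AAAA", 1: b"BBBB", 2: b"CC"}
print(assemble_file([0, 1, 2], ["node-b", "node-c"], storage))  # b'AAAABBBBCC'
```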
The CFS on ONStor's Bobcat Series NAS Gateway sidesteps the cache-coherency problem by turning off the write cache in its clustered servers. With the write cache off, all writes go directly to back-end storage. The file is locked while a write occurs, which prevents reads or writes from other clustered servers until the write is complete. This approach works reasonably well in computing environments where different clients access different files randomly. And because ONStor supports multiple storage arrays from different vendors, users can place each file on the back-end storage whose performance and availability characteristics best match that file's needs.
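Why turning off the write cache removes the coherency problem can be shown with a small model. The Backend and WriteThroughGateway classes are hypothetical, not ONStor code: each gateway writes straight to shared back-end storage under a per-file lock and keeps no cached copy, so a read on any other gateway always sees the latest completed write.

```python
import threading


class Backend:
    """Shared back-end array: the only place file data lives."""

    def __init__(self) -> None:
        self.files = {}                 # path -> bytes
        self._file_locks = {}           # path -> threading.Lock
        self._guard = threading.Lock()  # protects the _file_locks table

    def lock_for(self, path):
        with self._guard:
            return self._file_locks.setdefault(path, threading.Lock())


class WriteThroughGateway:
    """Hypothetical gateway node with its write cache turned off."""

    def __init__(self, name, backend):
        self.name = name
        self.backend = backend

    def write(self, path, data):
        # Hold the file lock until the data reaches back-end storage, so reads
        # and writes arriving at other gateways wait for completion.
        with self.backend.lock_for(path):
            self.backend.files[path] = data

    def read(self, path):
        # No local cache to go stale: every read goes to the back end.
        with self.backend.lock_for(path):
            return self.backend.files[path]


backend = Backend()
gw_a, gw_b = WriteThroughGateway("gw-a", backend), WriteThroughGateway("gw-b", backend)
gw_a.write("/home/report.doc", b"v2")
print(gw_b.read("/home/report.doc"))  # b'v2'
```

The trade-off is that every write pays the full latency of the back-end array, which is why this design suits workloads where clients mostly touch different files.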
But clustered NAS gateway architectures can be deployed only where clients access files over Ethernet using the NFS or CIFS protocols. These protocols introduce overhead on both the requesting server and the NAS gateway server that processes the request. Additional servers can be added to the cluster to provide the extra cache and CPU needed to handle these requests, but even then the cluster isn't likely to satisfy the most performance-intensive, random-read applications that need to share files among servers running the same or different operating systems. In these circumstances, users will need to look to a CFS that operates at the host level.
Alternative storage clustering methods
Not every vendor is choosing to cluster storage using clustered file systems. Here are some other ways vendors are clustering storage on the back end.
Hitachi Data Systems (HDS) TagmaStore Universal Storage Platform. Hitachi's TagmaStore provides a common platform into which cards may be inserted to access a pool of shared storage. Card options include Fibre Channel, FICON and ESCON port cards, and iSCSI and NAS blades. This approach allows all storage to be managed at the block level through the same interface and with the same volume manager. HDS also gives users the flexibility to virtualize the storage pool. However, the NAS blades don't yet include any native method to share data at the file level, so third-party products that provide global name spaces are required.
Network Appliance (NetApp) Inc. Data Ontap GX. NetApp's Data Ontap GX allows users to create one logical group of NetApp filers. With Data Ontap GX, any filer can receive a client request for a file and redirect that request to the filer actually containing the file. This lets users add new filers to an existing NetApp filer installation, as well as share their resources without doing a forklift upgrade or introducing a NAS gateway. NetApp's optional FlexVol feature allows users to stripe data across all of the nodes to improve data performance and availability.
Pillar Data Systems Inc. Axiom Storage System. Pillar's Axiom architecture clusters its storage controllers, called Axiom Slammers, which serve as gateways to its back-end disk and may be configured for SAN or NAS. Although each Axiom system supports only four Axiom Slammers, each running in an active-active configuration, Pillar lets users scale out the NAS Axiom Slammers with its scalable file-system and global name space options.
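NetApp's request redirection, and the global name spaces mentioned for HDS and Pillar, rest on a shared idea: a map from paths in one name space to the system that actually holds the data. The Python sketch below uses a longest-prefix lookup over a made-up table; the paths, filer names and lookup rule are hypothetical and don't describe any vendor's implementation.

```python
# Hypothetical name space table: junction path -> system holding that subtree.
NAMESPACE = {
    "/": "filer-1",
    "/eng": "filer-2",
    "/eng/builds": "filer-3",
}


def owning_filer(path: str, namespace: dict[str, str] = NAMESPACE) -> str:
    """Return the filer for the longest junction that is a prefix of `path`."""
    best = "/"
    for junction in namespace:
        matches = junction == "/" or path == junction or path.startswith(junction + "/")
        if matches and len(junction) > len(best):
            best = junction
    return namespace[best]


def handle_request(receiving_filer: str, path: str) -> tuple[str, str]:
    """Any filer can accept a request; it serves it or redirects to the owner."""
    target = owning_filer(path)
    return ("serve" if target == receiving_filer else "redirect"), target


print(handle_request("filer-1", "/eng/specs/cfs.txt"))  # ('redirect', 'filer-2')
```

Matching on whole path components (note the trailing "/" check) keeps a path such as /engineering from being mistaken for part of /eng.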
Host-based clustered file systems
Clustered file systems that operate at the host level provide some distinct advantages over clustered storage systems and NAS gateway configurations:
- There's no need to purchase proprietary storage systems.
- They work in most mixed-vendor environments.
- There's no need to use a mix of file- and block-based protocols.
- The performance overhead associated with processing NFS and CIFS is minimized.
To control access to files and maintain their integrity, SGI uses a meta data server for each CXFS file system. Every server in the cluster must communicate with the meta data server over a TCP/IP link. Even though this link carries only a small amount of meta data traffic, users may want to put it on a separate physical network from their other traffic to minimize network collisions and improve uptime. Users in high-availability environments may also want to build another physical network and cluster two meta data servers, an expensive and complicated configuration, so that the failure of a single meta data server doesn't bring down the entire cluster (see "Alternative storage clustering methods," this page).
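The split between a small meta data exchange and direct data I/O, plus failover to a second meta data server, can be sketched as below. The MetadataServer and Client classes, the block-map format and the in-memory "SAN" are hypothetical stand-ins, not CXFS interfaces. The sketch assumes a client tries the primary meta data server first and falls back to the standby if it can't reach it.

```python
class MetadataServer:
    """Hypothetical meta data server holding each file's block map."""

    def __init__(self, name, block_maps):
        self.name = name
        self.block_maps = block_maps  # path -> list of SAN block addresses
        self.up = True

    def lookup(self, path):
        if not self.up:
            raise ConnectionError(f"{self.name} is unreachable")
        return self.block_maps[path]


class Client:
    """Cluster member: asks for meta data over TCP/IP, reads data over the SAN."""

    def __init__(self, metadata_servers, san):
        self.metadata_servers = metadata_servers  # primary first, then standby
        self.san = san                            # block address -> bytes

    def read(self, path):
        for server in self.metadata_servers:
            try:
                blocks = server.lookup(path)  # small meta data exchange
                break
            except ConnectionError:
                continue  # fail over to the next meta data server
        else:
            raise ConnectionError("no meta data server available")
        # Bulk data goes straight to storage, bypassing the meta data server.
        return b"".join(self.san[addr] for addr in blocks)


maps = {"/proj/model.dat": [10, 11]}
san = {10: b"he", 11: b"llo"}
primary, standby = MetadataServer("mds-1", maps), MetadataServer("mds-2", maps)
client = Client([primary, standby], san)
primary.up = False  # simulate losing the primary meta data server
print(client.read("/proj/model.dat"))  # b'hello'
```

Without the standby in the list, the same failure would stop every read in the cluster, which is the single point of failure the second meta data server is meant to remove.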
Benefits vs. complexity
A CFS gives users the option to cluster storage at the host, network or array level to share file access and maximize storage utilization. Each clustering architecture has limitations, such as the drive types it supports or the need to install agents on hosts (see "Clustered file-system architecture pros and cons," below).
Assess your current storage environment and processing needs before jumping on the CFS bandwagon. Though the benefits may be substantial, you should also expect your storage environment to become more complicated and perhaps more difficult to manage.