Published: 12 May 2003
Seamless, consolidated and efficient management of storage through an advanced shared file system has long been a promise of storage area networks (SANs). Shared file systems--which let hosts share files on a SAN--promise to simplify the management of storage and save money by consolidating storage resources. So, how close are we to that dream? The short answer: There's some progress to report, but to date, not many companies have employed shared file systems.
If you have multiple hosts that need to access a common set of files on a SAN, a shared file system is necessary to coordinate access between those hosts. Otherwise, if two systems try to read and write the same file, data corruption is likely. A shared file system coordinates access to a file, and ensures that reads and writes are consistent between the hosts. And if the hosts run different operating systems, you'll also need the shared file system to normalize file operations between those platforms.
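The coordination problem can be seen in miniature with POSIX advisory locks on a single machine. This is only a sketch of the principle, not how any particular SAN file system is implemented; real shared file systems arbitrate between separate hosts, not just separate file handles:

```python
import fcntl
import tempfile

# Scratch file standing in for a file on shared SAN storage.
path = tempfile.mkstemp()[1]

writer = open(path, "r+")           # first host's handle
fcntl.flock(writer, fcntl.LOCK_EX)  # take an exclusive write lock

other = open(path, "r+")            # second host's handle
try:
    fcntl.flock(other, fcntl.LOCK_EX | fcntl.LOCK_NB)
    conflict = False
except BlockingIOError:
    conflict = True                 # second writer is refused while the lock is held

fcntl.flock(writer, fcntl.LOCK_UN)  # first writer finishes its I/O
fcntl.flock(other, fcntl.LOCK_EX | fcntl.LOCK_NB)  # now the lock is granted
```

Without that arbitration step, both handles could write overlapping blocks and leave the file in an inconsistent state, which is exactly the corruption scenario described above.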
Users are also looking to shared file systems to help solve issues with the speed of accessing data over Ethernet. If you directly connect clients to the SAN through a shared file system, you eliminate the overhead and bottlenecks of transmitting that data over an Ethernet network. That's a technique that works well with large files, where the throughput more than offsets the overhead of the shared file system. In speed-dependent applications such as scientific computing, database clusters or multimedia handling, the additional speed is directly linked to increased performance of those applications.
The use of shared file systems can also significantly reduce the amount of storage and data handling required, particularly when there's a large amount of data that would otherwise need to be moved or duplicated, such as in multimedia applications. A shared file system is also a requirement for many high-availability systems, providing a shared storage pool for a failover pair or shared access for scaling an application cluster. Finally, by using a shared file system you can optimize use of your storage, and allocate storage at a finer granularity than disks or LUNs.
Shared file systems aren't a new technology. Systems such as OpenVMS have had clustered file system support for years in mainframe and midrange environments. Now, with the advent of widely available storage networking equipment, shared and cluster file systems for Unix and Windows server environments are starting to gain acceptance, especially in data-intensive areas such as video editing, oil and gas exploration and genomic research.
There are several types of shared file systems in use today on SANs, says Philippe Nicolas, SNIA data sharing tutorial manager and SNIA France chairman. Shared file systems can be broadly grouped into three categories. First, there are SAN file systems where access to files on a device is shared, but not the file system itself. The second type is clustered file systems where all nodes understand the file system structure. The third type is shared file systems that are integrated within an application engine, such as Oracle 9i Real Application Clusters (RAC) (see "Shared file systems types").
Typical of a shared SAN file system is IBM's SANergy, which targets multimedia and small- to medium-size workgroups. The solution--which was purchased from Mercury a few years ago--uses a metadata server and presents a network-attached storage (NAS)-like access to systems, using the SAN for large block transfers. "SANergy is an accelerator of network file systems. For someone with a NAS box, SANergy takes advantage of Fibre Channel and splits the control and data path. Control data goes over an IP network; information is shared back to client and actual data I/O goes over the Fibre Channel SAN," says Greg Tevis, one of IBM's software architects for its Tivoli Storage Area Network Manager software.
EMC's HighRoad solution also provides a combined NAS/SAN approach to shared file systems. Paul Ross, director of storage network marketing at EMC says, "Two years ago, we released a product called HighRoad. It enables file sharing between a bunch of servers, but they don't have to access the file system through the NAS device." Using EMC's network-attached storage heads in its NS600 servers, which contain HighRoad drivers and a host bus adapter (HBA), servers can access a volume via NAS over Ethernet, using the SAN for high-speed, direct access for large block transfers.
Clustered file systems
Unlike SAN file systems, clustered file systems mount an entire volume on the nodes in the cluster. Clustered file systems work by joining a set of servers together in tight coordination, allowing them to share and access common files over a SAN. When a client requests to read or write a file, the file system drivers check with a lock manager to determine whether another node is currently reading or writing that block of data. If not, the client locks the file, accesses the data directly over the SAN and holds that lock until the read or write is completed. This coordination ensures that what is written to disk on the SAN is always consistent.
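A toy lock table illustrates the protocol just described: a node asks for the lock on a block, does its I/O over the SAN if the lock is granted, and releases it when done. This is purely illustrative; the class and node names here are invented, and real cluster file systems use distributed lock managers that are far more sophisticated:

```python
import threading

class BlockLockManager:
    """Toy, single-machine stand-in for a cluster lock manager (illustrative only)."""
    def __init__(self):
        self._locks = {}               # block number -> node currently holding the lock
        self._guard = threading.Lock()

    def acquire(self, node, block):
        """Grant the lock if the block is free, or already held by this node."""
        with self._guard:
            holder = self._locks.setdefault(block, node)
            return holder == node

    def release(self, node, block):
        """Release the lock once the node's read or write has completed."""
        with self._guard:
            if self._locks.get(block) == node:
                del self._locks[block]

mgr = BlockLockManager()
granted_a = mgr.acquire("nodeA", 42)       # nodeA locks block 42, then does I/O over the SAN
granted_b = mgr.acquire("nodeB", 42)       # nodeB is refused while nodeA holds the lock
mgr.release("nodeA", 42)                   # nodeA's I/O completes, lock released
granted_b_retry = mgr.acquire("nodeB", 42) # now nodeB can proceed
```

The key design point is that every node consults the same lock state before touching shared blocks, which is what keeps concurrent reads and writes from corrupting the on-disk data.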
Advanced Digital Information Corp.'s (ADIC) StorNext file system is one of the original shared file systems to run on a SAN. Bill Yaman, VP of software at ADIC, says StorNext is a heterogeneous file system designed for data-intensive SAN environments.
IBM's StorageTank is also a clustered file system. Unlike SANergy, StorageTank is focused on providing strategic, enterprise-level reliability and features in a clustered file system. Tevis describes the difference, saying SANergy isn't an enterprise-level generic global SAN file system. It's a department or area file sharing solution with file system limitations in terms of performance and scalability. SANergy can only support hundreds of clients. According to Tevis, StorageTank can support "tens of thousands of clients."
Start-up Sanbolic Inc., Watertown, MA, also offers a fully clustered file system, with initial availability of Windows support. The company also says that its architecture will support Unix in the future.
Distributed computing model
Some companies are moving away from expensive, proprietary systems to low-cost Linux clusters. Minneapolis, MN-based Sistina is a good example of this. Its Global File System (GFS) is an outgrowth of a project at the University of Minnesota. According to Joaquin Ruiz, VP of product management, "With Sistina, you can just add bricks of compute power without having to do a forklift upgrade."
Sistina and other companies are hoping to ride the conversion of large midrange and mainframe applications to Linux, where a shared file system can be used between storage and low-cost Linux servers to help connect a database, scientific or custom application cluster. Shared file systems allow you to add storage or servers as capacity is needed, instead of doing a big upgrade of a central server.
PolyServe, Beaverton, OR, is another company with a build-as-you-need philosophy. For example, Steve Norall, director of product marketing at PolyServe, says, "Matrix Server is targeted at Global 2000 data centers that are focused on building highly available, scalable Intel-based server farms." According to Norall, the product is a fully symmetric cluster file system with a lock and metadata manager.
IBM also offers a single-platform clustered file system, its General Parallel File System (GPFS), designed and used primarily for parallel computing. IBM's Tevis says, "GPFS is focused on a different set of applications than StorageTank. GPFS is ideal for environments like scientific computing where a clustered file system with high performance for parallel access is desired." (See "File sharing product roundup")
Clifford Baeseman, Linux administrator at Greenheck Fan Corporation in Schofield, Wisconsin, a manufacturer of ventilation equipment, is using Sistina's GFS on a 1.5TB SAN. Greenheck is running several clustered file systems--one with Sistina GFS, one running Oracle RAC on raw devices and one running Oracle Cluster File System (OCFS)--as well as an AlphaVMS cluster it has been running since 1986. "We're running Linux Terminal Server Project, serving X-windows desktops from a single machine out to our manufacturing floor. One machine services 70 desktops," Baeseman says, adding, "If that goes down, all manufacturing stops. Our need for clustering drove us to migrate to a SAN." One of the primary reasons Baeseman is using file sharing software is to ensure the high availability of his servers.
Baeseman looked at several different clustered file systems, including PolyServe, which he didn't pick because "we didn't like the stability yet." He's now running Sistina's GFS, which passed his "brutality testing"--scripts that access the same files from multiple systems and check for data corruption. However, Baeseman prefers open-source (GPL) solutions, so he plans to switch over to OCFS as it matures.
Greenheck has been slowly converting its systems over to Linux-based clusters attached to its SAN. Baeseman explains, "The goal of this project is to get the same stability as Digital Alpha VMS." He's been happy with the solution, saying that the Linux clusters are bringing "significantly lower cost than traditional mainframe product suites."
Despite the promised advantages of shared file systems, users have been slow to adopt the technology. IBM, although continuing to sell and support SANergy, indicates the future lies with its StorageTank products. Similarly, Veritas has refocused its cluster file system offerings to target a few specific markets, such as the Oracle RAC.
When asked how EMC's HighRoad solution is selling, Ross admits "adoption is modest," adding, "it's a new technology and people need to understand what it's used for." Ross says the high cost of SAN equipment is slowing the adoption rate of all file sharing products. "You need a SAN to get the benefits," he says.
Werner Zurcher, a product manager at Veritas agrees that the expense of a SAN is currently a significant barrier to adoption. He says, "Practically speaking, cluster file systems require a SAN, and currently there are a limited set of customers who are willing to pay extra money for the SAN infrastructure needed to connect multiple systems to some shared disks." He explains, "The reason clustered file systems have been successful in some vertical markets--such as geophysics and multimedia--is because the application files in those market segments tend to be large enough that going to a clustered file system makes a big performance difference. File sharing via Ethernet provides reasonable I/O performance for small files, but it does not scale very well for large files."
IBM's Tevis is bullish on the future of shared file system technology. He says clustered file systems will continue to improve and win market acceptance. As he put it: "Clustered file systems are not going away, and the industry is heading more and more in this direction."