This article can also be found in the Premium Editorial Download "Storage magazine: Distance: the new mantra for disaster recovery."
If you have multiple hosts that need to access a common set of files on a SAN, a shared file system is necessary to coordinate between those hosts. Otherwise, if two systems try to read from and write to the same file, data corruption is likely. A shared file system coordinates access to each file and ensures that reads and writes are consistent between the hosts. And if the hosts run different operating systems, you'll also need a shared file system to normalize file operations across them.
Users are also looking to shared file systems to help solve issues with the speed of accessing data over Ethernet. If you directly connect clients to the SAN through a shared file system, you eliminate the overhead and bottlenecks of transmitting that data over an Ethernet network. That's a technique that works well with large files, where the throughput more than offsets the overhead of the shared file system. In speed-dependent applications such as scientific computing, database clusters or multimedia handling, the additional speed is directly linked to increased performance of those applications.
Shared file systems can also significantly reduce the amount of storage and handling that data requires, particularly when large amounts of data would otherwise need to be moved or duplicated, as in multimedia applications. A shared file system is also a requirement for many high-availability systems, providing a shared storage pool for a failover pair or shared access for scaling an application cluster. Finally, by using a shared file system you can optimize use of your storage and allocate it at a finer granularity than disks or LUNs.
Shared file systems aren't a new technology. Systems such as OpenVMS have had clustered file system support for years in mainframe and midrange environments. Now, with the advent of widely available storage networking equipment, shared and cluster file systems for Unix and Windows server environments are starting to gain acceptance, especially in such data-intensive areas as video editing, oil and gas exploration and genomic research.
There are several types of shared file systems in use today on SANs, says Philippe Nicolas, SNIA data sharing tutorial manager and SNIA France chairman. Shared file systems can be broadly grouped into three categories. First, there are SAN file systems where access to files on a device is shared, but not the file system itself. The second type is clustered file systems where all nodes understand the file system structure. The third type is shared file systems that are integrated within an application engine, such as Oracle 9i Real Application Clusters (RAC) (see "Shared file systems types").
Typical of a shared SAN file system is IBM's SANergy, which targets multimedia and small- to medium-size workgroups. The solution, which was purchased from Mercury a few years ago, uses a metadata server and presents network-attached storage (NAS)-like access to systems, using the SAN for large block transfers. "SANergy is an accelerator of network file systems. For someone with a NAS box, SANergy takes advantage of Fibre Channel and splits the control and data path. Control data goes over an IP network; information is shared back to client and actual data I/O goes over the Fibre Channel SAN," says Greg Tevis, one of IBM's software architects for its Tivoli Storage Area Network Manager software.
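The split between control path and data path that Tevis describes can be illustrated with a small sketch. This is not SANergy's actual interface: `MetadataServer`, `SanDevice` and `read_file` are hypothetical names, and in-memory dictionaries stand in for the IP network and the Fibre Channel SAN. The point is only that the client asks one component where a file's blocks live, then fetches those blocks directly from another.

```python
class MetadataServer:
    """Hypothetical control-path service: maps file names to block numbers."""

    def __init__(self):
        self._extents = {}  # file name -> ordered list of block numbers

    def register(self, name, blocks):
        self._extents[name] = list(blocks)

    def lookup(self, name):
        # In a real deployment this request would travel over the IP network.
        return list(self._extents[name])


class SanDevice:
    """Hypothetical data-path device: returns raw blocks by number."""

    def __init__(self, blocks):
        self._blocks = blocks  # block number -> bytes

    def read_block(self, number):
        # In a real deployment this transfer would go over Fibre Channel.
        return self._blocks[number]


def read_file(name, metadata_server, san):
    """Ask the metadata server where the file lives, then read it directly."""
    block_numbers = metadata_server.lookup(name)   # control path
    return b"".join(san.read_block(n) for n in block_numbers)  # data path


san = SanDevice({0: b"shared ", 1: b"file ", 2: b"data"})
mds = MetadataServer()
mds.register("clip.mov", [0, 1, 2])
contents = read_file("clip.mov", mds, san)
```

Because only the small lookup passes through the metadata server, the bulk transfer never touches the IP network, which is why the approach pays off mainly for large files.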
EMC's HighRoad solution also provides a combined NAS/SAN approach to shared file systems. Paul Ross, director of storage network marketing at EMC, says, "Two years ago, we released a product called HighRoad. It enables file sharing between a bunch of servers, but they don't have to access the file system through the NAS device." Using EMC's network-attached storage heads in its NS600 servers, which contain HighRoad drivers and a host bus adapter (HBA), the EMC servers can access a volume via NAS over Ethernet, using the SAN for high-speed, direct access for large block transfers.
Clustered file systems
Unlike SAN file systems, clustered file systems mount an entire volume on the nodes in the cluster. Clustered file systems work by joining a set of servers together in tight coordination, allowing them to share and access common files over a SAN. When a client requests to read or write a file, the file system drivers check with a lock server to determine whether another user is currently reading or writing that block of data. If not, the client locks the file, directly accesses the data through the SAN and holds that lock until the read or write is completed. This coordination ensures that what is written to disk in the SAN is always consistent.
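The check-lock-access-release sequence described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not any vendor's protocol: `LockServer` and `Node` are hypothetical names, a dictionary stands in for blocks on the SAN, and locks are exclusive and granted per block, whereas real products typically distinguish read locks from write locks and handle node failures.

```python
import threading


class LockServer:
    """Hypothetical in-memory stand-in for a cluster lock service."""

    def __init__(self):
        self._guard = threading.Lock()
        self._owners = {}  # block number -> name of the node holding the lock

    def try_lock(self, block, node):
        """Grant the lock only if no node currently holds it."""
        with self._guard:
            if block in self._owners:
                return False
            self._owners[block] = node
            return True

    def unlock(self, block, node):
        """Release the lock, but only on behalf of the node that holds it."""
        with self._guard:
            if self._owners.get(block) == node:
                del self._owners[block]


class Node:
    """Hypothetical cluster member that writes to shared storage."""

    def __init__(self, name, lock_server, shared_disk):
        self.name = name
        self.locks = lock_server
        self.disk = shared_disk  # dict standing in for blocks on the SAN

    def write_block(self, block, data):
        """Write directly to shared storage only while holding the lock."""
        if not self.locks.try_lock(block, self.name):
            return False  # another node is using this block; caller may retry
        try:
            self.disk[block] = data
            return True
        finally:
            self.locks.unlock(block, self.name)


disk = {}
server = LockServer()
node_a = Node("a", server, disk)
node_b = Node("b", server, disk)

server.try_lock(7, "a")                    # node a is mid-operation on block 7
blocked = node_b.write_block(7, b"new")    # refused while a holds the lock
server.unlock(7, "a")
allowed = node_b.write_block(7, b"new")    # succeeds once the lock is free
```

In this sketch a refused write simply returns `False`; whether a node should retry, queue or fail the request is a design choice the sketch leaves open.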
Advanced Digital Information Corp.'s (ADIC) StorNext file system is one of the original shared file systems to run on a SAN. Bill Yaman, VP of software at ADIC, says StorNext is a heterogeneous file system designed for data-intensive SAN environments.
IBM's StorageTank is also a clustered file system. Unlike SANergy, StorageTank is focused on providing strategic, enterprise-level reliability and features. Tevis describes the difference, saying SANergy isn't an enterprise-level, generic global SAN file system; it's a department or area file-sharing solution with file system limitations in terms of performance and scalability. SANergy can support only hundreds of clients, whereas, according to Tevis, StorageTank can support "tens of thousands of clients."
Start-up Sanbolic Inc., Watertown, MA, also offers a fully clustered file system, initially available for Windows. The company says its architecture will support Unix in the future.
This was first published in May 2003