Feature

File Systems: The state of the art


Journaling
File systems have evolved well beyond the traditional file systems that commonly ship with workstation computers and servers. Journaling is one of the most basic and widespread improvements to traditional file-system architectures. After an internal file-system error or unanticipated system shutdown, a traditional file system must recover itself by rebooting and running a time-consuming, granular scan of its data structures (on Unix and Linux, the fsck command examines and repairs the file system). For larger deployments, this can translate into hours before the file system can verify its integrity and come back online. Needless to say, such downtime is now completely unacceptable in critical server environments.
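To see why scan-based recovery scales so poorly, consider the rough Python sketch below. It is not how fsck is actually implemented; it simply walks an entire directory tree and stats every entry (the "/var" starting point is just an example), which is the flavor of work a full integrity scan must do no matter how little data changed before the crash.

```python
# Illustrative only: the cost of a full meta data walk grows with the
# total number of entries on the volume, not with the amount of recent
# activity before the crash.
import os

def full_scan(root: str) -> int:
    """Visit and stat every entry under root, fsck-style."""
    checked = 0
    for dirpath, _dirs, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                os.lstat(path)  # touch each entry's meta data
            except OSError as err:
                print("suspect entry:", path, err)
            checked += 1
    return checked

if __name__ == "__main__":
    # On a multimillion-file volume this loop alone takes a long time,
    # and a real repair pass does far more work per entry.
    print("entries checked:", full_scan("/var"))
```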

How to choose a file system
Whether you're looking for a scalable NAS solution, a SAN cluster, a high-performance cluster or a wide-area deployment, certain factors must be weighed for every file system. In any scenario, the following criteria will determine why one file system is a better fit than another for a particular environment or storage application:

Workload. The data workload will have a profound impact on the kind of file system deployed. Some file systems can't perform well under dynamic, random I/O workloads (e.g., databases), but excel in sequential data environments (e.g., digital content and streaming media); a simple way to probe the difference is sketched at the end of this sidebar. There's no perfect file system that's optimized across all workloads.

Scalability. Scalability issues include how a file-system technology handles the addition of new clients, servers, applications, networking elements and storage capacity. Finding the appropriate mix for a given deployment requires careful analysis. Vendors may have developed offerings that excel in one or two aspects of scalability (e.g., addition of computing and storage resources), but can't handle application loads at that scale.

Application goals. The kinds of applications to be supported will be one of the key determinants in selecting a file-system technology. For example, does the file system need to provide dynamic access to critical Oracle data stored within a SAN environment, give a small number of Unix clients access to a high-performance computing and storage pool, or present 80 Windows clients with a unified view of their network directories? The application goals for each of these examples will lead to markedly different vendor choices and file-system architectures.

Performance. Some sophisticated file systems support near-linear performance as new nodes are added to clusters, providing massively parallelized, end-to-end performance. Others degrade in performance quickly as new nodes are added, but provide superior capabilities for data sharing or collaboration across wide areas. All too often, users choose inadequately powered file systems or overspend on complicated architectures that could have been avoided.

When pressed, most file-system vendors will admit that publicly stated performance metrics provide only the most basic outline of a product's suitability for a given environment. Any reputable file-system vendor will support actual testing of their file system in a real-world environment and encourage prospective buyers to speak with others who have deployed their technology for similar workloads.
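As a concrete illustration of the workload criterion above, here is a hypothetical Python microbenchmark that reads the same file sequentially and then in random order. It is a sketch, not a substitute for real workload testing: the file name, size and block size are arbitrary, and the operating system's page cache will flatter the second pass unless the file is much larger than RAM.

```python
import os
import random
import time

PATH = "testfile.bin"     # hypothetical scratch file
SIZE = 256 * 1024 * 1024  # 256 MB
BLOCK = 4096              # 4 KB reads

def timed_reads(offsets) -> float:
    """Read one block at each offset and return the elapsed seconds."""
    with open(PATH, "rb") as f:
        start = time.perf_counter()
        for off in offsets:
            f.seek(off)
            f.read(BLOCK)
        return time.perf_counter() - start

if __name__ == "__main__":
    with open(PATH, "wb") as f:
        f.write(os.urandom(SIZE))
    sequential = list(range(0, SIZE, BLOCK))
    shuffled = random.sample(sequential, len(sequential))
    print("sequential read time:", timed_reads(sequential))
    print("random read time:    ", timed_reads(shuffled))
    os.remove(PATH)
```

The gap between the two numbers is one crude proxy for how sensitive a given stack is to access pattern; production evaluations should use purpose-built tools and the actual application load.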

Because of the criticality of enterprise data today, most data centers deploy a journaling file system (JFS). It's easiest to think of journaling as a rapid backup and recovery mechanism for the file system. A fully implemented JFS creates its own transaction logs for all meta data and user data actions within a file system. By logging file meta data and user data, a JFS can determine precisely what transactions had taken place up to the time of the failure, thereby ensuring full data integrity when the journal is replayed. The file system can then use the journal information to execute an immediate recovery, avoiding a time-consuming walk of the entire file system's data structure. This reduces recovery time from hours to seconds or minutes.
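The redo-logging idea at the heart of a JFS can be captured in a few lines of Python. The sketch below is a toy, not any real file system's on-disk format (the journal file name and record layout are invented): each change is logged and flushed to stable storage before it is applied, and a commit record marks it done, so replay after a crash only has to redo the handful of transactions that were in flight.

```python
import json
import os

JOURNAL = "journal.log"  # hypothetical journal file

def journaled_write(path: str, data: str) -> None:
    # 1. Log the intent and force it to stable storage first.
    with open(JOURNAL, "a") as j:
        j.write(json.dumps({"op": "write", "path": path, "data": data}) + "\n")
        j.flush()
        os.fsync(j.fileno())
    # 2. Apply the change to the file system proper.
    with open(path, "w") as f:
        f.write(data)
    # 3. Record the commit so replay knows this transaction finished.
    with open(JOURNAL, "a") as j:
        j.write(json.dumps({"op": "commit", "path": path}) + "\n")
        j.flush()
        os.fsync(j.fileno())

def replay() -> None:
    """Crash recovery: redo only the logged writes that never committed."""
    if not os.path.exists(JOURNAL):
        return
    pending = {}
    with open(JOURNAL) as j:
        for line in j:
            rec = json.loads(line)
            if rec["op"] == "write":
                pending[rec["path"]] = rec["data"]
            elif rec["op"] == "commit":
                pending.pop(rec["path"], None)
    for path, data in pending.items():
        with open(path, "w") as f:
            f.write(data)  # reapply uncommitted work, then carry on
```

Because replay touches only the tail of recent activity, a journaled volume can come back in seconds where a full scan of the same volume would take hours.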

Journaling has become a standard feature of advanced file systems and one of the cornerstones of high-availability initiatives. It's found in the most popular enterprise file systems that run on Unix and Linux platforms, including ext3, XFS, ReiserFS, VxFS and IBM Corp.'s JFS. Microsoft Corp.'s current NTFS release supports meta data journaling capabilities, with a more feature-rich Transactional NTFS version called TxF due in its Vista (formerly code-named Longhorn) operating system release.

Networked file systems
Networked file systems have become the foundation for data and resource sharing in the enterprise. Not surprisingly, the iconic example is NFS, a 20-year-old technology now synonymous with the protocol it exports, which drives a majority of the world's NAS deployments. The other major networked file system is the Common Internet File System (CIFS). At their core, both NFS and CIFS are hierarchical file systems that can export specialized protocols to their clients to enable the sharing of files under their control. NFS and CIFS support Unix and Windows clients, respectively. Despite its strong associations with the Unix community, NFS can support other operating systems, including Windows.

Networked file systems are at the center of developments in namespace aggregation, as described above. NFS has continued to add functionality in this respect, including inherent namespace aggregation across multiple machines in NFS v4, its latest release. With namespace aggregation activated, multiple machines running NFS v4 can share a common view of file information. Likewise, CIFS can leverage a software platform called Microsoft Distributed File System (DFS) to establish unified namespaces across multiple Windows machines. Despite the "distributed file system" moniker, Microsoft DFS isn't a true file-system technology, but rather a namespace aggregation tool that deploys atop the Windows operating system.

Storage professionals may note that the term virtual file system (VFS) is used increasingly by some vendors. Users should consider the term VFS a synonym for namespace aggregation, and think of it in the context of a networked file system. Specifically, this refers to grouping several file systems to create the virtual image of one file system. Underneath, each physical device retains its own file-system images.
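A namespace aggregation layer can be sketched in miniature as a longest-prefix lookup table that maps one virtual tree onto several physical ones. The Python below is purely illustrative (the mount points and server paths are invented), but it shows the core trick: clients address a single namespace while each underlying device keeps its own file-system image.

```python
import os

class VirtualNamespace:
    """Toy namespace aggregator: one logical tree over many physical roots."""

    def __init__(self):
        self.mounts = {}  # virtual prefix -> physical root

    def mount(self, prefix: str, physical_root: str) -> None:
        self.mounts[prefix] = physical_root

    def resolve(self, vpath: str) -> str:
        """Map a virtual path to its physical location (longest prefix wins)."""
        for prefix in sorted(self.mounts, key=len, reverse=True):
            if vpath.startswith(prefix):
                rest = vpath[len(prefix):].lstrip("/")
                return os.path.join(self.mounts[prefix], rest)
        raise FileNotFoundError(vpath)

    def listdir(self, vpath: str):
        return os.listdir(self.resolve(vpath))

# Hypothetical usage: two separate file servers, one unified view.
ns = VirtualNamespace()
ns.mount("/corp/engineering", "/mnt/nas1/eng")  # export from server 1
ns.mount("/corp/finance", "/mnt/nas2/fin")      # export from server 2
# ns.listdir("/corp/finance") now reads from /mnt/nas2/fin
```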

There are several companies developing technologies on top of these networked file systems to enhance namespace management, including Acopia Networks Inc., NeoPath Networks Inc., NuView Inc. and Rainfinity (acquired by EMC Corp.).

This was first published in November 2005
