File Systems: The state of the art


This article can also be found in the Premium Editorial Download "Storage magazine: Survey says storage salaries are climbing."


Distributed file systems (DFS) and parallel file systems (PFS) are receiving a lot of attention across a range of applications. The terms are largely interchangeable: some vendors call their products distributed while others go with the "parallel" moniker, but the two approaches are architecturally and functionally analogous.


DFS/PFS enables thousands of servers to sustain parallel I/O into a file system, directory or single file with minimal coordination required among those servers. All DFS/PFS products are two-layer file-system architectures with clients and servers. On the client layer, the DFS/PFS creates a namespace that spans all of the machines and presents a single file system. Because it establishes "one big file system," the client layer enables any client to make requests into the cluster that are executed by the server layer.
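The "one big file system" idea can be illustrated with a toy sketch: a client-layer object that presents a single namespace and deterministically routes each path to the server node responsible for it. All class and node names here are hypothetical and not drawn from any vendor's implementation.

```python
import hashlib

class NamespaceClient:
    """Toy sketch of a DFS/PFS client layer: one namespace that
    spans many server nodes. Names are illustrative only."""

    def __init__(self, server_nodes):
        self.server_nodes = server_nodes  # e.g. ["node-a", "node-b", ...]

    def route(self, path):
        # Hash the path so every client maps a given file to the
        # same server node without central coordination.
        digest = hashlib.md5(path.encode()).hexdigest()
        return self.server_nodes[int(digest, 16) % len(self.server_nodes)]

client = NamespaceClient(["node-a", "node-b", "node-c"])
# Any client sees the same unified namespace; routing is deterministic,
# so requests for the same path always reach the same server node.
```

Real systems use far more sophisticated placement (metadata servers, directory partitioning), but the routing-by-namespace idea is the same.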

The server layer of a DFS/PFS is responsible for all I/O operations, and can scale to serve more than 2,500 clients in large implementations (see "Verify vendor performance claims," previous page). From a data storage perspective, the server layer of a DFS/PFS is effectively the storage layer; the servers are sometimes referred to simply as the system's storage nodes. This is because every DFS/PFS is architected so that each individual physical server maintains ownership of its own storage resources. In a DFS or PFS, storage isn't directly shared by other servers, as is the case in a clustered file system (CFS). Because of that difference, a DFS/PFS doesn't require a SAN for storage.
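The ownership model described above can be sketched as follows: each server node performs I/O only against its own node-local storage, and no other node touches that storage directly (in contrast to the shared-disk model of a CFS). This is a minimal illustration with hypothetical names, not any product's actual code.

```python
import os
import tempfile

class StorageNode:
    """Toy sketch: in a DFS/PFS, each server node owns its storage
    exclusively. Other nodes never access it directly (no shared SAN)."""

    def __init__(self, name):
        self.name = name
        # Node-local storage; only this node reads or writes under it.
        self.root = tempfile.mkdtemp(prefix=name + "-")

    def write(self, relpath, data):
        full = os.path.join(self.root, relpath)
        with open(full, "wb") as f:
            f.write(data)
        return full

node = StorageNode("node-a")
path = node.write("chunk0", b"hello")
```

Because each node is the sole owner of its disks, adding capacity or throughput means adding nodes rather than expanding a shared SAN fabric.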

A DFS/PFS uses various internode daemons, metadata and data control mechanisms to ensure that stored content is accessed by only a single client at any given time, preserving data coherency. While some approaches use a centralized lock manager and metadata server to achieve this traffic-cop control, others use non-hierarchical or segmented lock management approaches to achieve extremely high scalability and parallelized I/O. The result is a file-system architecture optimized for huge throughput across many machines.
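The centralized traffic-cop approach can be sketched as a lock manager that grants a file to one client at a time; a second client's request is refused until the first releases its lock. The API below is hypothetical and greatly simplified (real lock managers handle leases, failover and shared read locks), but it shows the coherency guarantee the text describes.

```python
import threading

class LockManager:
    """Toy sketch of a centralized lock manager ('traffic cop'):
    grants exclusive access to a path to one client at a time.
    Hypothetical API, not any vendor's implementation."""

    def __init__(self):
        self._guard = threading.Lock()   # protects the owner table itself
        self._owners = {}                # path -> client id holding the lock

    def acquire(self, client_id, path):
        with self._guard:
            if path in self._owners:
                return False             # another client holds it; retry later
            self._owners[path] = client_id
            return True

    def release(self, client_id, path):
        with self._guard:
            if self._owners.get(path) == client_id:
                del self._owners[path]

lm = LockManager()
lm.acquire("client-1", "/data/file")     # client-1 gets exclusive access
```

Segmented or non-hierarchical designs partition this owner table across many nodes so that lock traffic itself scales with the cluster.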

Because of this architecture, DFS/PFS made an initial beachhead in high-performance computing (HPC) cluster applications. DFS/PFS are increasingly being deployed in enterprises for data-intensive apps such as digital content delivery and scalable NAS (see "Where CFS and DFS/PFS fit best," above right). A shortcoming of most DFS/PFS implementations has been an inability to handle the random I/O common in workloads such as databases. This relative weakness stems from the coordination overhead these architectures incur among large numbers of server I/O nodes.

Notable DFS/PFS offerings include products from Exanet Inc., Ibrix Inc. and Isilon Systems Inc.; IBM's General Parallel File System (GPFS); and Lustre, the open-source initiative.

This was first published in November 2005
