This article can also be found in the Premium Editorial Download "Storage magazine: Survey says storage salaries are climbing."
DFS and PFS are receiving a lot of attention across a range of applications. The terms distributed and parallel are largely interchangeable: some vendors call their products distributed while others prefer the "parallel" moniker, but the two approaches are architecturally and functionally analogous.
The server layer of a DFS/PFS is responsible for all I/O operations and can scale to more than 2,500 clients in large implementations (see "Verify vendor performance claims," previous page). From a data storage perspective, the server layer of a DFS/PFS is functionally identical to the storage layer; the servers are sometimes referred to simply as the system's storage nodes. That's because every DFS/PFS is architected so that each physical server maintains ownership of its own storage resources. In a DFS or PFS, storage isn't directly shared among servers, as it is in a CFS, so a DFS/PFS doesn't need a SAN for its storage.
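The ownership model described above can be sketched in a few lines. This is a hypothetical illustration, not any vendor's actual placement scheme: it assumes a simple hash-based mapping from a file path to the one node that owns that file's storage, which is what lets every client route I/O to the right server without any shared block storage.

```python
import hashlib

# Illustrative node names only; in a real DFS/PFS each of these would be
# a physical server that owns its own local disks.
NODES = ["node-a", "node-b", "node-c", "node-d"]

def owner_node(path: str) -> str:
    """Deterministically map a file path to the node that owns its storage."""
    digest = hashlib.sha256(path.encode()).digest()
    return NODES[int.from_bytes(digest[:4], "big") % len(NODES)]

# Every client computes the same owner for a given path, so all reads and
# writes for that file land on the node holding its storage -- no SAN needed.
print(owner_node("/projects/render/frame-0001.exr"))
```

Because the mapping is deterministic, clients need no shared state to agree on where a file lives; real systems layer striping and replication on top of the same idea.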
A DFS/PFS uses various internode daemons, metadata services and data control mechanisms to ensure that stored content is accessed by only a single client at any given time, preserving data coherency. Some implementations rely on a centralized lock manager and metadata server to provide this traffic-cop control; others use non-hierarchical or segmented lock management to achieve extremely high scalability and parallelized I/O. The result is a file-system architecture optimized for enormous throughput across many machines.
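The centralized lock-manager approach can be shown with a toy sketch. The class and behavior below are assumptions for illustration, standing in for the metadata and lock traffic a real DFS/PFS runs between its nodes: one client at a time is granted access to a file, which is what keeps the data coherent.

```python
import threading

class FileLockManager:
    """Toy centralized lock manager: grants exclusive access to a file
    to one client at a time. Real systems add queuing, leases and
    failure recovery on top of this basic idea."""

    def __init__(self):
        self._guard = threading.Lock()   # protects the lock table itself
        self._holders = {}               # path -> client currently holding it

    def acquire(self, path: str, client: str) -> bool:
        with self._guard:
            if path in self._holders:
                return False             # someone else holds the lock
            self._holders[path] = client
            return True

    def release(self, path: str, client: str) -> None:
        with self._guard:
            if self._holders.get(path) == client:
                del self._holders[path]

mgr = FileLockManager()
assert mgr.acquire("/data/file1", "client-1")        # first client wins
assert not mgr.acquire("/data/file1", "client-2")    # coherency preserved
mgr.release("/data/file1", "client-1")
assert mgr.acquire("/data/file1", "client-2")        # now free to proceed
```

The segmented alternative mentioned above would shard this lock table across many nodes so no single manager becomes a bottleneck.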
Because of this architecture, DFS/PFS made their initial beachhead in HPC cluster applications. They're increasingly being deployed in enterprises for data-intensive apps such as digital content delivery and scalable NAS (see "Where CFS and DFS/PFS fit best," above right). A shortcoming of most DFS/PFS implementations has been an inability to handle the random I/O common in workloads such as databases, a relative weakness that stems from the coordination overhead these architectures incur among large numbers of server I/O nodes.
Notable DFS/PFS offerings include those from Exanet Inc., Ibrix Inc. and Isilon Systems Inc., as well as IBM's General Parallel File System (GPFS) and Lustre, the open-source initiative.
This was first published in November 2005