With pNFS enforcing a standardized approach to parallel file delivery, users will see a NAS performance boost in NFS 4.1, no matter whose NAS storage they have deployed.
Late last year, as Network File System (NFS) 4.1 moved from Last Call to Request for Comments (RFC) status at the Internet Engineering Task Force, there was a lot of press about how Parallel NFS (pNFS), included in the protocol, would create a quantum leap in network-attached storage (NAS) performance for bandwidth-intensive workloads such as those found in high-performance computing (HPC). NFS 4.1 represents a major step toward enabling NFS to better serve the throughput needs driven by ever-increasing file sizes and demanding HPC environments. While the pNFS buzz has died down, vendors are developing products that incorporate the standard, and pNFS-supporting products are expected on the market in approximately six months. The 600-plus-page RFC is in the editing process and full ratification should happen soon.
The Network File System protocol
The NFS protocol enables users to remotely access and share directories and files stored on a central file server or NAS array as if the data were stored on a local disk drive. By deploying dedicated NAS devices, you gain centralized management of storage resources, easier file sharing and collaboration, better data protection and disaster recovery (DR) planning, storage optimization, space savings via quotas and more. With the Milford, Mass.-based Enterprise Strategy Group (ESG) projecting that file-based data will make up more than 70% of total storage capacity by 2012, these benefits are becoming more important to storage administrators.
One of the key challenges with NFS is that performance is gated by the bandwidth of the NAS head or processor node that controls, or "owns," the directory and file being accessed. NFS 4.0 limits file ownership to a single node (there are ways to get around single-node ownership, but not without tradeoffs). When a file is requested by a client, all data delivered to the client must be routed through the NAS head. At the same time, the NAS head is also handling NFS tasks such as locking, permissions and file metadata management. One person accessing a large file can bring the performance of the NAS head to its knees, leaving other users whose file shares are served by that head waiting for their files. This issue is exacerbated in HPC environments, which have shifted to parallel processing, where multiple processors accessing shared data can easily overwhelm a single NAS head.
A number of vendors have introduced parallel file serving technology to meet this demand, but adoption of these products has been limited because they are proprietary and require special client software. Widespread adoption of parallel file services, if it is indeed going to take off, requires a standard approach. This is where pNFS comes in.
Parallel NFS takes NAS performance up a level. Files can be broken up and striped across NAS heads and, leveraging multiple data paths and processors, delivered in parallel to the requestor to provide a major performance boost. pNFS also introduces the ability to bypass NAS heads for file delivery altogether.
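To make the striping idea concrete, here is a minimal Python sketch of how a byte range in a file could be divided among several NAS heads. It assumes a simple round-robin scheme with a fixed stripe unit; the function names (locate, split_read) and the parameters are hypothetical, chosen for illustration, and are not the exact layout rules defined in the NFS 4.1 standard.

```python
def locate(offset, stripe_unit, num_servers):
    """Map a byte offset in the file to (server index, offset in that server's data).

    Assumes plain round-robin striping: stripe 0 goes to server 0,
    stripe 1 to server 1, and so on, wrapping around.
    """
    stripe_number = offset // stripe_unit
    server = stripe_number % num_servers
    # Every full pass over all servers adds one stripe unit to each server.
    local_offset = (stripe_number // num_servers) * stripe_unit + offset % stripe_unit
    return server, local_offset


def split_read(offset, length, stripe_unit, num_servers):
    """Break one read request into per-server chunks.

    Returns a list of (server, local_offset, chunk_length) tuples that
    could each be sent to a different server at the same time.
    """
    chunks = []
    end = offset + length
    while offset < end:
        server, local_offset = locate(offset, stripe_unit, num_servers)
        # Never read past the end of the current stripe unit.
        room_in_stripe = stripe_unit - offset % stripe_unit
        chunk_length = min(room_in_stripe, end - offset)
        chunks.append((server, local_offset, chunk_length))
        offset += chunk_length
    return chunks
```

With a stripe unit of 4 bytes and three servers, a request for 8 bytes starting at offset 2 touches all three servers, so each one carries only part of the load.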
One of the keys to providing parallel data delivery is the addition of an out-of-band metadata server. The metadata server contains a map, referred to in the NFS 4.1 standard as the "Layout," detailing how and where data is stored. The metadata server also handles file semantics and permissions. When a file request is made by a client, that request is routed to the metadata server first. The metadata server returns information to the client about where the file "lives" on the associated file servers, and the client can then retrieve the data directly from those servers. If the file is striped across multiple processor nodes, all of the processor nodes can be leveraged to fill the request, providing a boost in both bandwidth and processing power.
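The request flow described above can be sketched as a small in-memory simulation in Python. Everything here is an assumption made for illustration: the MetadataServer and DataServer classes, the dictionary used as the layout, the stripe_file setup helper and the tiny 4-byte stripe unit are all hypothetical and stand in for real servers and real protocol messages. The point is only the order of operations: ask the metadata server for the layout, then read from the data servers in parallel.

```python
from concurrent.futures import ThreadPoolExecutor

STRIPE_UNIT = 4  # bytes; deliberately tiny so the example is easy to follow


class DataServer:
    """Stands in for one file server (NAS head) that holds part of each file."""

    def __init__(self):
        self.store = {}  # file name -> bytes held by this server

    def read(self, name, offset, length):
        return self.store[name][offset:offset + length]


class MetadataServer:
    """Holds only the layout (the map of where data lives), never the file data."""

    def __init__(self):
        self.layouts = {}

    def get_layout(self, name):
        return self.layouts[name]


def stripe_file(name, data, mds, data_servers):
    """Setup helper: place a file round-robin across servers and record its layout."""
    n = len(data_servers)
    parts = [bytearray() for _ in range(n)]
    for i in range(0, len(data), STRIPE_UNIT):
        parts[(i // STRIPE_UNIT) % n] += data[i:i + STRIPE_UNIT]
    for server, part in zip(data_servers, parts):
        server.store[name] = bytes(part)
    mds.layouts[name] = {"stripe_unit": STRIPE_UNIT, "servers": list(range(n)), "size": len(data)}


def read_file(name, mds, data_servers):
    """Client side: fetch the layout first, then read every stripe in parallel."""
    layout = mds.get_layout(name)
    unit, size = layout["stripe_unit"], layout["size"]
    n = len(layout["servers"])
    requests = []
    for stripe, file_offset in enumerate(range(0, size, unit)):
        server = layout["servers"][stripe % n]
        local_offset = (stripe // n) * unit
        requests.append((server, local_offset, min(unit, size - file_offset)))
    with ThreadPoolExecutor(max_workers=n) as pool:
        # map() yields results in request order, so the stripes reassemble correctly.
        chunks = pool.map(lambda r: data_servers[r[0]].read(name, r[1], r[2]), requests)
        return b"".join(chunks)
```

Note that the metadata server in this sketch never touches file data on the read path; it only answers the layout question, which is what keeps it out of the data path.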
pNFS takes the equation one step further than just parallel data delivery over an IP network by introducing support for direct block data access and Object-based Storage Devices, essentially bypassing NAS heads entirely in the delivery of file data. When file access is requested by an authorized client in block data mode, the actual block layout of the file is returned to the requesting client rather than a file layout. The client can then go directly to the storage devices themselves, rather than to NAS heads, and read the data using the SCSI protocol. In HPC-type environments, where the clients are often servers in the data center, this means they can be connected directly to block storage devices via fast pipes like 10 GbE or InfiniBand, and access files (as block data) via multiple parallel paths. That is a huge performance boost compared with accessing shared files over NFS and a single NAS head. A request for an object would follow a similar path.
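As a rough illustration of block data mode, the Python sketch below treats a block layout as a list of extents, each saying which device holds a given range of the file and where on that device it starts. The Extent fields, the device names and the read_via_block_layout function are hypothetical, and slicing an in-memory bytes object stands in for a real SCSI read. The sketch also assumes the extents cover the requested range with no holes.

```python
from dataclasses import dataclass


@dataclass
class Extent:
    file_offset: int    # where this extent starts within the file
    length: int         # number of bytes in the extent
    device: str         # which storage device holds the bytes
    device_offset: int  # where the bytes start on that device


def read_via_block_layout(extents, devices, offset, length):
    """Read a byte range of a file straight from the devices named in the layout.

    'devices' maps a device name to its raw contents; in a real deployment
    each lookup would be a block read sent to the storage device itself.
    """
    out = bytearray()
    end = offset + length
    for ext in sorted(extents, key=lambda e: e.file_offset):
        # Overlap between the requested range and this extent.
        start = max(offset, ext.file_offset)
        stop = min(end, ext.file_offset + ext.length)
        if start >= stop:
            continue
        dev_start = ext.device_offset + (start - ext.file_offset)
        out += devices[ext.device][dev_start:dev_start + (stop - start)]
    return bytes(out)
```

Because the extents name different devices, a client could issue the per-extent reads over separate paths at the same time; the loop here runs them one after another only to keep the example short.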
With pNFS enforcing a standardized approach to parallel file delivery, users will see a NAS performance boost in NFS 4.1, no matter whose NAS storage they have deployed.
BIO: Terri McClure is a storage analyst at Enterprise Strategy Group, Milford, Mass.