Tip

Availability, part 8: Avoid data corruption with hard mounts in NFS

In previous tips in this series, we introduced the idea that implementing availability requires a layered approach. Following that approach, we looked at good system administration practices, backups, disks and storage (and why larger disks aren't always better), networking, and the system's local environment. The seventh level in the Availability Index (introduced in part one) addresses applications and services. The service we're going to look at today is the Network File System, or NFS. A widely adopted protocol for sharing files from servers to clients, NFS is available in one form or another on most of the major platforms found in today's enterprise.

The issue I want to discuss in this tip is the use of soft mounts with NFS. Many system administrators believe that soft NFS mounts are their friend. In fact, soft NFS mounts should be avoided at pretty much all costs.

When an NFS client mounts an NFS file system from a server, one of the mount-time options is whether the mount should be soft or hard. The difference only comes into play when the server that is providing the mount stops responding. With a soft mount, after a settable timeout (usually lasting several minutes) has passed, any operations that were trying to read from or write to the file system will give up and fail. With a hard mount, operations never time out, and they are not interruptible: a control-C will not stop the operation from retrying. Instead, the operation continues until it completes, or until the client attempting it is shut down.
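To find out which mounts on a client are currently soft, one option is to inspect the client's mount table. The Python sketch below assumes a Linux client, where /proc/mounts lists each mounted file system as "device mountpoint fstype options ..." and an NFS entry carries either "soft" or "hard" in its option list. The function name, the sample server names, and the sample option strings are made up for illustration; other platforms report mount options differently.

```python
# Sample text in the /proc/mounts layout (hypothetical servers and paths).
SAMPLE_MOUNTS = """\
server1:/export/home /home nfs rw,hard,vers=3,proto=tcp 0 0
server2:/export/data /data nfs rw,soft,timeo=600,retrans=2 0 0
/dev/sda1 / ext4 rw,relatime 0 0
"""


def find_soft_nfs_mounts(mounts_text):
    """Return the mount points of NFS file systems mounted with 'soft'."""
    soft = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mount_point, fs_type, options = fields[1], fields[2], fields[3]
        # Compare whole options so that only an exact 'soft' entry matches.
        if fs_type in ("nfs", "nfs4") and "soft" in options.split(","):
            soft.append(mount_point)
    return soft


print(find_soft_nfs_mounts(SAMPLE_MOUNTS))
```

On a real Linux client, the same function could be given the contents of /proc/mounts instead of the sample text.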

The apparent advantage, then, goes to soft mounts, since the client system cannot hang as a result of a failure on its NFS server. That is, unfortunately, a shortsighted and incorrect point of view.

The truth is that if an NFS write can time out without completing, the result can be data corruption. Consider the following sequence of events:

Write 1: succeeds
Write 2: succeeds
NFS server crashes
Write 3: fails, due to time out
Write 4: fails, due to time out
NFS server recovers (very quickly)
Write 5: succeeds
Write 6: succeeds

When a user later reads the file, the read proceeds normally until it reaches the positions of the failed writes. Those positions contain no valid data (or, more accurately, garbage data). They are, effectively, holes. When an application reads them and attempts to act on the data, it will very likely fail, and depending on the nature of the application, it could take the whole system with it. What's more, the data that was supposed to be in the holes is lost; unless other steps were taken, it cannot be retrieved.
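The sequence of six writes above can be modeled with a small in-memory sketch. This is a toy model, not real NFS behavior: it assumes fixed-size records written at consecutive offsets, an application that ignores write failures and keeps going, and zero bytes standing in for whatever ends up in the unwritten regions. All names here (RECORD_SIZE, replay_writes, find_holes) are hypothetical.

```python
RECORD_SIZE = 4  # bytes per record in this toy model


def replay_writes(outcomes):
    """Replay one fixed-size write per entry in outcomes.

    outcomes[i] is True if write i+1 reached the server and False if it
    timed out. A timed-out write is skipped, but later writes still land
    at their own offsets, as they would if the errors were ignored.
    """
    data = bytearray(RECORD_SIZE * len(outcomes))  # starts zero-filled
    for index, succeeded in enumerate(outcomes):
        if succeeded:
            offset = index * RECORD_SIZE
            record = (b"W%d  " % (index + 1))[:RECORD_SIZE]
            data[offset:offset + RECORD_SIZE] = record
    return bytes(data)


def find_holes(data):
    """Return the 1-based numbers of records that were never written."""
    empty = bytes(RECORD_SIZE)
    return [
        offset // RECORD_SIZE + 1
        for offset in range(0, len(data), RECORD_SIZE)
        if data[offset:offset + RECORD_SIZE] == empty
    ]


# Writes 1-2 succeed, 3-4 time out, 5-6 succeed.
contents = replay_writes([True, True, False, False, True, True])
print(find_holes(contents))
```

In this model the file has the expected length and valid data at both ends, which is why the damage in the middle tends to go unnoticed until something reads it.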

The write errors that come from failed soft-mounted NFS file systems can be detected, but most developers do not write their code to check the error code from every single write.
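As a rough illustration of what checking every write looks like, here is a minimal Python sketch. In Python a failed write raises OSError instead of returning an error code, so the equivalent discipline is to let that exception propagate rather than swallow it. The helper names write_record and demo are hypothetical, and the sketch writes to a local temporary file; calling fsync after each record is one possible design choice for surfacing errors the client would otherwise report later, and it trades throughput for certainty.

```python
import os
import tempfile


def write_record(fd, payload):
    """Write all of payload to fd; raise OSError if any part fails."""
    view = memoryview(payload)
    while len(view) > 0:
        # os.write may write fewer bytes than requested, so loop until
        # everything is written. It raises OSError on failure.
        written = os.write(fd, view)
        view = view[written:]
    # Flush to stable storage so a deferred write error surfaces here
    # instead of being discovered by a later reader.
    os.fsync(fd)


def demo():
    """Write one record to a temporary file and read it back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "records.dat")
        fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
        try:
            write_record(fd, b"record 1\n")
        finally:
            os.close(fd)
        with open(path, "rb") as f:
            return f.read()


print(demo())
```

A caller that catches OSError from write_record can stop, retry, or alert an operator at the moment of failure, instead of leaving a hole behind.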

Hard mounts may appear inconvenient because operations stalled by a server failure cannot be interrupted. That lack of interruption is exactly what you want: it ensures that data gets written when and where you expect it, and that failures and write errors get caught rather than glossed over and left for production applications to discover later on.


Copyright 2003, Evan Marcus

Evan L. Marcus is Data Availability Maven at VERITAS Software. Contact him at evan@veritas.com.


This was first published in January 2003
