Problem: I need redundancy for a shared file system that is dynamically changed by multiple servers. The files are being served by a single NFS server. I would like to add a second NFS server in hot standby so that if the primary server becomes unavailable, the secondary takes over with a file system that has been kept synced to the primary (via rsync).
I know I can set up autofs with multiple hosts, weighted to favor the primary. But if the primary fails while the servers have it mounted, the automountd unmount fails because the file system is busy, so the file system is never mounted from the secondary. I believe that forcing an unmount (Solaris 8, umount -f) leaves any files that were open in an unstable state, so that when the secondary is mounted the same file is not in the same state as the original. Is that true? So, here is my question: Can I build reliable failover for a read/write file system using NFS and autofs?
Yes, you ought to be able to build HA NAS using a couple of different solutions that I'm aware of. As for autofs, I do not know anything about it, so my response below applies to the general concept of HA NFS, without regard to autofs.
The first is a Network Appliance solution using what they call virtual interfaces (VIFs) on a pair of their Filers. Data access is accomplished by shared storage in a SAN that supports both Filers simultaneously. Memory in each Filer is dedicated to cross-system communications with the ability to determine when one of the systems fails. There are several configuration options available that can be reviewed on their Web site.
Beyond Network Appliance, there could be other storage and system vendors who provide HA with failover for NFS. Compaq, HP, IBM and EMC might all have similar capabilities; they certainly understand clustering and clustered storage. Some of these might look more like standard clustered Unix servers than NAS appliances. The mount point should not have to change if the failover happens fast enough. Network routing would have to converge quickly enough to re-establish the connection between the application server and the secondary NFS server.
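To make the failure-detection step concrete, here is a minimal Python sketch of the kind of check a standby node might run before taking over. It is an illustration only, not how any of the vendors named above implement failover. It assumes the primary accepts NFS over TCP on the standard port 2049, the function names are hypothetical, and a successful TCP connection is only a rough proxy for a healthy NFS service. What the standby does once it decides to take over (claiming the service address, exporting the storage) is deliberately left out.

```python
import socket
import time

NFS_PORT = 2049  # standard NFS port; assumes the server accepts NFS over TCP


def nfs_reachable(host, port=NFS_PORT, timeout=2.0):
    """Return True if a TCP connection to the NFS port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def should_fail_over(probe_results, threshold=3):
    """Decide on takeover from a chronological list of probe results.

    Returns True only when the most recent `threshold` probes all failed,
    so that a single dropped probe does not trigger a takeover.
    """
    if len(probe_results) < threshold:
        return False
    return not any(probe_results[-threshold:])


def monitor(host, probe=nfs_reachable, threshold=3, interval=5.0,
            sleep=time.sleep, max_probes=None):
    """Probe `host` repeatedly; return True once takeover is warranted.

    Returns False if `max_probes` probes complete without reaching the
    failure threshold. With max_probes=None the loop runs until it does.
    """
    history = []
    count = 0
    while max_probes is None or count < max_probes:
        history.append(probe(host))
        count += 1
        if should_fail_over(history, threshold):
            return True
        sleep(interval)
    return False
```

Requiring several consecutive failures trades a slower takeover for fewer false alarms. With the defaults above, detection takes roughly 10 to 16 seconds, depending on whether each probe fails immediately or runs into its timeout, and that delay counts against how quickly the clients see the secondary.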
Finally, another type of solution is one that is based on a distributed file system, such as the Tricord Lunar Flare NAS products. In this case, any of the NAS nodes can fail without losing access to any of the data in the Lunar Flare cluster. You might need to look into this file system to understand how this is so; it's way beyond the scope of this vehicle.
Editor's note: Do you agree with this expert's response? If you have more to share, post it in our Storage Networking discussion forum at http://searchstorage.discussions.techtarget.com/WebX?50@@.ee83ce4 or e-mail us directly at email@example.com.