Editor's note: In earlier installments of this series, the authors described the specific criteria an effective failover architecture should be able to meet. In this last installment, they spend time describing the critical elements you need to move to the backup or takeover server.
When a failover occurs, three critical elements must be moved from the failed server to the takeover server in order for users to resume their activities and for the application services to be considered available once again:
- Network identity. Generally, this means the IP address that the server's clients use. Some network media and applications may require additional information to be transferred, such as a MAC address. If the server is on multiple networks or has multiple public network identities, it may be necessary to move multiple addresses.
- Access to shared disks. Operating system technology, and filesystem technology in particular, generally prohibits multiple servers from writing to the same disks at the same time. In a shared disk configuration, logical access must therefore be restricted to one server at a time. During a failover, the mechanism that kept the takeover server from accessing the disks must reverse itself: it locks out the original server and grants access only to the takeover server. Note that not all operating systems provide this technology.
- Set of processes. Once the disks have migrated to the takeover server, all the processes associated with the data must be restarted. Data consistency must be ensured from the application's perspective.
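As a rough illustration of how the three elements move together, the following Python sketch models a failover in memory. It does not call any real cluster software; the `Server` class, the `fail_over` function and all field names are hypothetical and exist only for this example. The one ordering taken from the text is that processes restart only after the disks have moved; moving the network identity last is an assumption of the sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    """Hypothetical in-memory model of one cluster member."""
    name: str
    addresses: set = field(default_factory=set)    # network identities hosted here
    disk_access: bool = False                      # may this server write the shared disks?
    processes: list = field(default_factory=list)  # application processes running here


def fail_over(failed: Server, takeover: Server, process_names: list) -> None:
    """Move the three critical elements from `failed` to `takeover`."""
    names = list(process_names)  # copy before the failed server's list is cleared

    # 1. Shared disks: lock out the failed server first, then grant access,
    #    so that the two servers never hold write access at the same time.
    failed.disk_access = False
    takeover.disk_access = True

    # 2. Processes: restart them only once the disks belong to the takeover server.
    failed.processes.clear()
    takeover.processes = names

    # 3. Network identity: move every address the clients use
    #    (doing this last is an assumption of this sketch).
    takeover.addresses |= failed.addresses
    failed.addresses = set()
```

A real implementation would also have to verify data consistency before restarting the processes, which this sketch leaves out.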
The collection of these elements is commonly called a service group. A service group is the unit that moves from cluster member to cluster member. Sometimes called a virtual machine, the service group provides the critical services that are being made highly available.
A cluster may have multiple service groups, and depending on the software that manages the cluster, there may not be any limit to the number of service groups. Service groups must be totally independent of each other, so that they can live on any eligible server in the cluster, regardless of what the other service groups might be doing. If two or more service groups cannot be independent of each other (that is, they must be together on the same server at all times), then they must be merged into a single service group.
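The merging rule can be sketched in a few lines of Python. This is a minimal example, not the behavior of any particular cluster manager: the function name `merge_dependent_groups`, the dictionary layout and the `must_colocate` pairs are all assumptions made for illustration. Groups that must always run on the same server are combined, using a small union-find structure, until every remaining group is independent.

```python
def merge_dependent_groups(groups, must_colocate):
    """Merge service groups that cannot be independent of each other.

    groups:        dict mapping a group name to a set of its resources
    must_colocate: iterable of (name, name) pairs that must share a server
    Returns a dict of merged groups, keyed by one representative name.
    """
    parent = {name: name for name in groups}

    def find(name):
        # Follow parent links to the representative, shortening the path as we go.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in must_colocate:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # b's group is absorbed into a's group

    merged = {}
    for name, resources in groups.items():
        merged.setdefault(find(name), set()).update(resources)
    return merged
```

For instance, if a hypothetical "web" group and "db" group must always run together, passing the pair `("web", "db")` yields one combined group holding the resources of both, while an unrelated "mail" group stays separate and can still fail over on its own.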
Content in this tip has been excerpted by permission from the book "Blueprints for High Availability, Second Edition," authored by Evan Marcus and Hal Stern, Wiley Publishing, Inc. All rights reserved.
About the authors: Evan Marcus is a frequent SearchStorage.com contributor and an expert at answering readers' questions about availability, backup and disaster recovery. He is a principal engineer for Veritas Software and the industry's data availability maven, with over 12 years of experience in this area, and a frequent speaker at industry technical conferences.
Hal Stern is the vice president and chief technology officer for the Services business unit of Sun Microsystems. He has worked on reliability and availability issues for some of the largest online trading and sports information sites, as well as for several network service providers.