This article can also be found in the Premium Editorial Download "Storage magazine: Upgrade path bumpy for major backup app."
The DR plan and its testing need to cover LANs, the WANs that connect sites, Internet connectivity and remote user access, the last mostly in the form of client virtual private networks (VPNs). Unless budget restrictions prohibit it, all mission-critical network connections should be designed redundantly. For LANs and SANs, especially in the data center, this means attaching servers and storage arrays redundantly to dual core network switches and Fibre Channel (FC) switches, and relying on dynamic routing protocols like Open Shortest Path First (OSPF) for IP networks, multipath I/O for FC-attached storage and port trunking for inter-switch links (ISLs) to perform automatic failover. WAN and Internet connection redundancy requires either two carriers or site-to-site VPNs that back up private-line WAN circuits, with dynamic routing protocols like Border Gateway Protocol (BGP-4) or OSPF performing the automatic failover.
Testing redundant network connection failover is relatively straightforward and can be as simple as manually inducing a failure of the primary link by disabling a port on a switch or router. Unless the configuration is flawed, dynamic routing should redirect traffic through the redundant circuit without noticeable disruption. Because routing is complex, it's highly recommended to verify failover beyond simply accessing resources on the remote site. Tools like traceroute, along with the network maps and topology graphs in storage and network management apps, can confirm that traffic actually takes the failover path.
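As an illustration of checking the path rather than just reachability, here is a minimal Python sketch that compares traceroute output captured before and after the primary link is disabled. The function names, the gateway addresses and the sample output (which uses reserved documentation address ranges) are all assumptions made for this example; real traceroute output varies by platform, so the parsing may need adjusting.

```python
import re

# Matches a dotted-quad IPv4 address; round-trip times such as "0.412" do not match.
IPV4 = re.compile(r"\b(\d{1,3}(?:\.\d{1,3}){3})\b")

def hops_from_traceroute(output):
    """Return the responding hop addresses, in order, from traceroute text."""
    hops = []
    for line in output.splitlines():
        fields = line.split()
        # Hop lines start with a hop number; skip the header and blank lines.
        if not fields or not fields[0].isdigit():
            continue
        match = IPV4.search(line)
        if match:  # hops that time out ("* * *") carry no address
            hops.append(match.group(1))
    return hops

def failover_verified(before, after, primary_gw, backup_gw):
    """True if the path used the primary gateway before and the backup after."""
    return primary_gw in before and primary_gw not in after and backup_gw in after

# Hypothetical captures: 198.51.100.1 stands in for the primary WAN gateway,
# 198.51.100.129 for the backup (for example, a VPN tunnel endpoint).
BEFORE = """\
traceroute to 203.0.113.10 (203.0.113.10), 30 hops max, 60 byte packets
 1  192.0.2.1  0.412 ms  0.398 ms  0.377 ms
 2  198.51.100.1  4.120 ms  4.101 ms  4.090 ms
 3  203.0.113.10  9.871 ms  9.850 ms  9.833 ms
"""
AFTER = """\
traceroute to 203.0.113.10 (203.0.113.10), 30 hops max, 60 byte packets
 1  192.0.2.1  0.420 ms  0.401 ms  0.380 ms
 2  * * *
 3  198.51.100.129  18.220 ms  18.190 ms  18.170 ms
 4  203.0.113.10  24.510 ms  24.480 ms  24.460 ms
"""

before_hops = hops_from_traceroute(BEFORE)
after_hops = hops_from_traceroute(AFTER)
print(failover_verified(before_hops, after_hops, "198.51.100.1", "198.51.100.129"))
```

In a rehearsal, the two captures would come from running traceroute against the same remote host before and after the primary port is disabled.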
Redundancy can also breed complacency and create a false sense of safety. A DR rehearsal of the network needs to contemplate different scenarios so you don't fall into common traps. An important aspect of redundant network connections is the dependency between primary and failover connections. Anything the two connections have in common needs to be clearly identified, because each shared component is a single point of failure. For instance, the resilience of two Internet connections from two different carriers using BGP-4 routing for automatic failover is compromised if the wiring for both connections enters the building through the same minimum point of entry. Similarly, getting the primary and failover connections from the same carrier is a problem if that carrier suffers an outage.
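One way to make such commonalities visible is to record the components each connection depends on and compare the two records. The sketch below does this in Python; the attribute names and values (carrier names, entry point and router labels) are hypothetical inventory data invented for the example.

```python
def shared_dependencies(primary, backup):
    """Return the attributes for which both links depend on the same component."""
    return {
        key: primary[key]
        for key in primary.keys() & backup.keys()
        if primary[key] == backup[key]
    }

# Hypothetical inventory records for the two Internet connections.
primary_link = {
    "carrier": "Carrier A",
    "building_entry": "MPOE-1",
    "router": "edge-rtr-1",
    "power_feed": "PDU-A",
}
backup_link = {
    "carrier": "Carrier B",
    "building_entry": "MPOE-1",
    "router": "edge-rtr-2",
    "power_feed": "PDU-B",
}

common = shared_dependencies(primary_link, backup_link)
print(common)  # {'building_entry': 'MPOE-1'}
```

Each entry in the result is a candidate single point of failure that the DR rehearsal should exercise; the check is only as good as the inventory behind it.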
DR testing of network connections without redundancy is more difficult and combines component testing with verification of processes, documentation and service-level agreements (SLAs). It begins with verifying the process for replacing failed switches and routers, including testing the configuration and proper operation of spare equipment, as well as the procedure for obtaining replacement hardware. Vendor agreements, contact information and SLAs need to be verified and at least partially tested.
Finally, the network DR rehearsal needs to account for changes in network load during a disaster. Typically, the failover circuit is a lower-cost alternative to the primary connection, such as a VPN or frame-relay circuit, and it often has less capacity. The DR test must ensure that the failover connections can handle the load they will carry during a disaster. Some DR plans count on users connecting from home via VPN client software to conduct business. Under normal circumstances, only a relatively small fraction of employees connects via VPN; that number will grow substantially during a disaster. The DR test must ensure that the VPN server can handle the additional sessions and that there's enough Internet bandwidth to absorb the surge in VPN traffic.
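A rough capacity estimate can help decide what the load test should target. The following Python sketch compares projected VPN sessions and bandwidth against available capacity; every figure in it (headcount, percentages, per-user bandwidth, session limit and circuit size) is a made-up assumption, not a recommendation, and should be replaced with measured values.

```python
import math

def vpn_capacity_check(employees, remote_percent, kbps_per_user,
                       max_sessions, internet_mbps):
    """Estimate concurrent VPN sessions and bandwidth, and compare to capacity."""
    sessions = math.ceil(employees * remote_percent / 100)
    required_mbps = sessions * kbps_per_user / 1000
    return {
        "sessions": sessions,
        "required_mbps": required_mbps,
        "sessions_ok": sessions <= max_sessions,
        "bandwidth_ok": required_mbps <= internet_mbps,
    }

# Hypothetical site: 800 employees, a VPN server limited to 250 sessions,
# a 100 Mbps Internet circuit and an assumed 250 kbps per connected user.
normal_day = vpn_capacity_check(800, 5, 250, max_sessions=250, internet_mbps=100)
disaster = vpn_capacity_check(800, 60, 250, max_sessions=250, internet_mbps=100)

print(normal_day)
print(disaster)
```

With these assumed numbers, the normal day fits comfortably, while the disaster scenario exceeds both the session limit and the circuit size, which is the kind of gap the DR test is meant to expose before a real event.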
This was first published in September 2006