This article can also be found in the Premium Editorial Download "Storage magazine: Five companies on their storage virtualization projects."
Additional policy engine considerations
When automating BC and DR, it's important to know which notification mechanisms exist between the different hardware and software technologies, because those notifications are what trigger the automation. For example, an electrical power grid relies on supervisory control and data acquisition (SCADA) mechanisms to detect, monitor and manage events on the grid. Monitoring, analysis and event-correlation tools provide similar management capabilities for a data and storage infrastructure that supports BC and DR. Examples include tools from Aptare Inc. (StorageConsole), CentrePath Inc. (Magellan), EMC Corp. (Smarts), Hewlett-Packard Co. (Storage Essentials), Onaro Inc. (SANscreen) and WysDM Software Inc. (WysDM), among others.
Policies for business continuity and disaster recovery
When setting up rules and policies for business continuity and disaster recovery, consider the following:
You should consider adding fault and event-correlation analysis and monitoring tools, as well as change management software, to your automated DR/BC arsenal. Event-correlation tools can report on the health and status of your storage and data infrastructure, along with your servers and networks. These tools can also shed light on how storage and automated data protection tasks, such as backup and replication, are performing.
When setting up policies, it's sometimes helpful to construct a decision tree, flow chart or similar mechanism to guide you through data classification and technology assignment. Such a tool helps a storage manager decide which techniques and technologies are most appropriate for a specific level of BC and DR automation. Essentially, the decision tree becomes the set of rules that dictates what actions the various policy managers will take for different events.
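To make the idea concrete, here is a minimal sketch of such a decision tree written as code. Everything in it is an assumption chosen for illustration: the AppProfile fields, the downtime and data-loss thresholds, and the protection tiers would all come from your own data classification work, not from any particular product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AppProfile:
    """Hypothetical classification of one application."""
    name: str
    max_downtime_minutes: float   # how long the application may be unavailable
    max_data_loss_minutes: float  # how much recent data the business can lose


def assign_protection(app: AppProfile) -> str:
    """Walk a small decision tree and return a protection tier.

    The thresholds and tier names are illustrative assumptions.
    """
    # Branch 1: no data loss tolerated at all.
    if app.max_data_loss_minutes == 0:
        return "synchronous replication with automated failover"
    # Branch 2: some data loss is acceptable, but recovery must take an hour or less.
    if app.max_downtime_minutes <= 60:
        return "asynchronous replication with automated restart"
    # Branch 3: recovery within a day is acceptable.
    if app.max_downtime_minutes <= 24 * 60:
        return "disk-based backup with scripted restore"
    # Branch 4: everything else.
    return "tape backup with manual restore"
```

For example, a hypothetical order-entry system that tolerates five minutes of downtime and no data loss, AppProfile("orders", 5, 0), lands in the first branch, while an archive that can wait three days falls through to the last one. Writing the tree down this way forces every application into exactly one branch, which is the property you want before handing the rules to a policy manager.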
Depending on how sensitive your applications are to downtime, you should determine how much of a delay in recovery or restart you can tolerate before a policy manager initiates automated failover, restart or recovery. As part of automated recovery, consider the latency, if any, between the time a policy manager decides to take action and the time the action actually occurs. For example, a policy manager in a storage system may determine that a primary network link has failed and initiate a switch to a slower secondary network link for remote data replication. You should know how much time will elapse between the policy manager's initial determination and the completion of the switch to the secondary network path (and perhaps the switch from synchronous to asynchronous operation).
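The timing question can be worked through with simple arithmetic. The sketch below assumes the outage seen by an application is the sum of three parts: the time to detect the link failure, the delay the policy manager waits before acting, and the time the switch itself takes. The class name, field names and figures are hypothetical; real values would have to be measured in your own environment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailoverPolicy:
    """Hypothetical timing model for one automated failover action."""
    detection_seconds: float  # time to notice that the primary link has failed
    grace_seconds: float      # delay tolerated before the policy manager acts
    switch_seconds: float     # time for the switch to the secondary link to complete

    def action_latency_seconds(self) -> float:
        """Latency between the decision to act and the action completing."""
        return self.switch_seconds

    def total_outage_seconds(self) -> float:
        """Elapsed time from the failure until the secondary path is in use."""
        return self.detection_seconds + self.grace_seconds + self.switch_seconds

    def meets_tolerance(self, tolerated_seconds: float) -> bool:
        """True if the whole sequence fits within what the application tolerates."""
        return self.total_outage_seconds() <= tolerated_seconds
```

With assumed figures of 10 seconds to detect, a 30-second grace period and a 20-second switch, FailoverPolicy(10, 30, 20) gives a total of 60 seconds. That fits an application that tolerates 90 seconds of interruption but not one that tolerates only 45, in which case the grace period or the switch time would have to shrink.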
Automating BC and DR using policies reduces downtime and errors, but it takes time and planning to decide what to automate and to set up the policies and rules. Automated policies usually don't control storage alone; they touch almost everything within the data center: servers, networks, applications, databases and lines of business. Automating BC and DR procedures is a complicated and difficult task; once done, however, the payback is enormous.
This was first published in August 2006