This article can also be found in the Premium Editorial Download "Storage magazine: Storage managers give thumbs up to IP storage."
|DR is too complex|
Survey charts (not reproduced here): respondents were asked to rank their satisfaction with the complexity level of their tape backup and recovery system and of their disk-to-disk split mirror system.
Source: Dragon Slayer Consulting
New DR solutions
Many storage vendors are bringing innovative products to market that aim to reduce the cost and complexity of DR and business continuity while increasing their efficiency. The most promising of these technologies are:
- Continuous replication and continuous snapshot
- Server-to-server storage volume replication
- TCP/IP WAN acceleration for storage applications
- Virtual tape automation
Continuous replication and snapshot works simply and elegantly (see "Continuous replication appliance"). It's commonly deployed in a storage appliance (a remote access server) with no single point of failure on a storage area network (SAN). There are no internal hard disk drives in the appliance because it's designed to use the disk arrays on the SAN. It operates at the block level, and is able to protect applications that work on file systems, DBMS and even raw disk partitions. The appliance is configured as just another mirror to the volume managers and looks like a set of disks or LUNs.
When a DBMS corruption occurs, the DBA tells the appliance to provide a data image at a point in time before the corruption occurred. This is presented immediately as a set of volumes for the DBMS. It can then be tested to make sure that this view of the data is prior to the corruption. If it isn't, the DBA has the appliance provide the data image at an earlier point in time. This can be done repeatedly until a valid non-corrupted image is found.
This image then restores the data instantaneously to that point in time and stands in temporarily for the DBMS application's primary volumes while the appliance quietly resynchronizes those volumes with the restored data. Once resynchronized, the primary volumes are automatically relinked with the DBMS and continuous replication/snapshot resumes. Recovery is quick and painless.
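The recovery loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not any vendor's implementation: the journal layout and the names `image_at`, `find_clean_image` and `is_valid` are hypothetical, and a real appliance works on SAN block devices rather than in-memory dictionaries.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical journal entry: (timestamp, block number, data written).
Write = Tuple[float, int, bytes]
Image = Dict[int, bytes]


def image_at(journal: List[Write], point_in_time: float) -> Image:
    """Rebuild the volume image as it looked at point_in_time.

    Replays every journaled write stamped at or before point_in_time,
    in timestamp order, so later writes overwrite earlier ones.
    """
    image: Image = {}
    for ts, block, data in sorted(journal, key=lambda w: w[0]):
        if ts > point_in_time:
            break
        image[block] = data
    return image


def find_clean_image(
    journal: List[Write],
    candidate_times: List[float],
    is_valid: Callable[[Image], bool],
) -> Optional[Tuple[float, Image]]:
    """Try candidate times from newest to oldest; return the first valid image."""
    for t in sorted(candidate_times, reverse=True):
        image = image_at(journal, t)
        if is_valid(image):  # stands in for the DBA's own test of the image
            return t, image
    return None  # no candidate predates the corruption


# Example: a corrupting write lands at t=3.0.
journal = [(1.0, 0, b"good"), (2.0, 1, b"good"), (3.0, 0, b"BAD")]
result = find_clean_image(
    journal, [1.5, 2.5, 3.5], lambda img: b"BAD" not in img.values()
)
```

In this example the image at 3.5 fails the check, so the search steps back to 2.5, the most recent candidate that predates the corruption.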
Revivio Inc.'s Continuous Protection System (CPS) and Alacritus Software's Chronospan offer continuous replication/snapshot. EMC Corp., Hewlett-Packard Co. (HP) and others are working on their own versions of this technology.
One real financial benefit to this technology is that it allows expensive storage to be protected by inexpensive storage from the same or different vendors. It also frees up many of the split mirror volumes and makes them available as primary storage.
Although continuous replication/snapshot reduces DR and business continuity costs and complexity, there's a downside to this technology: limited distance. It's primarily a campus solution and doesn't yet work across long WAN distances. This means out-of-region disasters require other solutions.
Server-to-server storage volume replication: This technology is beginning to emerge as the 21st century's replacement for backup and recovery. It is software installed on each server hosting an application with critical data requiring protection. It's also installed on a central or remote server that acts as the "catcher." Think of it as a hub-and-spoke arrangement where the catcher is the hub.
Server-to-server storage volume replication works by replicating live disk-based data from each server to the catcher across any available TCP/IP network. The replicator duplicates data in near real time while preserving the original write order, which assures integrity between the two systems in the event of a disaster. The replicated data, frozen at a specific point in time, can be made available for reading to applications on the catcher system. Should a failure occur on the primary system, the catcher system can provide immediate access to contemporary business-critical data.
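To make the write-order guarantee concrete, here is a minimal Python sketch of a catcher that applies replicated writes strictly in the order the source issued them. The article doesn't describe how any product achieves this, so the sequence-number scheme, the `Catcher` class and its `receive` method are assumptions made purely for illustration.

```python
from typing import Dict, Tuple


class Catcher:
    """Hypothetical catcher that applies writes in source sequence order."""

    def __init__(self) -> None:
        self.volume: Dict[int, bytes] = {}  # block number -> data
        self.next_seq = 0                   # next sequence number to apply
        self.pending: Dict[int, Tuple[int, bytes]] = {}  # held-back writes

    def receive(self, seq: int, block: int, data: bytes) -> None:
        """Buffer an incoming write, then apply every write that is now in order."""
        self.pending[seq] = (block, data)
        while self.next_seq in self.pending:
            blk, payload = self.pending.pop(self.next_seq)
            self.volume[blk] = payload
            self.next_seq += 1


# Example: the second write to block 0 arrives before the first.
catcher = Catcher()
catcher.receive(1, 0, b"second")  # held back; write 0 has not arrived yet
held_back = dict(catcher.volume)  # still empty at this point
catcher.receive(0, 0, b"first")   # now both are applied, in source order
```

Holding back an early arrival means the catcher's copy never reflects a later write without the earlier ones it depends on, which is the property that keeps a database on the catcher consistent.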
|Continuous replication appliance|
With a continuous replication appliance, write data is time-stamped and the appliance is always ready to instantly restore to any time and any place.
One key aspect of this technology is the ability to control all of the servers from a central console. No IT personnel are required at the server, allowing for "lights out" DR and business continuity at remote sites. Some vendors (such as Constant Data Inc. and Fujitsu Softek) even include continuous replication functionality. Again, replicating only the data that changes reduces the amount of disk and tape storage required.
Server-to-server storage volume replication products include Constant Data Constant Replicator, EMC/Legato Replicator, NSI DoubleTake, Softek Replicator and Veritas Volume Replicator; and still others are emerging. Vendor operating system support varies, but most support Windows and Linux.
Server-to-server storage volume replication is a simple, low cost and effective DR and business continuity solution. It works with direct-attached storage (DAS), network-attached storage (NAS) and SAN storage, regardless of the vendor. It scales from the small to medium business to the enterprise customer. And it allows replication from high-cost storage to low-cost storage.
There are two downsides to this solution. First, license fees grow with the number of servers and applications requiring data protection. Second, throughput over long-haul TCP/IP networks is dismal because of congestion, bit error rates (BERs), jitter and latency. The first can be handled through vendor negotiation. The second is a more difficult issue and may require the implementation of TCP/IP WAN accelerators.
TCP/IP WAN accelerators: Congestion, BERs, jitter and TCP/IP latency all cause throughput degradation in a TCP/IP network. Add longer distance, and the degradation can be so extreme that the typical throughput of a DS3 (rated at 45Mb/s) is approximately 5Mb/s, or 625KB/s. This means that a tiny 30GB replication or backup can't be completed within an eight-hour window; it would take at least 13 hours and 20 minutes. Even if the bandwidth is increased to OC3 (155Mb/s), more than three times that of a DS3, the window still can't be met, because throughput doesn't increase by anywhere near the same percentage as bandwidth. Depending once again on the congestion, BER, jitter, latency and distance, it may barely increase at all. Historically, the throughput increase can be expected to be less than 50%. Even with this increase, the eight-hour window is still missed.
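The arithmetic behind these figures can be checked with a short Python calculation. It assumes decimal units (1GB = 10^9 bytes, 1Mb/s = 10^6 bits per second), which is what reproduces the 13 hours and 20 minutes quoted above; the function name `transfer_hours` is simply a label chosen for this sketch.

```python
def transfer_hours(size_gb: float, throughput_mbps: float) -> float:
    """Hours needed to move size_gb gigabytes at throughput_mbps megabits/s."""
    bits = size_gb * 1e9 * 8                  # decimal gigabytes to bits
    seconds = bits / (throughput_mbps * 1e6)  # megabits/s to bits/s
    return seconds / 3600


WINDOW_HOURS = 8

# Effective DS3 throughput from the text: about 5Mb/s.
ds3_hours = transfer_hours(30, 5.0)        # about 13.33 hours (13h 20m)

# Best case for OC3 per the text: throughput improves by at most 50%.
oc3_hours = transfer_hours(30, 5.0 * 1.5)  # about 8.89 hours

print(f"DS3: {ds3_hours:.2f} h, OC3 (best case): {oc3_hours:.2f} h")
```

Even the optimistic 50% improvement leaves the 30GB job almost an hour over the eight-hour window.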
This was first published in April 2004