This article can also be found in the Premium Editorial Download "Storage magazine: Should you consolidate your direct-attached storage (DAS)?."
|When two remote sites make sense|
Distant remote sites protect against major catastrophes, but are awkward for more frequent local disasters, such as a building fire or flooding. A nearby mirrored site is better for that, and it can be copied asynchronously to a distant site for full protection.
Store and forward
Disk mirroring was designed to solve failures in local disk drives; store-and-forward storage was designed specifically for business continuity applications. The basic operation of store-and-forward storage is for a disk subsystem to receive a write command, write the data to disk and then retransmit the data to a remote disk target.
Most of the store-and-forward products are proprietary and sold by the major subsystem vendors. EMC has Symmetrix Remote Data Facility (SRDF), Hitachi Data Systems has TrueCopy, and IBM sells Peer-to-Peer Remote Copy. The market has placed a high value on these solutions--and with good reason, because they have done what was expected of them under the most adverse conditions. The amazing business continuity successes in the aftermath of Sept. 11 are a tremendous testimony to how well these products perform.
With store-and-forward solutions, the host system generates only a single write I/O and the subsystem does the rest of the work. For example, assume a specific LUN on a given target address is the data access point for a mission-critical line-of-business application. The subsystem commits write I/Os to its internal disk target and then creates an entry in a FIFO queue in memory or on reserved disk storage. The data is then forwarded from the local subsystem to the remote one in the order it was received. The sending subsystem manages the transmission details, including acknowledgements and any error recovery. During a communication failure, error recovery requires the subsystem to keep a detailed record of its forwarding operations, possibly for an extended period, so it can resume forwarding once the link returns.
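The commit, queue and forward sequence described above can be sketched in a few lines of Python. This is only a toy model under stated assumptions, not how any vendor's product works: the class name, the `send_to_remote` callable and the dictionary standing in for disk are all hypothetical.

```python
from collections import deque


class StoreAndForwardSubsystem:
    """Toy model: commit writes locally, then forward them in arrival order."""

    def __init__(self, send_to_remote):
        self.local_disk = {}      # block address -> data committed locally
        self.queue = deque()      # FIFO of writes the remote has not yet acknowledged
        # Hypothetical callable; returns True when the remote acknowledges a write.
        self.send_to_remote = send_to_remote

    def write(self, block, data):
        # Commit to local disk first, then queue the write for forwarding.
        self.local_disk[block] = data
        self.queue.append((block, data))

    def forward_pending(self):
        """Forward queued writes in order; stop at the first failure."""
        sent = 0
        while self.queue:
            block, data = self.queue[0]
            if not self.send_to_remote(block, data):
                break             # link down: keep the entry and retry later
            self.queue.popleft()  # drop the entry only after the remote acknowledges
            sent += 1
        return sent


# Demo with a hypothetical remote target and a link that starts out down.
remote_disk = {}
link = {"up": False}


def send(block, data):
    if not link["up"]:
        return False              # no acknowledgement while the link is down
    remote_disk[block] = data
    return True


local = StoreAndForwardSubsystem(send)
local.write(7, b"first")
local.write(7, b"second")         # later write to the same block must arrive last
local.write(9, b"third")

sent_while_down = local.forward_pending()      # nothing leaves the queue
link["up"] = True
sent_after_recovery = local.forward_pending()  # queue drains in FIFO order
```

Because entries leave the queue only after the remote acknowledges them, the queue itself is the record of what still has to be sent after a communication failure, and preserving FIFO order means the second write to block 7 cannot be overtaken by the first.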
With store-and-forward storage, there's usually a second network that isn't a storage area network (SAN), typically a MAN or WAN. Gateway systems at both the local and remote sites connect the SAN storage equipment over the non-SAN network. The gateway serves two key purposes: It provides address transparency and performs the detailed communication operations on the non-SAN network. Both storage subsystems use their native addressing and methods without needing to know anything about the foreign network's addressing and methods. Most store-and-forward implementations have used proprietary gateway technology, although it's expected that Fibre Channel over IP (FCIP) will become a standard with widespread deployments for business continuity in the near future.
The interrelated issues of distance, bandwidth and cost apply to store-and-forward implementations just as they do to disk mirroring. However, with store and forward, there's no problem with read I/Os taking the lion's share of the available MAN or WAN bandwidth, because only writes are forwarded. This means a store-and-forward solution can be accomplished with a small percentage of the bandwidth needed for disk mirroring. If the read-to-write ratio is 3:1, writes make up one quarter of the I/O traffic, so store and forward needs only 25% of the bandwidth required for disk mirroring.
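The arithmetic behind that 25% figure is easy to check. The sketch below assumes reads and writes are the same size on average and takes the article's premise that mirroring carries all I/O across the link while store and forward carries only writes; the function name is illustrative.

```python
def forwarded_fraction(reads, writes):
    """Fraction of total I/O traffic that store and forward sends over the MAN/WAN.

    Assumes reads and writes are the same average size, and that disk
    mirroring would carry both while store and forward carries only writes.
    """
    return writes / (reads + writes)


print(forwarded_fraction(3, 1))  # 0.25
```

A more read-heavy workload lowers the fraction further, while a workload with equal reads and writes would still need half the mirroring bandwidth.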
|DWDM can connect distant Fibre Channel nets|
Dense wavelength-division multiplexing (DWDM) provides the same kind of distance capabilities as dark fiber. But it also provides a variety of public network services on top of that.
In general, storage I/O latency needs to be sufficiently low to maintain acceptable system performance levels. After a system issues an I/O command, it waits for an acknowledgement to be returned from storage before issuing the next I/O command. If there are delays in the transmission of the I/O command or its acknowledgement from storage, the performance of the system can suffer.
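To get a rough feel for how distance alone limits a system that waits for each acknowledgement, consider the sketch below. It assumes light travels about 200 km per millisecond in optical fiber, ignores gateway and protocol overhead, and uses an assumed 1 ms storage service time; the function names and default values are illustrative only.

```python
FIBER_KM_PER_MS = 200.0  # approximate speed of light in optical fiber


def round_trip_ms(distance_km, service_ms=1.0):
    """Propagation delay out and back, plus an assumed storage service time."""
    return 2 * distance_km / FIBER_KM_PER_MS + service_ms


def max_serial_ios_per_sec(distance_km, service_ms=1.0):
    """Upper bound on I/Os per second when each I/O waits for the previous ack."""
    return 1000.0 / round_trip_ms(distance_km, service_ms)


for km in (8, 100, 1000):
    print(km, "km:", round(max_serial_ios_per_sec(km)), "serialized I/Os per second at most")
```

Real links add switching, gateway and protocol delays on top of propagation, so these numbers are best read as a ceiling rather than a prediction.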
Latency is obviously an important consideration for business continuity solutions, which have to balance cost, performance and data protection. Store-and-forward solutions deal with it by managing the acknowledgements of storage I/Os as either synchronous or asynchronous.
Synchronous acknowledgements are issued from remote storage after the copy of the original write data has been received and written to disk. The main benefit of synchronous acknowledgements is knowing precisely what data has been received by remote storage. Because every write I/O is acknowledged, there's no ambiguity as to the state of data on remote storage. The primary disadvantage of synchronous acknowledgements is that they can be relatively slow and introduce significant latency into I/O processes.
Asynchronous acknowledgements are issued from local storage after the I/O has been committed to local storage. They have the opposite set of characteristics from synchronous acknowledgements. The main benefit of asynchronous acknowledgements is the lowest possible latency. The primary disadvantage is not knowing what data made it safely to the remote storage subsystem. This means there's likely to be some amount of repair work to do with the data on remote storage if it ever needs to be called into use after a disaster strikes the local site.
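The trade-off between the two acknowledgement modes can be illustrated with a small model. It is a simplification under stated assumptions: writes are labeled in issue order, a disaster strikes after only some of them have reached the remote site, and the function names and timing parameters are hypothetical.

```python
def ack_latency_ms(mode, local_commit_ms, remote_round_trip_ms):
    """Time the host waits for an acknowledgement in each mode."""
    if mode == "synchronous":
        return local_commit_ms + remote_round_trip_ms  # waits for the remote write
    if mode == "asynchronous":
        return local_commit_ms                         # acked after the local commit
    raise ValueError(f"unknown mode: {mode}")


def state_at_disaster(mode, writes, forwarded_before_disaster):
    """Return (writes acknowledged to the host, acknowledged writes missing remotely)."""
    remote = writes[:forwarded_before_disaster]
    if mode == "synchronous":
        acked = list(remote)   # the host only sees acks for writes the remote holds
    elif mode == "asynchronous":
        acked = list(writes)   # the host sees an ack for every locally committed write
    else:
        raise ValueError(f"unknown mode: {mode}")
    missing = acked[len(remote):]
    return acked, missing


writes = ["w1", "w2", "w3", "w4", "w5"]
print(state_at_disaster("synchronous", writes, 3))   # (['w1', 'w2', 'w3'], [])
print(state_at_disaster("asynchronous", writes, 3))  # (['w1', ..., 'w5'], ['w4', 'w5'])
```

In the synchronous case nothing the host believes is written is missing at the remote site, at the price of a longer wait per I/O. In the asynchronous case the host runs at local speed, but the missing writes are the repair work waiting at the remote site.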
Creating multilevel solutions
To design a business continuity strategy, it may make sense to establish two different disaster radii. The first would cover disasters that may take out a local site without affecting much of the surrounding area. The second would be a much larger radius, placing remote storage a large distance from local storage.
This approach of using two remote storage sites has been adopted by the financial services industry to meet its need to respond to different types of disasters as quickly as possible. To differentiate between the remote storage sites, we'll refer to them as nearby remote storage and distant remote storage. The rationale for having nearby remote storage is to be able to react to the disaster with the team that lives and works in the area. Transporting key skilled IT workers hundreds of miles following a major disaster has its own set of logistical problems, which nearby remote storage avoids.
Let's assume a business wants to create a two-level business continuity solution. The company has two buildings that are located approximately five miles apart and are connected by dark fiber. It plans to use disk mirroring between systems in both buildings to keep a copy of the data on nearby remote storage in the other building. In one of the buildings, it employs store-and-forward technology over an existing T-3 WAN to send data to a distant remote storage subsystem. Synchronous acknowledgements are used between the nearby and distant storage subsystems to ensure complete data transmissions to the distant remote storage subsystem. Asynchronous acknowledgements would also work in this case, depending on whether or not the bandwidth savings would justify the change (see "When two remote sites make sense").
This was first published in June 2003