This article can also be found in the Premium Editorial Download "Storage magazine: Evaluating the benefits of IP SANs."
Download it now to read this article plus other related content.
My first article in this two-part series discussed the basic concepts for data replication in the event of a disaster. (See "Cost-effective business continuance") The differences between host-based disk mirroring and subsystem-based store and
|Key remote data
Disasters are unexpected events that wreak havoc. IT organizations are increasingly being asked to assume the responsibility of finding technological methods to overcome the physical devastation that disasters cause. The whole point of using remote copy data replication technology is to provide a computing environment in which the impact of a disaster is as transparent and minimal as possible. In other words, remote data replication is expected to provide an organization with immediate access to data without first having to fix or repair the data. This means the remote site must have complete data integrity. If administrators have to spend hours troubleshooting data integrity problems at the remote recovery site, the value of the remote data replication product is significantly reduced.
One of the challenges in preparing for disasters is accommodating both instantaneous events and events that may take minutes--or even hours--to do their damage. In the case of instantaneous loss, the data that has been replicated to a remote site must be complete as is. Where databases are concerned, an incomplete transaction might have been replicated to the remote site, which would require it to be rolled back. This is fine, and should be fairly easy for a skilled DBA to recognize, as long as all of the transaction information has been accurately replicated.
A rolling disaster
A rolling disaster is a difficult scenario that takes place over an extended period of time. In this type of situation, systems continue to run after the onset of a disaster before eventually failing and, in this scenario, it's possible that the processing that occurs while the disaster is taking place may result in data errors that are replicated to the remote site. Some remote data replication products have been designed to address this type of problem, allowing an administrator to select the historical point in time when they want to stop applying replicated data. The selection of the cut-off time might involve more art than science, but it's easy to see why IT executives would want the additional safeguards in the form of products that offer that feature.
Remote data replication products all have a similar generic structure. A selection and queuing process is used to collect and stage the I/Os for transmission over a MAN or WAN; a transmission system is used to copy the I/Os to the remote site; and a remote writer is used to apply the I/Os at the remote site.
Queuing write I/Os
Remote data replication systems must collect write I/Os as they occur and prepare them for transmission over a network to a remote site. This process begins by analyzing every I/O instruction and selecting the writes. Each write instruction is reflected by an entry in a temporary data structure, such as a queue of actual write instructions or a list of pointers to pending write I/Os.
One brute force method of ensuring data integrity and write ordering at the remote site is to forward each and every write I/O that occurs on the local site. While this technique might not be the most efficient in terms of network bandwidth, it does have the advantage of being fairly simple to understand.
An alternative technique--one that saves network bandwidth costs with remote data replication--involves weeding out write I/Os from hot spots that are quickly overwritten with repetitive updates. For instance, a data hot spot may be written to several times in close succession. There's no need to replicate every I/O, as long as the final result is accurately recorded at the remote site.
Consider a scenario in which four storage blocks, A, B, C and D, are being updated by an application. Block A represents a hot spot that's updated whenever blocks B, C and D are updated. In a short period of time, block A could be updated three times while blocks B, C and D are each updated once. It's certainly possible to save transmission bandwidth by not sending the first two updates to Block A, and instead only send the third update together with the updates to blocks B, C and D. The trick to doing this is making sure that all four blocks are transmitted as a unit by the data replication product and written to remote storage as a granular unit. In other words, if all four blocks can't be written for some reason, then none of them should be until the problem is corrected. Notice that write ordering isn't violated this way, as long as all the blocks are applied successfully.
The problem with using this approach is that some write I/Os are temporarily delayed before being sent to the remote site. This means there is some potential to lose delayed I/Os if a disaster occurs. While this could be a serious problem for some applications, it's probably not for most.
One of the challenges with remote data replication is dealing with MAN or WAN problems--such as outages that prevent write I/Os from being transmitted to the remote site. In these situations, it's normal for the local process to write to an extended queue that "stocks up" write I/Os until the remote communications link can be restored. It's not necessarily rocket science to see how this could work, but obviously, if the remote communications link is down long enough, there's some likelihood that the data at the remote replication system could be pushed past the breaking point, and other techniques may be needed to synchronize data between the two sites.
Transmitting write I/Os
While most people associate remote replication with the end nodes involved in the process, the transmission technology plays a key role. For years, Minneapolis-based Computer Network Technology has been a leading vendor of networking equipment that connects local with remote storage subsystems running remote data replication applications.
Essentially, the transmission equipment is a network gateway that tunnels write I/Os from the local site to the remote site. Where storage area networks (SANs) are concerned, write I/Os are sent from the local data replication queue to the remote data replication gateway. The remote data replication gateway then encapsulates and segments the write I/Os for transmission over the MAN or WAN. On the receiving side, the gateway reassembles the write I/Os and sends them to the corresponding remote writer.
Just as the local queuing process may need additional storage space to hold pending writes when the remote communications link goes down, the transmission gateway may also provide its own write-pending queue to help manage temporary network outages.
As networking technologies continue to evolve, it's likely that more off-the-shelf networking products will be used for transmitting write I/Os. For instance, dense wavelength division multiplexing (DWDM) optical networking equipment could be used effectively to achieve remote data replication in MANs. However, given the distances for most MANs, it's possible that an organization would opt to use host-based disk mirroring instead of the expensive and complex store-and-forward data replication technologies.
In general, the easiest part of remote data replication is applying the writes on the receiving end. Once write I/Os are received, they need to be verified for accuracy and then written to the remote storage target.
However, I've already discussed two scenarios where more intelligence is required on the receiving end. The first involves the ability to select a safe cut-off time for applying writes after a rolling disaster. The second involves working with coalesced or collapsed I/Os that must be applied together to maintain proper write ordering.
An interesting nuance regarding writers involves acknowledging write I/Os that are committed to cache memory, as opposed to being written to non-volatile storage media. This isn't a problem as long as the cache is flushed to disk on a first-in, first-out basis.
This was first published in July 2003