This article can also be found in the Premium Editorial Download "Storage magazine: Should you consolidate your direct-attached storage (DAS)?."
Different types of snapshots
As data volumes steadily increase, the need to back up that data--often within shrinking timeframes mandated by a growing demand for 24 x 7 operation--has made snapshotting and replication critical tools for storage managers. Both have been available for some time as features of proprietary storage management applications, but they have become easier to use as they have been integrated more deeply into enterprise storage systems.
Depending on the technology in use and the dictates of the storage environment, snapshotting can take several forms. Volume-based snapshots work by maintaining a mirrored secondary volume. When the snapshot is executed, the mirror is logically split from the primary volume and can then be backed up on its own. Because data remains internal to the storage unit, the procedure is completed within seconds or minutes, compared with minutes or hours using conventional tape-based backups.
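The split-mirror idea can be illustrated with a minimal Python sketch. This is a toy model under stated assumptions, not any vendor's implementation: the class `MirroredVolume` and its methods are hypothetical names, and the "volumes" are plain in-memory lists of blocks.

```python
class MirroredVolume:
    """Toy model of a primary volume with a synchronized mirror (hypothetical names)."""

    def __init__(self, num_blocks):
        self.primary = [b""] * num_blocks
        self.mirror = [b""] * num_blocks
        self.mirror_attached = True

    def write(self, block, data):
        # Every write lands on the primary; the mirror follows only while attached.
        self.primary[block] = data
        if self.mirror_attached:
            self.mirror[block] = data

    def split_snapshot(self):
        # The snapshot is a logical detach. No data is copied at this point,
        # which is why the operation is quick.
        self.mirror_attached = False
        return self.mirror

    def resync(self):
        # After the backup finishes, bring a fresh mirror in line with the primary.
        self.mirror = list(self.primary)
        self.mirror_attached = True


vol = MirroredVolume(4)
vol.write(0, b"orders-v1")
frozen = vol.split_snapshot()   # stable image that can be sent to tape
vol.write(0, b"orders-v2")      # production writes continue on the primary
```

In this sketch the frozen mirror keeps the pre-split contents while the primary moves on, which is the property that lets a backup run against a stable image.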
Providing more flexible snapshots
Snapshots are a quick way of putting some distance between the real-time data processing environment and the army of tape drives or near-line disks being rolled out to manage backups. If a snapshot is copied to faster near-line disk, it can also be used as a data source by other applications for tasks such as data mining and reporting. This approach is valuable because it provides a stable data set for analysis while the primary database continues to process transactions.
Snapshots can also be file-based, a more granular approach in which changes to specific directories or files are recorded, then duplicated onto remote copies once the snapshot is executed. Because only the data that has changed is replicated, this method is faster and keeps snapshot times to a minimum.
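As a rough illustration of the file-based approach, the following Python sketch copies only files whose contents have changed since the previous snapshot. It is a simplification under stated assumptions: files are modeled as an in-memory dictionary, change detection uses content hashes (real products may track changes differently), deletions are ignored, and the function names are hypothetical.

```python
import hashlib


def digest(data: bytes) -> str:
    """Content fingerprint used to detect changes (an assumption of this sketch)."""
    return hashlib.sha256(data).hexdigest()


def changed_files(files, last_digests):
    """Return the paths whose content differs from the previous snapshot."""
    return sorted(
        path for path, data in files.items()
        if last_digests.get(path) != digest(data)
    )


def take_snapshot(files, replica, last_digests):
    """Copy only the changed files to the replica and record their digests."""
    changed = changed_files(files, last_digests)
    for path in changed:
        replica[path] = files[path]
        last_digests[path] = digest(files[path])
    return changed


files = {"a.txt": b"1", "b.txt": b"2"}
replica, seen = {}, {}
first = take_snapshot(files, replica, seen)    # everything is new the first time
files["b.txt"] = b"3"
second = take_snapshot(files, replica, seen)   # only b.txt is copied again
```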
In a point-in-time snapshot, systems maintain an ongoing log of changes to the volume. This log--rather than the full data set itself--is duplicated at snapshot time. This approach provides a full record of data changes and the ability to roll back the data environment to any earlier point that the log covers.
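One way to picture log-based rollback is an undo log that records the previous value of each block before it is overwritten. The Python sketch below is an assumption-laden illustration rather than a description of any particular product: `LoggedVolume` is a hypothetical name, blocks live in a dictionary, and a log position stands in for a point in time.

```python
class LoggedVolume:
    """Toy volume that keeps an undo log of every write (hypothetical names)."""

    def __init__(self):
        self.blocks = {}
        self.log = []  # entries of (block, previous value or None if absent)

    def write(self, block, data):
        # Record the old value first so the write can be undone later.
        self.log.append((block, self.blocks.get(block)))
        self.blocks[block] = data
        return len(self.log)  # log position reached after this write

    def rollback(self, position):
        # Undo writes in reverse order until the log is back at `position`.
        while len(self.log) > position:
            block, old = self.log.pop()
            if old is None:
                self.blocks.pop(block, None)
            else:
                self.blocks[block] = old


v = LoggedVolume()
p1 = v.write("a", b"1")
p2 = v.write("a", b"2")
v.write("b", b"x")
v.rollback(p1)   # return to the state right after the first write
```

The volume can only be rolled back as far as the log reaches, so the length of the retained log determines how far into the past recovery is possible.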
Snapshotting is a simple, elegant solution that's saving USC and many other organizations precious hours nightly. Its considerable value to customers has made it a standard feature in broader storage volume management suites from EverStor, FalconStor, Legato, StoreAge, Veritas, and many other independent software vendors (ISVs).
But they're not alone: Leading storage hardware vendors--including EMC, Hewlett-Packard, Hitachi Data Systems, IBM, Network Appliance, and Sun Microsystems--have all embedded proprietary snapshot capabilities into their respective storage boxes. This has allowed for highly optimized snapshotting, but has also restricted snapshots to use on the same host platform because there's currently no standard for snapshot structure.
Some ISVs have gone part of the way towards resolving this issue by providing support for snapshots from specific vendors. But Microsoft is aiming to eliminate this problem altogether with the creation of Volume Shadow Copy Service (VSS), which will debut in Windows Server 2003 and may become a de facto standard for snapshot structure.
Snapshotting isn't the only option for time-starved storage managers. Increasingly, it's being complemented by live replication, which synchronizes data between two or more computers in real time. Replication has long been possible through use of dedicated mirroring interface cards, within RAID arrays or as a software component on many key enterprise systems.
Both embedded and software-based approaches have their respective benefits--vendor independence is a key benefit of software solutions. For application hosting provider BlueStar Solutions, of Cupertino, CA, software replication--in the form of Veritas Volume Manager--has helped customers replicate their data between BlueStar data centers in Dallas and Phoenix.
The company's customers--which include Autodesk, eBay, and Solvay Pharmaceuticals--use a wide variety of storage devices including EMC Clariion, Hitachi 9500, IBM FAStT, and other boxes totaling more than 150TB of storage. Given the broad range of devices it had to replicate between, software-based replication was the natural choice for BlueStar.
"We have customers with an RTO [Recovery Time Objective] of four hours or less, and the only way for us to do that is to give them an environment where there's replication between our data centers," says Bill Augustadt, BlueStar's CTO.
However it's implemented, replication occurs in either a synchronous or asynchronous manner. In a synchronous setup, data is replicated between primary and secondary storage arrays continuously and in real time over the fastest path available: Fibre Channel in a SAN, any IP network when data is being moved between sites, or the device's backplane when replicating between volumes.
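The difference between the two modes can be sketched in a few lines of Python. This is a simplified model, not a real replication engine: `Replicator` is a hypothetical class, the arrays are dictionaries, and the asynchronous behavior shown (queueing writes and applying them to the secondary later) is the commonly understood meaning of the term rather than something spelled out above.

```python
from collections import deque


class Replicator:
    """Toy primary/secondary pair (hypothetical names)."""

    def __init__(self, synchronous):
        self.synchronous = synchronous
        self.primary = {}
        self.secondary = {}
        self.pending = deque()  # writes not yet applied to the secondary

    def write(self, key, value):
        self.primary[key] = value
        if self.synchronous:
            # The write is acknowledged only after the secondary has it too.
            self.secondary[key] = value
        else:
            # The write is acknowledged at once; the secondary catches up later.
            self.pending.append((key, value))
        return "ack"

    def drain(self):
        # Apply queued writes to the secondary in their original order.
        while self.pending:
            key, value = self.pending.popleft()
            self.secondary[key] = value


sync = Replicator(synchronous=True)
sync.write("lun0", b"data")

lazy = Replicator(synchronous=False)
lazy.write("lun0", b"data")
lag_before_drain = len(lazy.pending)   # the secondary is one write behind
lazy.drain()
```

In the synchronous case the two copies never diverge, at the cost of making every write wait for the secondary. In the asynchronous case writes return immediately, but the secondary can lag by whatever is still queued.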
However, synchronous replication is inherently unsuited to covering longer distances--many set 10 km as the practical limit--because latency and interference on intervening cables can disrupt the smooth flow of data. Furthermore, closed-loop replication tends to work only with a second, similarly configured box from the same vendor. This may be fine for many companies, but can present problems in situations where mergers and acquisitions have introduced a heterogeneous storage environment.
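To get a feel for why distance matters to synchronous replication, a back-of-the-envelope calculation helps. The Python sketch below assumes that light travels through optical fiber at roughly 200,000 km per second and counts propagation delay only. Switch, protocol, and disk latency are ignored, so real figures would be worse, and the function names are hypothetical.

```python
# Assumption: light travels through optical fiber at roughly 200,000 km/s.
FIBER_KM_PER_SECOND = 200_000


def round_trip_microseconds(distance_km):
    """Propagation delay for a write to reach the remote array and be acknowledged."""
    return 2 * distance_km * 1_000_000 / FIBER_KM_PER_SECOND


def max_serial_writes_per_second(distance_km):
    """Upper bound for one stream of writes that each wait for a remote acknowledgment."""
    return 1_000_000 / round_trip_microseconds(distance_km)


for km in (10, 100, 1000):
    print(km, "km:", round_trip_microseconds(km), "microseconds round trip,",
          max_serial_writes_per_second(km), "serial writes per second at most")
```

Under these assumptions, each tenfold increase in distance cuts the ceiling on serialized, acknowledged writes by a factor of ten, which is one reason synchronous links are usually kept short.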
"People are really looking at disaster recovery plans and realizing that their traditional tape backup or restore strategies may not meet RTOs or backup windows," says Matt Fairbanks, senior manager of technical marketing with Veritas, which incorporates replication as a pay-to-use option within its Volume Manager software.
This was first published in June 2003