At a recent storage event I heard a little about snapshot, clone and snapclone. Can you please explain the difference and in which environments such implementations are advisable?
Sure. Each vendor gives its snapshot technology its own "pet" name. I wish the terms were standardized across the storage industry, since that would go a long way toward reducing confusion.
There are basically two methods to create a snapshot of data. One is an exact duplicate of all the data, and the other is a "picture" of how the data looked when you took the snapshot. The exact duplicate is a physical copy of all the data on a particular LUN or file system, and may be called a mirror, clone, image, etc. The copy can be made either by a host (MirrorSet on Windows, Mirror volume under Veritas, etc.) or in hardware at the storage level (Clone, BCV, ShadowImage, etc.).
The other method is the "picture" of what the data looked like. This is usually referred to as a "metadata" copy, meaning the data is not actually copied to another location; only the pointers to where the data resides are copied. Metadata snapshots use a technique called "copy on write": while a snapshot is in place, if someone tries to overwrite data on the original LUN, the snapshot software first copies the original block to a new location (a pool of storage dedicated to copy-out operations) before it lets the write happen. The software then points the snapshot at the new location for that block, so the snapshot still returns the preserved data while the original LUN returns the newly written data. In the figure below, when the snapshot is first created, the snapshot metadata points to the same location for reads on both drives E and G. A write operation to drive E causes the metadata to point to the new location where the original data was moved, so drive G still presents the original data (block 7).
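To make the copy-on-write idea concrete, here is a minimal sketch in Python. It is only an illustration under simplifying assumptions (a volume is an in-memory list of blocks, and the class and method names are made up for this example); it is not any vendor's implementation.

```python
class Volume:
    """Toy stand-in for the original LUN: a list of block contents."""

    def __init__(self, blocks):
        self.blocks = list(blocks)
        self.snapshots = []  # snapshots currently in place on this volume

    def read(self, index):
        return self.blocks[index]

    def write(self, index, data):
        # Copy on write: before the overwrite happens, let every snapshot
        # save the old block contents if it has not saved them already.
        for snap in self.snapshots:
            snap.preserve(index, self.blocks[index])
        self.blocks[index] = data


class CowSnapshot:
    """Metadata snapshot: holds pointers, plus a pool for copied-out blocks."""

    def __init__(self, volume):
        self.volume = volume
        self.pool = {}  # block index -> preserved original contents
        volume.snapshots.append(self)

    def preserve(self, index, old_data):
        # Only the first overwrite of a block triggers a copy-out.
        if index not in self.pool:
            self.pool[index] = old_data

    def read(self, index):
        # Changed blocks come from the pool; unchanged blocks are still
        # read from the original volume.
        if index in self.pool:
            return self.pool[index]
        return self.volume.read(index)


# Hypothetical usage: drive E is the original, drive G is the snapshot.
drive_e = Volume(["a", "b", "c"])
drive_g = CowSnapshot(drive_e)
drive_e.write(1, "B")
```

After the write, drive_e.read(1) returns the new data "B", while drive_g.read(1) still returns "b" from the pool. Blocks 0 and 2 were never copied, so the snapshot reads them straight from the original volume.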
The terms you are using sound like HP terminology for the StorageWorks arrays. In the StorageWorks array, the term "snapshot" refers to the "metadata" type of copy. The term "clone" refers to the physical type of copy (usually a three-drive mirrorset, where you can break off one of the mirrors and assign it to another server for things like backup). A "snapclone" combines the two technologies and lets you instantly create a duplicate copy. The actual data movement happens under the covers, and you can access both the original copy and the snapshot while the hardware moves the data between drives. If you have a three-drive mirrorset in place in advance, creating the clone is almost instant, since all you have to do is break off the third mirror. If there was no third mirror to begin with, it takes time to add the drive to the mirrorset and wait while the drives catch up and become duplicates of each other.
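The snapclone idea (instant access first, physical copy finished in the background) can be sketched as follows. This is a rough illustration in Python with hypothetical names, assuming a volume is just a list of blocks; how a real array schedules the background copy is not described here and is not modeled.

```python
class SnapClone:
    """Usable immediately like a snapshot, then filled in until it is a full copy."""

    def __init__(self, source):
        self.source = source                   # live volume (list of blocks)
        self.copy = [None] * len(source)       # storage for the physical copy
        self.copied = [False] * len(source)    # which blocks are already copied

    def read(self, index):
        # Copied blocks come from the clone's own storage; the rest are
        # still read through to the unchanged source block.
        return self.copy[index] if self.copied[index] else self.source[index]

    def _copy_block(self, index):
        if not self.copied[index]:
            self.copy[index] = self.source[index]
            self.copied[index] = True

    def write_source(self, index, data):
        # Preserve the point-in-time contents before the source changes.
        self._copy_block(index)
        self.source[index] = data

    def background_copy_step(self):
        """Copy one outstanding block; return False when nothing is left."""
        for index, done in enumerate(self.copied):
            if not done:
                self._copy_block(index)
                return True
        return False

    def is_independent(self):
        # True once every block lives in the clone's own storage.
        return all(self.copied)
```

In this sketch the clone can be read the moment it is created. Once background_copy_step() has returned False, is_independent() is True and the clone no longer needs the source volume at all.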
As for where to use which technology: if you take a metadata snapshot and then lose the original volume due to hardware failure or corruption, your data is gone, because the snapshot still depends on the unchanged blocks of the original. A physical copy is better in situations where you need the data to be around for a while, or where it is critical. Also, most metadata-based snapshots incur a performance penalty on the original volume while data is being copied out. If you are running a production database and doing a lot of writes, you may see up to a 30% performance hit while a snapshot is present on the LUNs making up the database. A physical clone copy is better in these instances. The good news is that snapshots become faster as more data is "copied out", since the snapshot slowly becomes a complete physical copy over time.
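The reason the penalty fades can be shown with a small counting sketch in Python. It assumes the usual copy-on-write behavior, where a block is copied out only the first time it is overwritten after the snapshot is taken; the function name and the write sequence are made up for illustration, and no timing figures are implied.

```python
def copy_outs(write_sequence):
    """For each write, report whether it triggers a copy-out (True) or not."""
    already_copied = set()
    triggered = []
    for block in write_sequence:
        first_overwrite = block not in already_copied
        already_copied.add(block)
        triggered.append(first_overwrite)
    return triggered


# Hypothetical workload: block numbers written after the snapshot was taken.
writes = [7, 7, 3, 7, 3]
flags = copy_outs(writes)  # [True, False, True, False, False]
```

Only the first write to block 7 and the first write to block 3 pay the extra copy. A database that keeps rewriting the same hot blocks therefore pays most of the cost soon after the snapshot is taken.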
To get more information on snapshots and how they are used, check out our SAN School series of webcasts and quizzes. Specifically, sign up to hear lesson 11, which covers snapshot technology much more in depth.
This was first published in November 2003