Deduplication. Once the data arrives in its native format to the device, it must be deduped. Inline boxes dedupe the data the second it arrives. The original data never hits disk; therefore, an inline vendor's dedupe speed is the same as its ingestion speed. Post-process vendors can take from seconds to hours to dedupe data. You'll have to investigate how long the dedupe process actually takes.
Replication. Your dedupe ratio comes into effect with replication as well. The better your dedupe ratio is, the fewer blocks will have to be replicated. But the only way to know for sure how replication will work is to actually do the replication. Observe how many blocks of data are replicated and note when the replication starts and stops. You may be able to capture this data from the dedupe vendor, but to test it yourself you may need to use a network tool to get this information. Remember that not all vendors start replicating at the same time. Of course, nothing can be replicated until it's deduped, but don't assume that an inline vendor will replicate backups immediately after they're deduped; many vendors will wait until a given tape is no longer being used or a file is closed (in the case of NAS).
Test with production data, but…
You must test target dedupe systems by storing your actual production backups on them. However, don't test your dedupe system by backing up production systems directly to it. Vendors would
So how do you test dedupe systems with production data without backing up production systems to them? It's simple. Copy your production backups from tape or disk to the dedupe system. When testing restore/copy speed, copy backups from the deduped device to disk or tape because the "reconstitution" process the dedupe system has to go through for a copy is exactly the same as what it does for a restore.
Determine how long you plan to store your backups in the dedupe system. In my opinion, if you plan to store 90 days of backups in your dedupe system, that's how many days of backups you should store in your test system. (It won't take 90 days to store 90 days' worth of backups.)
If you plan on testing 90 days of backups, pick a period of 90 days that starts with a full backup (or an IBM Corp. Tivoli Storage Manager backup set) and continues for 90 days. If you're testing multiple dedupe systems, make sure to use the same set of backups with each test (ceteris paribus -- with all other factors or things remaining the same). Copy the first full backup (or backup set), followed by backups that are 89 days old, then 88 and so forth. Do that until you've worked your way up to 90 days.
This was first published in May 2009