This article can also be found in the Premium Editorial Download "Storage magazine: Integrating virtual servers and desktops with storage."
Too much data for the cloud?
The first challenge is that traditional backup sends and stores a lot of data. Traditional backup systems typically perform full backups once a week, and even backup apps that don't perform repeated fulls on file systems (e.g., IBM's Tivoli Storage Manager) perform full backups on applications. (Many companies even perform daily full backups of some key applications.) In addition, all traditional backup applications perform full-file incremental backups. That means if even a single byte in a file has changed, its modification time is updated (or its archive bit is set) and the entire file is included in that night's backup.
Both of these typical practices create a lot of data that's sent across the network and stored on the target device. If the target device is a cloud backup service, that means significantly increased bandwidth requirements and higher charges to store the data in the cloud. Remember that traditional backup systems are the reason data deduplication was developed in the first place: backup applications create roughly 20 GB on "tape" for every 1 GB on primary disk. So a 10 TB data center would need to pay for approximately 200 TB of cloud storage every month.
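The arithmetic behind that figure is simple enough to sketch. This is a minimal illustration of the cost model, using the 20:1 expansion ratio and the 10 TB data center cited above; any per-GB pricing you plug in is your own assumption, not a figure from this article.

```python
# Rough cost model for traditional backup expansion.
# PRIMARY_TB and EXPANSION_RATIO come from the article's example;
# everything else is a hypothetical placeholder.

PRIMARY_TB = 10        # primary disk capacity in the example data center
EXPANSION_RATIO = 20   # GB landed on "tape" per GB of primary disk

backup_tb = PRIMARY_TB * EXPANSION_RATIO
print(f"Cloud storage billed each month: {backup_tb} TB")  # 200 TB
```

The point of the model isn't the exact ratio (yours will vary with retention policy and change rate) but that weekly fulls plus full-file incrementals multiply, rather than merely add to, the capacity you pay for.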
In addition to the cloud storage vendor's fees for disk capacity used and the amount of data transferred, there are the costs associated with having sufficient bandwidth to get the data to the cloud storage vendor. If you consistently and regularly create a 10 TB full backup and want it to finish within a reasonable backup window, the bandwidth required quickly becomes impractical.
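To put the bandwidth problem in perspective, here's a rough back-of-the-envelope sketch. It assumes ideal, uncontended link throughput with no protocol overhead, and the link speeds chosen are illustrative assumptions, not figures from this article.

```python
# Idealized transfer time for a full backup over a WAN link.
# Assumes decimal units (1 TB = 10^12 bytes) and zero overhead,
# so real-world times would be longer.

def transfer_hours(data_tb: float, link_mbps: float) -> float:
    """Hours to move data_tb terabytes over a link_mbps link."""
    bits = data_tb * 1e12 * 8          # TB -> bits
    seconds = bits / (link_mbps * 1e6) # Mbps -> bits/sec
    return seconds / 3600

for mbps in (100, 1000):
    print(f"10 TB over {mbps} Mbps: {transfer_hours(10, mbps):.0f} hours")
```

Even at a full gigabit, a 10 TB full backup occupies the link for the better part of a day; at 100 Mbps it takes over a week, which is why sending repeated fulls to a cloud target is rarely workable without reducing the data first.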
Off-site data: The good and the bad
The second challenge ironically involves one of the key advantages of using a cloud backup service: having backup data stored off-site. Assuming you solve the problem of getting the data off-site in the first place, you then have the problem of all of your data being in a different location than your servers. Obviously, this can significantly hamper your ability to meet your recovery time objectives (RTOs). This means that any copy of your data that's stored in the cloud should be just that: a copy. More specifically, it shouldn't be the copy you rely on for routine data recoveries. Relying on cloud storage as the only copy of large amounts of data that would have to be transferred back across the Internet is simply a disaster waiting to happen.
This sounds like a problem for data deduplication to solve, right? Sort of. A lot of backup software packages can deduplicate the data before sending it over the Internet. That can certainly address the challenge of getting the backups onto cloud storage, but it doesn't address the challenge of getting the data back. So the rule about not relying on a single copy of your backups stored in the cloud still applies whether you're able to use deduplication or not.
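The mechanism those backup packages use can be sketched in a few lines. This is a minimal, generic illustration of source-side deduplication (fixed-size chunking with content hashes), not any particular vendor's implementation; the `cloud_store` dictionary stands in for a hypothetical remote chunk store.

```python
# Minimal sketch of source-side deduplication: split data into
# fixed-size chunks, hash each chunk, and send only chunks the
# cloud target hasn't already seen.

import hashlib

CHUNK_SIZE = 4096
cloud_store: dict[str, bytes] = {}  # hypothetical remote chunk store

def backup(data: bytes) -> int:
    """Back up data; return the number of bytes actually sent."""
    sent = 0
    for i in range(0, len(data), CHUNK_SIZE):
        chunk = data[i:i + CHUNK_SIZE]
        digest = hashlib.sha256(chunk).hexdigest()
        if digest not in cloud_store:   # only new chunks cross the WAN
            cloud_store[digest] = chunk
            sent += len(chunk)
    return sent

first = backup(b"A" * CHUNK_SIZE + b"B" * CHUNK_SIZE)
second = backup(b"A" * CHUNK_SIZE + b"B" * CHUNK_SIZE)
print(first, second)  # 8192 0 -- identical data is never re-sent
```

Note the asymmetry this sketch makes visible: the second backup sends nothing, but a restore still has to pull every unique chunk back across the Internet, which is exactly why deduplication eases the upload problem without easing the recovery problem.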
This was first published in October 2010