Integrate the cloud with your backup app

Getting the sophistication of a backup app and the simplicity and scalability of cloud storage is possible today. But there are still some kinks to work out.

This article can also be found in Storage magazine: Integrating virtual servers and desktops with storage.

Are you still backing up to boring old tape, disk or even deduped disk? What about using a backup device that has unlimited capacity and doesn't need to be managed? That's what cloud backup promises -- but can it possibly be that easy?

There are a lot of products and services being bandied about with the term cloud attached to them. Everybody wants to be on the cloud bandwagon, and it seems like everybody has an opinion about what the cloud is or isn't. In the absence of any clearly defined standards, we'll provide a definition for the purposes of our discussion.

The cloud, specifically the public cloud, is storage (or backup) that you don't have to manage. It has unlimited on-demand capacity and infinite burst capabilities, all for a price at or below what it would cost to do it yourself. If you want to pay for 100 TB this month and then pay for only 5 TB for the next year, you can do that with the cloud. You can't approach that kind of flexibility if you buy a storage system because you've paid for it whether you use it or not.

There's also the private cloud, where most of the marketing hyperbole is happening. For our purposes, we'll focus on the public cloud.

Cloud storage vs. cloud backup

Although the terms are often used interchangeably, there's a difference between cloud storage and cloud backup. Cloud storage is storage as a service. To tap into cloud storage, you get an account with a service provider; they provide you with their API and you use some type of software that enables you to store data via that API. Voila! You have storage with unlimited capacity. You don't manage the storage where your data resides, and you don't even have to ask for additional capacity. All you have to do is pay the bill. All cloud storage providers charge a "storage fee," a monthly rate based on how many gigabytes of data are stored in your account. In addition, some cloud storage providers may charge a fee for each gigabyte that's downloaded or uploaded -- essentially a "bandwidth fee." With cloud storage, you still have to manage the application that's sending the data into the cloud.
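The two fees described above add up to a simple monthly bill. The sketch below is only an illustration of that arithmetic; the function name and the per-gigabyte rates are hypothetical and don't reflect any particular provider's pricing.

```python
def monthly_cloud_cost(stored_gb, transferred_gb,
                       storage_rate_per_gb, bandwidth_rate_per_gb=0.0):
    """Return the monthly bill: storage fee plus an optional bandwidth fee."""
    storage_fee = stored_gb * storage_rate_per_gb          # charged on data at rest
    bandwidth_fee = transferred_gb * bandwidth_rate_per_gb  # charged on data moved
    return storage_fee + bandwidth_fee


# Hypothetical rates, chosen only to make the arithmetic easy to follow.
bill = monthly_cloud_cost(
    stored_gb=5000,
    transferred_gb=500,
    storage_rate_per_gb=0.10,
    bandwidth_rate_per_gb=0.05,
)
print(f"Estimated monthly bill: ${bill:,.2f}")
```

Because the storage fee tracks what's actually stored, the bill shrinks when the data does, which is the flexibility you don't get from a purchased storage system.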

To be considered a cloud backup service, a cloud service must provide all of the above plus the software to make the backups happen. A cloud backup service typically provides some type of client software that must be installed on all the systems to be backed up. Backups are then automatically scheduled to occur on a regular basis. The backup software generally uses techniques such as delta-level backups or full deduplication to minimize network traffic.

The provider's service-level agreement (and the price they charge) will determine what happens when things don't go as planned. At a minimum, the service may provide an on-screen pop-up notification or an email message to tell you that things are going well (or not). The service may also have the ability to automatically escalate the problem when failed backups aren't addressed.

Traditional backup software meets the cloud

Some companies may use a cloud backup service for all of their backups, while others may opt for a combination of traditional backup methods and cloud services. There are two very different ways to go about integrating traditional backup software and the cloud. You can use a traditional backup system in parallel with a cloud backup system, or you can use backup software that has the ability to use a cloud storage system as its target.

If the main reason you're considering cloud backup is the "hands-off" aspect, then running a cloud backup service in parallel with your traditional backup system is the route to take. You can continue using traditional backup software to perform the bulk of your backups, then use the cloud backup service to handle those parts where it would be most beneficial. The most common practice is to start by performing remote site and laptop backups using the cloud backup service. Many companies aren't yet performing backups of their laptops, and backing them up with traditional backup software is problematic, to say the least. Most companies back up their remote sites, but they often use less than desirable methods because their remote offices don't have dedicated IT staff. A cloud backup service can solve both the laptop and remote-office problems; all you have to do is write a check.

Using cloud storage as a target for a traditional backup software package is a bit more problematic, but it's not without its advantages. The same things that are true of cloud storage for traditional data are true of cloud storage for backups: no management, endless capacity, etc. As a "bonus" you automatically get off-site backups, which is still a hassle for many companies.

There may be more challenges than advantages, however, when it comes to using cloud storage as the destination for traditional backups.

Too much data for the cloud?

The first challenge is that traditional backup sends and stores a lot of data. Traditional backup systems typically perform full backups once a week, and even backup apps that don't perform repeated fulls on file systems (e.g., IBM's Tivoli Storage Manager) perform full backups on applications. (Many companies even perform daily full backups of some key applications.) In addition, all traditional backup applications perform full-file incremental backups. That means if just a single byte has changed in a file, the modification time is changed or the archive bit is set so the entire file is included in that night's backups.
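To see why full-file incrementals are expensive, consider how such a selection might work. The following is a minimal sketch, assuming a backup tool that decides purely on modification time; the function names are hypothetical and no real product's logic is being reproduced here.

```python
import os


def files_for_incremental(root, last_backup_time):
    """Return (path, size) for every file modified since the last backup.

    The whole file is selected even if only one byte changed, which is
    what "full-file incremental" means.
    """
    selected = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            info = os.stat(path)
            if info.st_mtime > last_backup_time:
                selected.append((path, info.st_size))
    return selected


def bytes_to_send(selected):
    """Total bytes the incremental would push to the backup target."""
    return sum(size for _path, size in selected)
```

A one-byte edit to a 2 GB file puts the full 2 GB into `bytes_to_send`, and that's the figure that matters when the target sits on the other side of an Internet link.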

Both of these typical practices create a lot of data that's sent across the network and stored on the target device. If the target device were a cloud storage service, you'd need significantly more bandwidth and you'd pay higher charges to store the data in the cloud. Remember that traditional backup systems are why data deduplication was developed. Backup applications create roughly 20 GB on "tape" for every 1 GB on primary disk, so a 10 TB data center would need to pay for approximately 200 TB of cloud storage every month.
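The 20-to-1 figure above translates directly into billable capacity. Here's a small sketch of that calculation; the function name is made up for illustration, and the default ratio is simply the one quoted in this article, not a measured value for your environment.

```python
def cloud_capacity_needed_tb(primary_tb, backup_ratio=20):
    """Capacity stored on the backup target for a given amount of primary data.

    backup_ratio is GB of backup data kept per GB of primary disk.
    """
    return primary_tb * backup_ratio


# The article's example: a 10 TB data center at a 20:1 ratio.
needed = cloud_capacity_needed_tb(10)
print(f"Cloud capacity to pay for each month: {needed} TB")
```

Plug in your own ratio if you know it; retention periods and the frequency of full backups both move that number.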

In addition to the cloud storage vendor's fees for disk capacity used and the amount of data transferred, there are the costs associated with having sufficient bandwidth to get the data to the cloud storage vendor. If you consistently and regularly create a 10 TB full backup and want to send it to the service over the wire, using a cloud storage vendor isn't likely to be practical. But even if your backup needs aren't that extreme, the behavior of traditional backup will make the cloud part of your backup system cost quite a bit.
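A quick estimate shows why a 10 TB full backup over the wire is hard to justify. This sketch assumes decimal terabytes, a hypothetical 100 Mbps link, and a guessed 80% usable throughput; all three are assumptions you should replace with your own numbers.

```python
def transfer_hours(data_tb, link_mbps, efficiency=0.8):
    """Hours needed to move data_tb terabytes over a link of link_mbps.

    efficiency is the assumed fraction of the link's rated speed that
    the backup stream actually achieves.
    """
    bits = data_tb * 1e12 * 8                     # decimal TB to bits
    usable_bits_per_second = link_mbps * 1e6 * efficiency
    return bits / usable_bits_per_second / 3600


hours = transfer_hours(data_tb=10, link_mbps=100)
print(f"10 TB over 100 Mbps: about {hours:.0f} hours ({hours / 24:.1f} days)")
```

Under those assumptions the transfer runs for well over a week, so a weekly full backup of that size would never finish before the next one started.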

Off-site data: The good and the bad

The second challenge ironically involves one of the key advantages of using a cloud backup service: having backup data stored off-site. Assuming you solve the problem of getting the data off-site in the first place, you then have the problem of all of your data being in a different location than your servers. Obviously, this can significantly hamper your ability to meet your recovery time objectives (RTOs). This means that any copy of your data that's stored in the cloud should be just that, a copy. More specifically, it shouldn't be the copy you rely on for routine data recoveries. Using cloud storage as the only copy of large amounts of data that need to be transferred across the Internet is simply a disaster waiting to happen.

This sounds like a problem for data deduplication to solve, right? Sort of. A lot of backup software packages can deduplicate the data before sending it over the Internet. That can certainly address the challenge of getting the backups onto cloud storage, but it doesn't address the challenge of getting the data back. So the rule about not relying on a single copy of your backups stored in the cloud still applies whether you're able to use deduplication or not.
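The asymmetry between backup and restore is easier to see in a toy model. The sketch below uses fixed-size chunks and SHA-256 hashes, which is a simplification and not how any particular product works; it also assumes a restore to a system that no longer holds the data, so the full data set has to come back. All names are hypothetical.

```python
import hashlib

CHUNK_SIZE = 4  # unrealistically small, to keep the example readable


def dedupe_upload(data, store, chunk_size=CHUNK_SIZE):
    """Send only chunks the store hasn't seen; return (recipe, bytes_sent)."""
    recipe, bytes_sent = [], 0
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        if digest not in store:
            store[digest] = chunk       # new chunk crosses the network
            bytes_sent += len(chunk)
        recipe.append(digest)           # the recipe rebuilds the backup later
    return recipe, bytes_sent


def restore(recipe, store):
    """Rebuild the backup; return (data, bytes_received)."""
    data = b"".join(store[digest] for digest in recipe)
    return data, len(data)              # the full data set comes back


cloud_store = {}
monday = b"AAAABBBBCCCCDDDD"
tuesday = b"AAAABBBBCCCCEEEE"           # only the last chunk changed

_, sent_monday = dedupe_upload(monday, cloud_store)
recipe_tuesday, sent_tuesday = dedupe_upload(tuesday, cloud_store)
restored, received = restore(recipe_tuesday, cloud_store)
```

Tuesday's backup sends only the one changed chunk, but restoring Tuesday's data pulls back every byte, which is why a deduplicated cloud copy still shouldn't be the copy you count on for routine recoveries.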

Backup apps can link to the cloud

There are now a number of companies with software and hardware products that support backing up to the cloud. The first backup application vendor to announce support was Zmanda Inc., a commercial firm that offers its version of Amanda, an open source backup program. Amanda Enterprise 3.1 is capable of backing up directly to Amazon's Simple Storage Service (S3) cloud storage service.

CommVault Systems Inc.'s Simpana supports backing up to any cloud vendor that offers a Representational State Transfer (REST) interface. So you can use cloud storage services such as Amazon, Iron Mountain, Microsoft Azure, Nirvanix or Rackspace as a target for CommVault Simpana backups or archives. Archiving may actually be a more appropriate data protection application for cloud storage because archivers don't perform repeated fulls and they have object-level dedupe built in.

EMC Corp. and Symantec Corp. did something similar when they each added the capability to back up to their own networks. EMC NetWorker backs up to any cloud vendor using EMC Atmos-based storage, while Symantec Backup Exec backs up to the Symantec Protection Network.

If your company uses a backup application that doesn't yet support backing up to the cloud, you might want to consider Nasuni Corp.'s Filer, which provides an NFS/CIFS NAS gateway to cloud storage. Any decent backup software package can back up to an NFS or CIFS mount.
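From the backup software's point of view, a gateway like this is just another directory. As a rough illustration, assuming the gateway's share is already mounted, a backup to it can be as plain as writing an archive to that path. The function name and the mount point in the usage note are hypothetical, and this isn't Nasuni's API or any backup product's method.

```python
import os
import tarfile
import time


def backup_to_mount(source_dir, mount_point):
    """Write a gzip-compressed tar archive of source_dir under mount_point.

    mount_point is assumed to be an NFS or CIFS share that is already
    mounted; the code treats it like any local directory.
    """
    stamp = time.strftime("%Y%m%d-%H%M%S")
    archive_path = os.path.join(mount_point, f"backup-{stamp}.tar.gz")
    top_level = os.path.basename(os.path.normpath(source_dir))
    with tarfile.open(archive_path, "w:gz") as archive:
        archive.add(source_dir, arcname=top_level)
    return archive_path
```

You might call it as `backup_to_mount("/srv/projects", "/mnt/cloud-gateway")`, where both paths are placeholders. Note that this writes a full archive every time, so the data-volume concerns discussed earlier still apply.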

Although deduplication has its limitations, you should still consider its availability when exploring backup applications that integrate with the cloud. Neither EMC NetWorker nor Amanda has deduplication built in. CommVault Simpana and Symantec Backup Exec can deduplicate data before it's sent to the backup target. Simpana offers target deduplication, which dedupes the data once it reaches the media agent, while Backup Exec uses source deduplication, which dedupes at the client before the data is sent across the network. This makes Simpana and Backup Exec much more attractive companions to cloud storage. IBM Tivoli Storage Manager (TSM) customers also have an interesting option with Nasuni because TSM has deduplication built in.

Try it, but test it

Cloud backup services can be great complements to traditional backup systems, especially when those systems provide some level of integration. Because a cloud backup service will require little if any hardware to be installed on your site, it's relatively easy to perform a full proof of concept using real data. This is especially important because implementation may require substantial investments for licenses and have a profound effect on your backup environment. As with any backup product or service, you should test everything and believe nothing.

BIO: W. Curtis Preston is an executive editor in TechTarget's Storage Media Group and an independent backup expert. Curtis has worked extensively with data deduplication and other data-reduction systems.

This was first published in October 2010
