How does data deduplication affect the backup process in a virtual machine environment?

Oglesby:

Data deduplication is an interesting topic, especially in the SMB market. Virtualization makes deploying new servers a lot less expensive, and any time you cut the cost of something, you increase its use. So SMBs tend to see an increase in their number of logical servers.

If they are increasing the number of logical servers, then they are increasing the amount of data. But if you look at the data on any given server, much of it is going to be a carbon copy of what is on the others: the operating system files are going to be the same, and most of the supporting applications that are installed, such as antivirus and monitoring agents, are going to be the same. And if we get into imaging, taking full images of virtual machines (VMs), there is also a lot of white space to take into account.

Data deduplication becomes important because SMBs might start out doing their backups as usual and then, once they get a little more comfortable with the imaging technology, move to taking snapshots so they do not have to run traditional backups all the time. What they wind up with is not only more logical servers or VMs to back up, but also images being taken nearly every night.

The amount of data being stored just for backups is enormous. So data deduplication, especially inline deduplication, can drive down the cost for the end user and allow them to keep more images and be more flexible in their backups.
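To make the point about identical operating system files and white space concrete, here is a minimal sketch of block-level deduplication in Python. It is an illustration only, not how any particular backup product works: the fixed 4 KB block size, the `DedupStore` class and the `make_vm_image` helper are all assumptions made up for this example.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size; real products use various sizes


class DedupStore:
    """Toy content-addressed store: each unique block is kept only once."""

    def __init__(self):
        self.blocks = {}        # SHA-256 digest -> block contents
        self.logical_bytes = 0  # total size of everything written

    def write_image(self, image):
        """Store an image; return its recipe (ordered list of block digests)."""
        recipe = []
        for offset in range(0, len(image), BLOCK_SIZE):
            block = image[offset:offset + BLOCK_SIZE]
            digest = hashlib.sha256(block).hexdigest()
            # A block already seen (same OS file, same white space) is not stored again.
            self.blocks.setdefault(digest, block)
            recipe.append(digest)
        self.logical_bytes += len(image)
        return recipe

    def read_image(self, recipe):
        """Rebuild an image from its recipe."""
        return b"".join(self.blocks[digest] for digest in recipe)

    @property
    def physical_bytes(self):
        """Bytes actually held after deduplication."""
        return sum(len(block) for block in self.blocks.values())


def make_vm_image(unique_data):
    """Hypothetical VM image: shared OS blocks + server-specific data + white space."""
    os_files = b"".join(bytes([i]) * BLOCK_SIZE for i in range(1, 9))  # 8 distinct blocks
    white_space = bytes(BLOCK_SIZE * 4)                                # 4 all-zero blocks
    return os_files + unique_data + white_space


if __name__ == "__main__":
    store = DedupStore()
    for n in range(3):
        store.write_image(make_vm_image(bytes([100 + n]) * BLOCK_SIZE))
    print("logical bytes: ", store.logical_bytes)
    print("physical bytes:", store.physical_bytes)
    print("dedup ratio:   ", store.logical_bytes / store.physical_bytes)
```

In this toy setup, three images of 13 blocks each add up to 39 logical blocks, but only 12 distinct blocks are stored (8 shared OS blocks, 1 zero block and 3 server-specific blocks), a ratio of 3.25 to 1. Real-world ratios depend heavily on the data and the product.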

Merryman:

Your typical SMB backup infrastructure is usually not designed to be massively scalable, because SMBs are not usually faced with tremendous data growth. So you typically see one or two master servers and one or a few tape libraries, VTLs, or deduplication if it has already been adopted. But more often than not, it's a tape infrastructure.

So, while tape is great in terms of scaling, because you can always buy more media and keep plugging it into your library, you still need online access to that media for operational restores. You also still need excess capacity to do things like offsite tape vaulting or cloning. So when you look at the traditional client deployment in a virtual server environment and the data bloat that happens, you can quickly hit a capacity wall that SMBs are not typically prepared for in terms of budget, onsite staff and the ability to adapt quickly.

When you think about deduplication, it is going to allow you to store more logical data on less disk within the backup infrastructure. But running up against the wall in a disk-based backup infrastructure can actually be worse than hitting the wall in a tape infrastructure. Tape is portable and more media can be bought easily, while capacity planning and demand forecasting for a disk-based infrastructure can be a much bigger issue.
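The forecasting problem can be sketched as simple arithmetic. The function below is a back-of-the-envelope illustration, not a planning tool: `days_until_full` and its parameters are hypothetical names, and it assumes a constant nightly backup size and a constant deduplication ratio, neither of which holds exactly in practice.

```python
def days_until_full(capacity_tb, used_tb, nightly_logical_tb, dedup_ratio):
    """Whole nights of backups that still fit on disk.

    Assumes (for illustration only) that every night writes the same amount
    of logical data and that the dedup ratio stays constant.
    """
    if nightly_logical_tb <= 0 or dedup_ratio <= 0:
        raise ValueError("nightly_logical_tb and dedup_ratio must be positive")
    free_tb = capacity_tb - used_tb
    if free_tb <= 0:
        return 0
    # Deduplication shrinks what each nightly backup actually consumes on disk.
    nightly_physical_tb = nightly_logical_tb / dedup_ratio
    return int(free_tb // nightly_physical_tb)


if __name__ == "__main__":
    # Made-up numbers: 20 TB array, 10 TB used, 2 TB of logical backup data per night.
    print(days_until_full(20, 10, 2, dedup_ratio=1))  # no deduplication
    print(days_until_full(20, 10, 2, dedup_ratio=4))  # assumed 4:1 ratio
```

With these made-up numbers, the same array lasts 5 nights without deduplication and 20 nights at an assumed 4:1 ratio. The wall still arrives, just later, which is why the forecast matters.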

Deduplication also varies in terms of implementation. While client-side deduplication has a lot of advantages, it can also compound the performance impact on the client side, which can be an issue in a large virtual server environment.

Another approach is appliance-based deduplication, which happens after the backup process has run and sent its data to the backup server environment. The deduplication happens on the back-end infrastructure, which offloads the workload from the clients to an appliance. We've seen a lot of success in the field with both approaches. The SMB space has probably been where these technologies, both agent- and appliance-based deduplication, have had the most success to date.
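The tradeoff between the two approaches comes down to where the hashing work is done and how much data crosses the network. The sketch below is a simplified illustration under stated assumptions: the function names, the fixed 4 KB block size and the in-memory dictionary standing in for the server or appliance index are all hypothetical, and the small cost of exchanging digests is ignored.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size for this illustration


def split_blocks(data):
    """Cut data into fixed-size blocks."""
    return [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]


def client_side_backup(image, server_index):
    """Client-side: the client hashes every block and sends only unknown ones."""
    hashes_on_client = 0
    bytes_sent = 0
    for block in split_blocks(image):
        digest = hashlib.sha256(block).hexdigest()
        hashes_on_client += 1              # CPU cost lands on the VM or host
        if digest not in server_index:
            server_index[digest] = block   # only new blocks cross the network
            bytes_sent += len(block)
    return {"hashes_on_client": hashes_on_client, "bytes_sent": bytes_sent}


def appliance_backup(image, appliance_index):
    """Appliance-based: the client sends everything; the appliance deduplicates."""
    bytes_sent = len(image)                # the full image crosses the network
    hashes_on_appliance = 0
    for block in split_blocks(image):
        digest = hashlib.sha256(block).hexdigest()
        hashes_on_appliance += 1           # CPU cost lands on the appliance
        appliance_index.setdefault(digest, block)
    return {
        "hashes_on_client": 0,
        "hashes_on_appliance": hashes_on_appliance,
        "bytes_sent": bytes_sent,
    }


if __name__ == "__main__":
    # Four blocks, only two of them distinct (three zero blocks plus one data block).
    image = bytes(BLOCK_SIZE * 3) + b"\x01" * BLOCK_SIZE
    print(client_side_backup(image, {}))
    print(appliance_backup(image, {}))
```

For the four-block sample image, the client-side path hashes four blocks on the client and sends two, while the appliance path does no hashing on the client but sends all four. Multiply the client-side hashing across many VMs on one host and the performance concern raised above becomes clear.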
