VM storage efficiency technologies: Dedupe, compression, thin provisioning

Thin provisioning, deduplication and compression are key storage efficiency technologies for IT shops to consider using in VM environments.

Thin provisioning, deduplication and compression are three of the most important storage efficiency technologies for IT shops to consider using in their virtual server environments.

In a virtual machine (VM) setting, thin provisioning allows users to recover a considerable amount of wasted space, and deduplication and compression help to remove duplicate copies of the operating system and potentially save a significant amount of storage space.

In the following podcast with SearchVirtualStorage.com's Carol Sliwa, Chris Evans, a U.K.-based consultant at Brookend Ltd. and co-founder of the Langton Blue consultancy, which focuses on resolving IT-related business issues, offered up tips on using storage efficiency technologies in a virtual environment and advice on using space-saving features from the virtual server vendor versus from the storage vendor.

SearchVirtualStorage.com: Which VM storage efficiency technologies should an IT shop consider using, in order of priority?

Evans: The first is thin provisioning. Thin provisioning only stores on disk the data actually written by the host, so if we provide, for example, a 20 GB LUN and only write 1 GB of data, then we only store 1 GB of data on the actual array itself. So, it's quite an efficient technology.
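
As a rough illustration of the behavior Evans describes, the sketch below models a thin-provisioned LUN that only consumes space for blocks the host has actually written. The class name `ThinLun`, the 4 KiB block size and the scaled-down sizes (20 MiB provisioned, 1 MiB written, standing in for the 20 GB and 1 GB in the example) are assumptions made for illustration, not any vendor's implementation.

```python
class ThinLun:
    """Toy model of a thin-provisioned LUN (hypothetical, not a vendor API)."""

    BLOCK_SIZE = 4096  # bytes per block; an assumed value

    def __init__(self, provisioned_bytes):
        self.provisioned_bytes = provisioned_bytes
        self._blocks = {}  # block index -> data, allocated on first write only

    def write_block(self, index, data):
        if len(data) != self.BLOCK_SIZE:
            raise ValueError("data must be exactly one block")
        if not 0 <= index < self.provisioned_bytes // self.BLOCK_SIZE:
            raise IndexError("block outside the provisioned range")
        self._blocks[index] = data

    def read_block(self, index):
        # Unwritten blocks read back as zeros without consuming any space.
        return self._blocks.get(index, bytes(self.BLOCK_SIZE))

    @property
    def consumed_bytes(self):
        return len(self._blocks) * self.BLOCK_SIZE


MIB = 1024 ** 2
lun = ThinLun(provisioned_bytes=20 * MIB)   # what the host sees
for i in range(256):                        # the host writes 1 MiB in total
    lun.write_block(i, b"\x01" * ThinLun.BLOCK_SIZE)

print(lun.provisioned_bytes // MIB, "MiB provisioned")
print(lun.consumed_bytes // MIB, "MiB consumed")
```

The host is presented with the full provisioned size, while the backing store grows only as writes arrive.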

The second is deduplication. Deduplication is a technology that looks for repeated blocks of common data, and when it finds those common blocks of data, it deletes them and retains pointers to one single copy. Effectively, [when] you deduplicate, you remove the unnecessary copies. Obviously this can have quite a big saving on environments where you get a lot of repeated data.
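
The idea of keeping one copy and pointing every duplicate at it can be sketched in a few lines. This is a minimal illustration that assumes fixed-size blocks identified by a SHA-256 fingerprint; the function names `deduplicate` and `rehydrate` are hypothetical, and real products differ in block size, hashing and metadata handling.

```python
import hashlib


def deduplicate(blocks):
    """Return (store, pointers): one retained copy per unique block,
    plus an ordered list of fingerprints that reference those copies."""
    store = {}      # fingerprint -> the single retained copy
    pointers = []   # one fingerprint per logical block, in original order
    for block in blocks:
        fingerprint = hashlib.sha256(block).hexdigest()
        store.setdefault(fingerprint, block)  # keep only the first copy
        pointers.append(fingerprint)
    return store, pointers


def rehydrate(store, pointers):
    """Rebuild the original sequence of blocks from the pointers."""
    return [store[fingerprint] for fingerprint in pointers]


os_block = b"A" * 4096                       # stands in for a common OS block
blocks = [os_block, b"B" * 4096, os_block, os_block]
store, pointers = deduplicate(blocks)

print(len(blocks), "logical blocks,", len(store), "stored")
```

Four logical blocks are written, but only the two unique ones are kept; the pointer list is enough to reconstruct the original data.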

The third technology is compression. Compression is another space reduction technology, very similar to deduplication, that looks for repeated patterns of data. But, it's looking for things like repeated zeros or ones or other small patterns. What it does is it replaces that series of repeated data with a small stub that allows that data to be [decompressed] in the future, but it stores it in a much more compact format, and again, it can result in significant space savings.
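
To make the effect on repeated zeros concrete, the sketch below compresses 1 MiB of zero bytes with Python's standard `zlib` module and then restores it. Using `zlib` (DEFLATE) is an assumption for illustration only; a storage array or hypervisor may use a different algorithm.

```python
import zlib

white_space = bytes(1024 * 1024)           # 1 MiB of zeros, e.g. unused database pages
compressed = zlib.compress(white_space)    # compact representation of the repeated data
restored = zlib.decompress(compressed)     # the original data is fully recoverable

print(len(white_space), "bytes before,", len(compressed), "bytes after")
```

Highly repetitive data such as this shrinks dramatically, whereas data that is already compressed or encrypted typically sees little or no saving.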

SearchVirtualStorage.com: Why are those technologies especially important in a virtual machine environment?

Evans: [Let's talk] first [about] thin provisioning. If we look at the way we typically allocate virtual machines, we tend to give each virtual machine a certain size and space. For instance, we might allocate a 20 GB LUN to a host or a 40 GB LUN, and we typically tend to have consistent sizes. Now, as we know, that space is never fully used by the host, and we always like to make sure there's more space available than we need. So, of course, within a virtual environment, we would waste space with every single host. Thin provisioning gives us a great benefit because it allows us to recover a lot of that wasted space.

For deduplication, we have the opportunity there to deduplicate a lot of data that is repeated. So, if we've installed a lot of the same operating system in a virtual server environment [or in] virtual desktop environments, deduplication would allow us to remove all of that duplicate data and save a significant amount of storage, and we're talking here about components of the operating system that are typically read-only files that are available across each of those virtual machines.

Similarly, with compression, we would see a similar benefit where we install an application. For instance, we create a database, and that database has a lot of white space; compression could potentially save a lot of space for us there.

SearchVirtualStorage.com: What are the main problems that an IT shop might encounter when using those [VM storage efficiency] features?

Evans: When we're looking at what issues we might encounter, the sort of things we should think about first are the issues around process. A technology like thin provisioning means we're getting something for nothing, and as we see growth in that environment, we might find that the demand for physical storage increases toward the amount of actual, real storage we've allocated out. And, in that instance, we need to make sure we have a process or processes around adding additional storage to that environment and making sure we don't get into a situation where we run out of storage.
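
One way to back such a process with numbers is a simple capacity check on the thin pool. The sketch below is illustrative only: the function name `pool_status`, the 80% alert threshold and the capacity figures are all assumptions, not values from the interview or from any product.

```python
def pool_status(physical_capacity_gb, used_gb, provisioned_gb, alert_threshold=0.8):
    """Summarize a thin pool: how oversubscribed it is and whether
    physical usage has crossed the (assumed) alert threshold."""
    utilization = used_gb / physical_capacity_gb
    return {
        # Logical space promised to hosts relative to real disk behind it.
        "oversubscription_ratio": provisioned_gb / physical_capacity_gb,
        # Fraction of the real disk that is already written.
        "utilization": utilization,
        # True when it is time to start the add-storage process.
        "needs_capacity": utilization >= alert_threshold,
    }


status = pool_status(physical_capacity_gb=1000, used_gb=850, provisioned_gb=3000)
print(status)
```

In this made-up case, hosts have been promised three times the physical capacity and 85% of the real disk is already in use, so the check flags the pool for expansion before it runs out.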

For deduplication and compression, we're clearly reducing the amount of physical disk storage being used, but we're supporting a large amount of potential I/O because multiple hosts could all be accessing that same piece of data. What we have to make sure is that deduplication, for example, doesn't result in a lot of high I/O activity against a very small set of disks, and we have to look at performance and be very careful of that.

SearchVirtualStorage.com: If the virtual machine vendor and the storage vendor offer the same storage efficiency feature, which one should the IT shop use?

Evans: We hear this one come up quite frequently. Let's start with thin provisioning. Now imagine we do thin provisioning in the hypervisor. Clearly there'll be an overhead in that it might consume more cycles to manage and so on. So, that's one thing we have to think about there. If it's done within the array, clearly that array could be specifically built to work with thin provisioning, and it might be more efficient to do it there. What I would suggest is that potentially people should look to do it in both places because effectively you get the benefits in both areas without any real impact.

When we look at things like deduplication, deduplication and compression potentially have more I/O overhead and therefore could result in more impact in the hypervisor, and perhaps having it in the array is a more suitable place for those.

SearchVirtualStorage.com: What's the most important piece of advice you would offer to IT shops using storage efficiency technologies?

Evans: Touching on something we mentioned earlier, I think with all of these technologies, probably the most important thing to do is to make sure that you understand how they work and you've got processes around them. I talked about thin provisioning having an issue if you oversubscribe, and clearly, you need to be able to work out how you're going to manage that, and that requires process that sits around it. If you intend to use deduplication, exactly what will I be deduplicating? Can I show the savings? Can I show how those savings change over time and whether they become variable as I add in more guests and so on? You need to understand how the rate of change of deduplication works, and similarly, with compression, we'd want to be able to monitor this and make sure we could understand what benefits we're getting as well as how the data changes over time. So, probably the most important thing to do is make sure you've got metrics around your environment and you've got monitoring so you can understand how efficiently these features are being used, and if they do … cause an impact to you, how you could mitigate against that.
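
A minimal sketch of the kind of metric Evans is describing might track the space-reduction ratio as guests are added and report how it changes between samples. The `Sample` class, the `ratio_trend` helper and all of the figures below are hypothetical and serve only to show the calculation.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    guests: int          # number of virtual machines at sample time
    logical_gb: float    # data as seen by the guests
    physical_gb: float   # data actually stored after dedupe/compression

    @property
    def reduction_ratio(self):
        return self.logical_gb / self.physical_gb


def ratio_trend(samples):
    """Change in reduction ratio between consecutive samples."""
    return [later.reduction_ratio - earlier.reduction_ratio
            for earlier, later in zip(samples, samples[1:])]


# Illustrative numbers only, not measurements.
history = [Sample(10, 200, 100), Sample(20, 400, 160), Sample(30, 600, 300)]

for sample in history:
    print(sample.guests, "guests:", sample.reduction_ratio)
print("trend:", ratio_trend(history))
```

A falling ratio, as in the last sample here, is the sort of signal that would prompt a closer look at what kind of data the newer guests are writing.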
