Server virtualization and virtual desktops can make configuring and managing storage systems a lot tougher. These 10 tips can help ease some of the stress caused by managing storage in a virtual environment.
By Eric Siebert
Server and desktop virtualization have provided relatively easy ways to consolidate and conserve, allowing a reduction in physical systems. But these technologies have also introduced problems for data storage managers who need to effectively configure their storage resources to meet the needs of a consolidated infrastructure.
Server virtualization typically concentrates the workloads of many servers onto a few shared storage devices, often creating bottlenecks as many virtual machines (VMs) compete for storage resources. With desktop virtualization this concentration becomes even denser as many more desktops are typically running on a single host. As a result, managing storage in a virtual environment is an ongoing challenge that usually requires the combined efforts of desktop, server, virtualization and storage administrators to ensure that virtualized servers and desktops perform well. Here are 10 tips to help you better manage your storage in virtual environments.
#1 Know your storage workloads. Virtual desktop workloads are very different from virtual server workloads, and the workloads imposed by individual desktops and servers can also vary dramatically. Blindly placing VMs on hosts without regard for their disk I/O usage can create instant resource bottlenecks.
You should have a general idea of how much disk I/O a VM will generate based on the applications and workloads it will host. With that in mind, try to balance high disk I/O VMs among both physical hosts and storage resources. If you have too many VMs with high disk I/O on a single host it can overwhelm the host's storage controller; likewise, having too many high disk I/O VMs accessing a single storage system or LUN may also create a performance bottleneck. Even if you think you know your virtual machines' disk I/O workloads, it's still worth using performance monitoring tools to get detailed statistics such as average and peak usage.
And don't forget that VMs are usually mobile and may not always be on the same host; they may be moved to another physical host using technologies like VMware VMotion. A group of busy Exchange servers ending up on the same host could bring the disk subsystem to its knees. If you're using VMware's Distributed Resource Scheduler (DRS) to balance workloads among hosts, keep in mind that it doesn't take VM disk I/O usage into account; it only balances based on CPU and memory usage. To compensate for that, use DRS anti-affinity rules that will always keep specific virtual machines on different hosts.
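To make the balancing idea concrete, here is a minimal sketch of a greedy placement that spreads the busiest VMs across hosts by estimated peak IOPS. It is an illustration only, not how DRS or any VMware tool works; the function names, VM names, host names and IOPS figures are all hypothetical.

```python
import heapq

def balance_by_iops(vm_iops, hosts):
    """Greedily assign VMs to hosts so high-I/O VMs are spread out.

    vm_iops: dict of VM name -> estimated peak IOPS (assumed figures).
    hosts:   list of host names.
    Returns a dict of host name -> list of VM names.
    """
    placement = {h: [] for h in hosts}
    # Min-heap of (running IOPS total, host name): the lightest host pops first.
    heap = [(0, h) for h in hosts]
    heapq.heapify(heap)
    # Place the busiest VMs first so they land on different hosts.
    for vm, iops in sorted(vm_iops.items(), key=lambda kv: (-kv[1], kv[0])):
        total, host = heapq.heappop(heap)
        placement[host].append(vm)
        heapq.heappush(heap, (total + iops, host))
    return placement

def host_totals(placement, vm_iops):
    """Sum the estimated IOPS of the VMs placed on each host."""
    return {h: sum(vm_iops[v] for v in vms) for h, vms in placement.items()}

# Hypothetical workload estimates taken from a monitoring tool.
example_iops = {"exch1": 900, "exch2": 850, "sql1": 700,
                "web1": 100, "web2": 80, "dc1": 50}
example_placement = balance_by_iops(example_iops, ["hostA", "hostB", "hostC"])
print(host_totals(example_placement, example_iops))
```

The same approach could be applied to data stores or LUNs instead of hosts. In practice the estimates would come from measured average and peak usage rather than guesses.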
#2 Avoid intense disk I/O. Certain scenarios with your VMs may create periods of very intense disk I/O, which can create such high resource contention that all of your VMs will slow to a crawl. For virtual desktops this can be caused by time-specific events, like all of your users turning on their desktops at approximately the same time each morning -- often referred to as a boot storm. While that kind of situation may be unavoidable, there are ways to deal with it, such as by adding large cache controllers like NetApp's Performance Acceleration Module (PAM) to your storage device, or by using automated storage tiering technologies that can leverage faster storage devices like solid-state drives during periods of high disk I/O.
Other scenarios -- like virtual machine backup windows and scheduled VM activities such as antivirus scans or patching -- are controllable. Having concurrent backups running on multiple VMs on a host or data store can cause high disk I/O that will impact the performance of other VMs running on the host or data store. Try to schedule your backups evenly so you don't have too many occurring simultaneously on the same host or storage resource. You should also consider backup applications that avoid using host resources by accessing the VM data stores directly to back up VM disks. Some virtualization-specific disk-to-disk backup products can also shorten backup windows and allow tape backups of the disk repositories to occur afterwards without impacting hosts and virtual machines. For scheduled operations like patching and antivirus scanning, enable randomization or create staggered schedules to spread the operations over a period of time so they don't run simultaneously. You should also be careful when running disk defragmentation operations; defrag generates high disk I/O and can cause thin disks to rapidly increase in size.
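A staggered schedule can be sketched in a few lines. The example below assigns each VM on a data store a start time so that no more than a set number of jobs begin in the same time slot. The function name, VM names, slot length and concurrency limit are hypothetical, and the sketch assumes each job finishes within its slot.

```python
from datetime import datetime, timedelta

def stagger_jobs(vms_by_datastore, window_start, slot_minutes, max_concurrent):
    """Return a dict of VM name -> start time.

    At most `max_concurrent` jobs start in the same slot on any one
    data store. VM names are assumed to be unique across data stores.
    """
    schedule = {}
    for datastore, vms in vms_by_datastore.items():
        for index, vm in enumerate(sorted(vms)):
            slot = index // max_concurrent  # 0, 0, 1, 1, 2, ... when the limit is 2
            schedule[vm] = window_start + timedelta(minutes=slot * slot_minutes)
    return schedule

# Hypothetical backup window starting at 10 p.m.
window = datetime(2024, 1, 1, 22, 0)
plan = stagger_jobs({"ds1": ["vm1", "vm2", "vm3", "vm4", "vm5"]}, window, 30, 2)
for vm, start in sorted(plan.items()):
    print(vm, start.strftime("%H:%M"))
```

Many backup, patching and antivirus products have their own randomization or scheduling settings; where they exist, those are usually the better place to configure this.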
#3 Use space efficiently. It's easy to use up disk space with virtual machines, but there are ways to control and limit the amount of space they take up on your storage devices. For virtual desktops or lab-type server environments, using linked clones can save a great deal of disk space. Linked clones are similar to VM snapshots, where a virtual machine's virtual disk file is made read-only and a smaller delta disk is created for any disk writes that may occur. With linked clones, a master virtual disk image is read by many VMs, while all writes go to each virtual machine's own delta disk. For example, if you create 100 VMs with 40 GB virtual disks, they would consume 4 TB of disk space without linked clones. If you used linked clones, however, you would have a single 40 GB virtual disk for all VMs to read from and smaller 1 GB to 2 GB virtual disks for writes, or roughly 140 GB to 240 GB in total -- a huge space savings.
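The arithmetic behind the linked-clone example can be checked with a short sketch. The function names are hypothetical, and the figures are the ones used in the example above (100 VMs, 40 GB disks, 1 GB to 2 GB delta disks).

```python
def full_clone_gb(vm_count, disk_gb):
    """Space used when every VM has its own full copy of the disk."""
    return vm_count * disk_gb

def linked_clone_gb(vm_count, master_gb, delta_gb):
    """Space used by one shared master disk plus one delta disk per VM."""
    return master_gb + vm_count * delta_gb

print(full_clone_gb(100, 40))       # 4000 GB, about 4 TB
print(linked_clone_gb(100, 40, 1))  # 140 GB
print(linked_clone_gb(100, 40, 2))  # 240 GB
```

Delta disks grow as VMs write data, so the real savings depend on how much each desktop changes over time.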
Thin provisioning can also help save space. It can be implemented at the virtualization layer or the storage layer. Almost all VMs are given more disk space than they usually need; thin provisioning allows you to overprovision storage by allowing virtual disk files to only take up the space they're actually using and not the full disk space they were allocated. The use of thin provisioning can greatly reduce the amount of disk space your virtual machines consume and will give you more control over costly storage capacity upgrades.
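As a rough illustration of the difference between allocated and used space, the sketch below compares what a set of virtual disks would consume when fully allocated and when thin provisioned. The function names and disk sizes are hypothetical.

```python
def thick_usage_gb(disks):
    """Space consumed if every disk takes its full allocated size."""
    return sum(allocated for allocated, _used in disks)

def thin_usage_gb(disks):
    """Space consumed if every disk takes only what it actually uses."""
    return sum(used for _allocated, used in disks)

# Each entry is (allocated GB, GB actually in use); the values are made up.
disks = [(40, 12), (40, 9), (100, 35), (60, 20)]

thick = thick_usage_gb(disks)
thin = thin_usage_gb(disks)
print(thick, thin, thick - thin)  # 240 76 164
```

In this made-up case the four disks are allocated 240 GB but occupy only 76 GB when thin provisioned, which is what makes overprovisioning a data store possible.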
#4 Avoid unnecessary I/O operations. Why generate excessive disk I/O if you don't have to? You should always try to limit the amount of disk I/O that virtual servers and virtual desktops create. This includes disabling any Windows services that aren't needed, uninstalling unnecessary applications, disabling file indexing, and limiting the amount of logging that both the operating system and applications generate. There are many other smaller things that can be tweaked and they can add up to greatly reduced disk I/O across your VMs. You can use end-point management tools or Active Directory group policy to help manage and control the configurations. You'll not only reduce virtual machine disk I/O, you'll reduce consumption of other host resources. Reducing the amount of unnecessary disk I/O that VMs generate is always a smart move as it allows your storage subsystem to operate at maximum efficiency.
#5 Use the right storage for your workloads. Most hosts have local storage available in addition to being connected to shared storage for virtual machines. The types of storage available to your hosts will often have different performance characteristics, such as an 8 Gb Fibre Channel SAN and a 1 Gb iSCSI or NFS storage device. Besides different storage protocols, you may have hard drives with different speeds (e.g., 10K rpm, 15K rpm) and different drive types (e.g., SAS, SATA, solid state). With so many different storage options to choose from, it makes sense to fit the VM to the right type of storage. Place less-critical VMs on the slower storage tiers and your more critical VMs with higher I/O requirements on the faster tiers. You can also use an automated storage tiering system like Compellent Technologies Inc.'s Fluid Data architecture or EMC Corp.'s FAST technology that moves data between storage tiers based on demand.
You can go a step further by splitting a VM into multiple disk partitions whose virtual disk files reside on different storage tiers according to their performance needs. One common way to do this is to create separate disk partitions for the operating system, Windows pagefile, applications and data. The faster storage tiers can be used for the data's higher I/O requirements, while slower tiers can be used for everything else. Even if you don't do that, you can still specify slower or local storage for the large virtual swap file that's created for each VM and used when a host exhausts its physical memory. This also helps ensure that your VM uses less disk space on the more expensive storage tiers.
#6 Don't forget to monitor. People usually pay attention to storage statistics when problems occur, but data storage requires attention on a continuous basis. If you don't monitor your storage performance on an ongoing basis you might not know of potential problems or bottlenecks, or be able to spot trends or patterns that may allow you to act proactively. It's particularly important when using network-based iSCSI and NFS storage because network health can impact storage performance. Storage performance should therefore be monitored at both the virtualization layer and storage layer, as a problem may be visible from one viewpoint but not the other. Monitoring a virtual environment is not as simple as monitoring a physical environment. Products designed for virtual environments that monitor end-user or application experiences can help pinpoint exactly which resource or component may be causing a bottleneck.
#7 Watch out for storage threats that can grow. Virtual machine snapshots and thin provisioned virtual disks represent a double threat as they have the potential to consume all of the disk space on your VM data stores, which can potentially crash or shut down your running VMs. If you plan to overcommit your storage using thin disks, you need to closely monitor their growth. Don't rely completely on thin disks to address disk space woes; try rightsizing VM disks when you create them and don't give them a lot more disk than they need.
Snapshots are an even bigger threat, as VMs can have multiple snapshots with their combined space much larger than the original virtual disk file size. While VM snapshots can be a handy tool, you should never use them in lieu of traditional backups. Not only do snapshots take up additional space, they can reduce data storage performance. That's especially true when you delete a snapshot and the delta disks are written back to the original disks causing intense disk I/O while the operation is occurring. For very large snapshots, try scheduling their deletion during off hours when the extra I/O will have less of an impact.
Don't rely on manual methods to monitor thin provisioning and snapshots. For thin disks, set alarms for specific overcommitment percentages so you'll know when your data stores are becoming overcommitted. For snapshots, use alarms to look for snapshots that grow beyond a certain size. You should also use alarms to monitor data store free space to alert you when space is low. Be sure to set your alarm thresholds so they trigger early enough, as thin disks and snapshots can sometimes grow very quickly and there might not be much time to respond. Don't rely completely on alarms for snapshots; use regular reporting tools to identify snapshots so they don't run longer than needed.
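The three kinds of alarm described above can be expressed as simple threshold checks. The sketch below is not tied to any product's alarm API; the `Datastore` class, the `check_alarms` function and every threshold value are hypothetical and would need tuning for a real environment.

```python
from dataclasses import dataclass, field

@dataclass
class Datastore:
    name: str
    capacity_gb: float
    used_gb: float
    provisioned_gb: float  # sum of the allocated sizes of all virtual disks
    snapshot_gb: dict = field(default_factory=dict)  # VM name -> snapshot GB

def check_alarms(ds, max_overcommit_pct=150, min_free_pct=20, max_snapshot_gb=10):
    """Return a list of (alarm kind, detail) tuples for one data store.

    The default thresholds are illustrative assumptions, not recommendations.
    """
    alarms = []
    overcommit_pct = 100 * ds.provisioned_gb / ds.capacity_gb
    if overcommit_pct > max_overcommit_pct:
        alarms.append(("overcommit", f"{ds.name}: {overcommit_pct:.0f}% provisioned"))
    free_pct = 100 * (ds.capacity_gb - ds.used_gb) / ds.capacity_gb
    if free_pct < min_free_pct:
        alarms.append(("free_space", f"{ds.name}: {free_pct:.0f}% free"))
    for vm, size in sorted(ds.snapshot_gb.items()):
        if size > max_snapshot_gb:
            alarms.append(("snapshot", f"{vm}: {size} GB of snapshots"))
    return alarms

# Hypothetical data store that trips all three alarms.
ds1 = Datastore("ds1", capacity_gb=1000, used_gb=850, provisioned_gb=1800,
                snapshot_gb={"web1": 4, "sql1": 25})
for kind, detail in check_alarms(ds1):
    print(kind, detail)
```

Raising `min_free_pct` or lowering the other two limits makes the alarms fire earlier, which buys more time to respond when disks grow quickly.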
#8 Integrate server virtualization with storage management. More and more storage vendors are integrating server virtualization and storage so they can be managed and monitored using a single console. Examples include plug-ins developed for VMware vCenter Server from NetApp (Virtual Storage Console) and EMC (Virtual Storage Integrator) that allow storage arrays to be managed from vCenter Server. This type of integration allows for much simpler management of the storage devices used by virtual hosts because monitoring, provisioning, replication and other storage operations can be done directly from vCenter Server.
Storage vendors are also leveraging virtualization APIs to provide very tight integration between the storage layer and the virtualization layer. Using the VMware vStorage APIs, storage tasks traditionally handled by the virtual host (e.g., block zeroing, Storage VMotion) can be offloaded to the storage array, thereby freeing up host server resources. The APIs also provide more intelligent multipathing to achieve better I/O throughput and failover, and offer replication integration for products like VMware's vCenter Site Recovery Manager.
#9 Traditional methods might not cut it. Moving from a physical to a virtual environment also requires a change in thinking. Things like backups, server provisioning, monitoring and management are all very different once servers are virtualized. Applications written specifically to monitor and manage physical environments typically aren't effective in virtual environments because they're not aware of the virtualization layer between the server hardware and the guest operating system.
With backups, for example, it's not efficient to back up servers through the OS layer on virtual hosts. Instead, most virtualization-aware backup apps go directly to the virtualization layer, which is quicker and more efficient. Performance monitoring is another example: If you monitor using OS tools that aren't aware of the virtualization layer, the results will often be inaccurate as the OS tools don't have direct access to the underlying host hardware.
#10 Prioritize storage traffic. Hosts with many virtual machines running on them can be like the Wild West with all of the VMs fighting for the host's limited resources. You can end up with less-critical VMs impacting the resources of critical virtual machines and the resources available for host operations. To prevent this kind of contention, consider using storage I/O controls that can provide a Quality of Service (QoS) level for certain critical host functions and VMs. VMware's vSphere 4.1 introduced a new feature called Storage I/O Control (SIOC) that works by measuring storage latency; when a set congestion threshold is reached for at least four seconds, it enforces configurable I/O shares on VMs to ensure the highest-priority virtual machines get the I/O resources they need. SIOC should help restore some order on busy hosts and allow VMs to coexist peacefully by making it less likely that a few rogue VMs will drag down your critical virtual machines.
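The general idea of share-based prioritization can be shown with a simplified model. The sketch below is not VMware's SIOC implementation. It only mirrors the behavior described above: shares are enforced once latency has stayed at or above a threshold for at least four seconds. The function names, the 30 ms threshold, the share values and the IOPS figure are all assumptions made for illustration.

```python
def congested(latency_samples_ms, threshold_ms, sustain_seconds=4):
    """True if latency was at or above the threshold for the last
    `sustain_seconds` samples (assumes one sample per second)."""
    recent = latency_samples_ms[-sustain_seconds:]
    return len(recent) >= sustain_seconds and all(
        sample >= threshold_ms for sample in recent
    )

def allocate_by_shares(shares, total_iops):
    """Divide the available IOPS among VMs in proportion to their shares."""
    total_shares = sum(shares.values())
    return {vm: total_iops * s / total_shares for vm, s in shares.items()}

# Hypothetical latency samples (ms), share values and data store throughput.
samples = [10, 35, 40, 31, 33]
vm_shares = {"sql1": 2000, "web1": 1000, "test1": 500}

if congested(samples, threshold_ms=30):
    print(allocate_by_shares(vm_shares, total_iops=7000))
```

With these made-up numbers the highest-priority VM receives 4,000 of the 7,000 IOPS while the test VM is held to 1,000. When there is no sustained congestion, no limits are applied in this model.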
BIO: Eric Siebert is an IT industry veteran with more than 25 years of experience who now focuses on server administration and virtualization. He's the author of VMware VI3 Implementation and Administration (Prentice Hall, 2009) and Maximum vSphere (Prentice Hall, 2010).