In the past decade or so, storage management has developed into a discipline in its own right, driven by big increases in the amount of data being stored, as well as the rise of storage networking protocols that facilitate shared storage.
Virtualization, meanwhile, has become the foremost technology for server and PC optimization. In this environment, shared storage provides functionality that would otherwise not be possible, such as nondisruptive virtual machine migration.
But virtualization adds a layer of complexity in the association between a server and the storage that supports it. That layer of abstraction between virtualization and storage means it's a challenge to translate storage-focused concepts such as RAID groups and LUNs to virtual objects such as VMDKs and virtual hard disks. So, to successfully provide storage for virtual environments, storage admins should take a new tack.
Virtualization creates new operational headaches. Because many virtual machines (VMs) can exist on a single storage LUN, the I/O profile of virtual servers and desktops tends to be more random and unpredictable in nature. Today's hypervisors can also generate large amounts of I/O when virtual machines are moved around the storage infrastructure with features such as VMware Inc.'s Storage vMotion and Microsoft Corp.'s Hyper-V Live Migration. Virtualization may also have a heavy impact on storage utilization as virtual machines are copied, cloned or otherwise replicated across the environment.
In considering virtualization and storage, we must examine the operational structures that have been built up in many large organizations. As IT infrastructure has grown, the component technologies have tended to split into silos covering the disciplines of storage, networking, servers and databases. Once, it was possible for storage administrators to go about their business with little regard for the operation of other parts of the infrastructure. But virtualization has changed that world and made it necessary for those isolated silos to integrate like never before.
Choosing a strategy
Efficient storage management in virtual environments entails meeting two basic requirements: capacity and performance. While this could also be said of nonvirtualized environments, performance is the primary consideration in virtual storage designs because it has more of an impact on the operation of a virtual infrastructure. Slow response times from a single LUN are likely to affect only a single host in nonvirtualized environments; however, poor response times from a large LUN supporting many virtual machines can have a much wider impact. This is especially so with a virtual desktop infrastructure (VDI). There are a number of strategies a storage administrator should consider.
Use hardware acceleration and APIs
Many vendors (including the top six storage vendors: Dell Inc., EMC Corp., Hewlett-Packard Co., Hitachi Data Systems, IBM and NetApp Inc.) today support hardware acceleration of virtualization I/O. This is implemented through application programming interfaces (APIs) in the hypervisor, such as VMware's vStorage APIs for Array Integration (VAAI). VAAI offloads some of the "heavy lifting" from the hypervisor by letting the storage array choose the best way to perform key operations, such as sub-LUN locking, bulk copying and zeroing out ranges of data. Most recently, in vSphere 5, VMware added the thin reclaim feature, which lets the hypervisor release deleted storage from thin-provisioned LUNs without directly writing data to deleted blocks.
Offloading storage management tasks to the array provides numerous benefits. First, it reduces the workload on the hypervisor, lessening the CPU load and traffic across the storage network. Second, it lets the storage array optimize and prioritize I/O-intensive operations that may be best achieved internally within the array. As the leading hypervisor vendor, VMware has developed a number of APIs, including vStorage APIs for Data Protection (VADP) and vStorage APIs for Storage Awareness (VASA). VASA is of increasing importance in the delivery of scalable storage environments, providing configuration information to the hypervisor about storage LUNs, including replication and performance metrics.
Configure for performance
When delivering I/O to virtual environments, performance is everything. Typically, virtual environments create more random workloads, making the work of optimizing I/O workloads much harder for the array. There are techniques that can be employed to ensure performance is delivered optimally, including:
- Wide striping. This involves spreading I/O across as many physical disk spindles as possible. Wide striping can be achieved by using large RAID groups (being mindful of rebuild times for disk failures) or by concatenating RAID groups into storage pools. This technique is applicable to both file- and block-based storage platforms.
- Dynamic tiering. As in any storage environment, virtual servers will have I/O "hotspots": data that generates a large proportion of the I/O workload. Hotspot areas can be difficult to predict, so platforms that offer dynamic tiering provide an automated way to ensure the hottest data stays on the fastest disk. This technique is particularly useful where virtual machines have been cloned from a single master image.
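To make the two techniques above concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not how any particular array works: the function names, the four-block stripe size and the "count accesses per extent" heat measure are all hypothetical. The first part maps logical blocks round-robin across disks (wide striping); the second picks the most frequently accessed extents for the fast tier (dynamic tiering).

```python
from collections import Counter


def stripe_location(logical_block, num_disks, stripe_blocks=4):
    """Map a logical block to (disk index, block offset on that disk).

    Hypothetical round-robin layout: consecutive stripes of
    `stripe_blocks` blocks are placed on consecutive disks.
    """
    stripe_index = logical_block // stripe_blocks
    disk = stripe_index % num_disks
    # Stripes already written to this disk, plus the position inside the stripe.
    offset = (stripe_index // num_disks) * stripe_blocks + logical_block % stripe_blocks
    return disk, offset


def disks_touched(blocks, num_disks, stripe_blocks=4):
    """Count how many of the given logical blocks land on each disk."""
    return Counter(stripe_location(b, num_disks, stripe_blocks)[0] for b in blocks)


def pick_fast_tier(access_counts, fast_tier_slots):
    """Return the IDs of the hottest extents, up to the fast tier's capacity.

    `access_counts` maps an extent ID to its recent I/O count (an assumed
    heat measure); ties are broken by extent ID so the result is stable.
    """
    ranked = sorted(access_counts, key=lambda e: (-access_counts[e], e))
    return set(ranked[:fast_tier_slots])


if __name__ == "__main__":
    # 32 consecutive blocks over 8 disks: every spindle gets a share.
    print(disks_touched(range(32), num_disks=8))
    # Only two extents fit on the fast tier; the hottest two are chosen.
    print(pick_fast_tier({"ext-a": 5, "ext-b": 50, "ext-c": 20}, fast_tier_slots=2))
```

In this model, adding disks spreads the same workload across more spindles, and rerunning the tier selection as access counts change is what makes the tiering "dynamic."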
Use thin provisioning
It's very easy for storage in virtual environments to grow out of control, as virtual machines are relatively easy to create. This is especially true in on-demand environments. Thin provisioning ensures that disk space is consumed only by data that's written to the disk by the host, rather than reserving a fixed image for each VM. The feature can be implemented in the hypervisor and is a common option with most storage platforms.
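The allocate-on-write behavior can be shown with a small Python model. This is a sketch only: the `ThinVolume` class and its methods are hypothetical and do not correspond to any hypervisor or array API. The volume advertises a large size but consumes space only for blocks that have been written, and an `unmap` call (loosely analogous to the thin reclaim idea mentioned earlier) hands space back.

```python
class ThinVolume:
    """Toy thin-provisioned volume: space is consumed only on write."""

    def __init__(self, virtual_blocks):
        self.virtual_blocks = virtual_blocks  # size advertised to the host
        self._allocated = {}                  # block number -> data actually stored

    def _check(self, block):
        if not 0 <= block < self.virtual_blocks:
            raise IndexError("block outside the volume's advertised size")

    def write(self, block, data):
        self._check(block)
        self._allocated[block] = data         # backing space is allocated here

    def read(self, block):
        self._check(block)
        # Blocks that were never written read back as zeros.
        return self._allocated.get(block, b"\x00")

    def unmap(self, block):
        """Release a block the host no longer needs."""
        self._check(block)
        self._allocated.pop(block, None)

    @property
    def consumed_blocks(self):
        return len(self._allocated)


if __name__ == "__main__":
    vol = ThinVolume(virtual_blocks=1_000_000)
    vol.write(0, b"boot")
    vol.write(999_999, b"tail")
    print(vol.virtual_blocks, vol.consumed_blocks)
```

The gap between `virtual_blocks` and `consumed_blocks` is the capacity saved, and it is also the amount by which a pool can be oversubscribed, which is why thin-provisioned environments need capacity monitoring.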
Use vendor plug-ins
Almost all enterprise and midrange storage platforms offer plug-ins for centralized management tools such as VMware vCenter. This provides a "single-pane-of-glass" view of both virtualization and storage systems, in many cases allowing the storage to be configured directly from the vCenter console. In organizations without dedicated storage teams, this can significantly reduce the work of an IT administrator.
Storage built for virtual servers
A number of startup storage vendors have rolled out hardware and software storage solutions specifically designed for virtual server environments. These include Atlantis Computing Inc., SolidFire Inc., Tintri Inc. and Virsto Software. In essence, these products are architected to address the issues described here, including random I/O challenges.
Managing dynamically changing virtual environments to optimize capacity and performance can be a time-consuming process. As virtual environments scale and mature, there's a need to move toward more automation of manual optimization processes. Hypervisor vendors are starting to include capabilities in their products that allow some of these features to be semi-automated, reducing the onus on the administrator to continually tune the storage environment. In vSphere 5, VMware introduced Storage Distributed Resource Scheduler (SDRS), which provides some degree of automation of storage allocations. SDRS provides automated initial placement of VMDKs, automated migrations of virtual machines to meet capacity and performance goals, as well as affinity rules -- ensuring, for example, that high I/O virtual machines are placed on separate hardware.
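As a rough illustration of what automated initial placement with capacity, performance and affinity rules involves, consider the Python sketch below. It is not VMware's SDRS algorithm: the `Datastore` fields, the `place_vmdk` function and the 80% fill and 15 ms latency thresholds are assumptions chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Datastore:
    name: str
    capacity_gb: float
    used_gb: float
    latency_ms: float                       # recently observed I/O latency
    vms: set = field(default_factory=set)   # names of VMs already placed here


def place_vmdk(vm, size_gb, datastores, keep_apart=frozenset(),
               max_fill=0.8, max_latency_ms=15.0):
    """Pick a datastore for a new virtual disk, or return None if none fits.

    A datastore is rejected if placement would push it past `max_fill`,
    if its latency exceeds `max_latency_ms`, or if it already holds a VM
    listed in `keep_apart` (a simple anti-affinity rule).
    """
    candidates = []
    for ds in datastores:
        fill_after = (ds.used_gb + size_gb) / ds.capacity_gb
        if fill_after > max_fill:
            continue
        if ds.latency_ms > max_latency_ms:
            continue
        if ds.vms & set(keep_apart):
            continue
        candidates.append((fill_after, ds.latency_ms, ds.name, ds))
    if not candidates:
        return None
    # Prefer the emptiest datastore, then the lowest latency, then name order.
    best = min(candidates, key=lambda c: c[:3])[3]
    best.used_gb += size_gb
    best.vms.add(vm)
    return best.name


if __name__ == "__main__":
    pool = [
        Datastore("A", 1000, 600, 5.0, {"db1"}),
        Datastore("B", 1000, 300, 20.0),
        Datastore("C", 1000, 500, 8.0),
    ]
    # db2 must not share hardware with db1, and B is too slow.
    print(place_vmdk("db2", 100, pool, keep_apart={"db1"}))
```

A real scheduler would also rebalance existing virtual machines over time; this sketch covers only the initial placement decision.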
The move to more automated storage management will be an absolute requirement as virtual infrastructures scale and become more service-orientated in their delivery. Already, storage vendors are coming to the market with new products that provide provisioning APIs to hook directly into virtual server automation.
Don't forget backup
Backup always seems to be treated as a poor relation in storage management; however, it's of vital importance for delivering high-availability storage environments. In virtual infrastructures, traditional backup solutions aren't the most efficient way to back up and restore data, and other techniques can be used to optimize the backup and restore process.
In block storage deployments, traditional backups use the host itself to back up data. This is because the storage array has no awareness of the format of data on a LUN. The host places the file system onto the LUN, so the backup software relies on the host to provide a stream of files for backup.
On all virtual platforms, a VM is stored as a file or series of files, even when using block-based storage arrays. This makes the backup process easier, as backups can be taken simply by taking a copy of the files that make up the virtual machine.
Some hypervisor vendors, such as VMware, offer APIs that allow third-party software to view changed block data within the virtual machine itself, providing a highly efficient way of backing up only those blocks that have changed since the last backup was taken. All hypervisor vendors also provide the ability to snapshot virtual machines. On its own, a snapshot may yield only a "crash-consistent" copy; with agent software, however, snapshots can be coordinated by quiescing the host file system so that consistent snapshots are taken.
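The idea behind changed block tracking can be sketched in a few lines of Python. This is a toy model, not the VMware API: `TrackedDisk`, `full_backup`, `incremental_backup` and `restore` are hypothetical names. The disk records which blocks have been written since the last backup, so an incremental backup copies only those blocks.

```python
class TrackedDisk:
    """Toy virtual disk that remembers which blocks changed since the last backup."""

    def __init__(self, num_blocks):
        self.blocks = [b"\x00"] * num_blocks
        self.changed = set()

    def write(self, index, data):
        self.blocks[index] = data
        self.changed.add(index)   # note the block as dirty


def full_backup(disk):
    """Copy every block and reset change tracking."""
    copy = list(disk.blocks)
    disk.changed.clear()
    return copy


def incremental_backup(disk):
    """Copy only the blocks written since the previous backup."""
    delta = {i: disk.blocks[i] for i in sorted(disk.changed)}
    disk.changed.clear()
    return delta


def restore(full, deltas):
    """Rebuild a disk image from a full backup plus incrementals, oldest first."""
    image = list(full)
    for delta in deltas:
        for index, data in delta.items():
            image[index] = data
    return image


if __name__ == "__main__":
    disk = TrackedDisk(num_blocks=8)
    disk.write(1, b"a")
    base = full_backup(disk)
    disk.write(3, b"b")
    disk.write(1, b"c")
    inc = incremental_backup(disk)
    print(len(inc), "of", len(disk.blocks), "blocks copied")
    print(restore(base, [inc]) == disk.blocks)
```

The saving grows with disk size: the incremental copy is proportional to the amount of change, not to the size of the virtual disk.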
Storage tools will evolve
Storage continues to be a key component in deploying scalable virtual infrastructures. As these environments scale and mature, storage administrators will need to employ tools and techniques such as automation and visualization software that will allow them to meet the challenges of an increasingly integrated IT world.
Chris Evans is a UK-based storage consultant. He maintains The Storage Architect blog.