It can still be a struggle at times, but managing storage in virtual server environments is better understood today, with tighter integration and more effective management tools.
Storage management has developed into a discipline in its own right, driven by the growth of data and the emergence of standards such as Fibre Channel (FC), iSCSI and NFS, which have enabled the centralization and standardization of storage systems.
As virtualization has become the main technology for server and desktop optimization, storage has been a key component in delivering highly scalable virtualized solutions. Without centralized storage, certain features such as nondisruptive virtual machine (VM) migration wouldn’t have been possible.
However, while storage has provided significant benefits, it also poses new challenges for both storage and virtualization administrators. Virtualization adds another layer of complexity in understanding the relationship between a server and the storage it uses. That layer of abstraction makes it difficult to translate storage-centric concepts such as logical unit numbers (LUNs), RAID groups and disks into virtual objects such as virtual hard disks (VHDs) and virtual machine disks (VMDKs). Storage administrators need to take a new approach when delivering storage to virtual environments.
Virtualization creates new operational headaches. Because many VMs can exist on a single storage LUN, the I/O profile of virtual servers and desktops tends to be more random and unpredictable in nature. The functionality of today’s hypervisors enables large amounts of I/O to be generated when moving virtual machines around the storage infrastructure through the use of features such as VMware Inc.’s Storage vMotion and Microsoft Corp.’s Hyper-V Live Migration. Virtualization may also impact heavily on storage utilization, as virtual machines are copied, cloned or otherwise replicated across the environment.
We must consider the operational structures that have been built up in many large organizations. As IT infrastructure has grown, the component technologies have tended to split into silos covering the disciplines of storage, networking, servers and databases. Once, it was possible for storage administrators to go about their business with little regard for the operation of other parts of the infrastructure. But virtualization has changed that world and made it necessary for those isolated silos to integrate like never before.
Choosing a strategy
Efficient storage management in virtual environments entails meeting two basic requirements: capacity and performance. While this could also be said of nonvirtualized environments, performance is the primary consideration in virtual storage designs because it has more of an impact on the operation of a virtual infrastructure. Slow response times from a single LUN are likely to affect only a single host in nonvirtualized environments; however, poor responses from a large LUN supporting many virtual machines can have a much wider impact. This is especially so with a virtual desktop infrastructure (VDI). There are a number of strategies a storage administrator should consider.
Use hardware acceleration and APIs
Many vendors (including the top six storage vendors: Dell Inc., EMC Corp., Hewlett-Packard [HP] Co., Hitachi Data Systems, IBM and NetApp Inc.) today support hardware acceleration of virtualization I/O. This is implemented through hypervisor APIs such as VMware’s vStorage APIs for Array Integration (VAAI). VAAI offloads some of the “heavy lifting” from the hypervisor by letting the storage array choose the best way to perform key operations, such as sub-LUN locking, bulk copying and zeroing out ranges of data. Most recently, in vSphere 5, VMware added the thin reclaim feature, which lets the hypervisor release deleted storage from thin provisioned LUNs without directly writing data to deleted blocks.
Offloading storage management tasks to the array provides numerous benefits. First, it reduces the workload on the hypervisor, lessening the CPU load and traffic across the storage network. Second, it lets the storage array optimize and prioritize I/O-intensive operations that may be best achieved internally within the array. As the leading hypervisor vendor, VMware has developed a number of APIs, including vStorage APIs for Data Protection (VADP) and vStorage APIs for Storage Awareness (VASA). VASA is of increasing importance in the delivery of scalable storage environments, providing configuration information to the hypervisor about storage LUNs, including replication and performance metrics.
Configure for performance
When delivering I/O to virtual environments, performance is everything. Typically, virtual environments create more random workloads, making the work of optimizing I/O workloads much harder for the array. There are techniques that can be employed to ensure performance is delivered optimally, including:
- Wide striping. This involves spreading I/O across as many physical disk spindles as possible. Wide striping can be achieved by using large RAID groups (being mindful of rebuild times for disk failures) or by concatenating RAID groups into storage pools. This technique is applicable to both file- and block-based storage platforms.
- Dynamic tiering. Like any storage environment, virtual servers will have I/O “hotspots,” data that generates a large proportion of the I/O workload. Hotspot areas can be difficult to predict, so platforms that offer dynamic tiering provide an automated way to ensure the hottest data stays on the fastest disk. This technique is particularly useful where virtual machines have been cloned from a single master image.
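The two techniques above can be illustrated with a minimal Python sketch. It isn't any vendor's implementation: the function names, the 128-block stripe unit and the "fast tier slots" idea are all assumptions made for illustration. The first function shows how striping rotates consecutive stripe units across spindles; the second shows the basic idea behind tiering, ranking extents by observed I/O and keeping the busiest ones on the fastest disk.

```python
from collections import Counter


def locate_block(lba, num_disks, stripe_blocks=128):
    """Map a logical block address to (disk index, block offset on that disk).

    Hypothetical round-robin layout: stripe units rotate across all disks.
    """
    stripe_index = lba // stripe_blocks   # which stripe unit holds this block
    disk = stripe_index % num_disks       # stripe units rotate across the disks
    row = stripe_index // num_disks       # complete rows of stripe units before it
    offset = row * stripe_blocks + lba % stripe_blocks
    return disk, offset


def io_per_disk(lbas, num_disks, stripe_blocks=128):
    """Count how many of the requested blocks land on each disk."""
    counts = Counter(locate_block(lba, num_disks, stripe_blocks)[0] for lba in lbas)
    return [counts.get(disk, 0) for disk in range(num_disks)]


def pick_fast_tier(extent_io_counts, fast_tier_slots):
    """Return the names of the busiest extents that fit in the fast tier.

    Ties are broken by extent name so the result is deterministic.
    """
    ranked = sorted(extent_io_counts, key=lambda name: (-extent_io_counts[name], name))
    return set(ranked[:fast_tier_slots])
```

For example, io_per_disk(range(0, 1024, 128), 4) spreads eight requests evenly as two per disk, and doubling the disk count to eight leaves one request per disk, which is the effect wide striping is after. A real array would also re-evaluate the tiering ranking periodically rather than once.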
Use thin provisioning
It’s very easy for storage in virtual environments to grow out of control, as virtual machines are relatively easy to create. This is especially true in on-demand environments. Thin provisioning ensures that disk space is consumed only by data that’s written to the disk by the host, rather than reserving a fixed image for each VM. The feature can be implemented in the hypervisor and is a common option with most storage platforms.
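A minimal Python sketch of the idea follows. The class and method names are hypothetical and the extent-based model is a simplification, not how any particular hypervisor or array implements the feature. It shows that physical space is consumed only on first write to an extent, that the provisioned total can exceed what physically exists, and that unmapping an extent returns space to the pool.

```python
class ThinPool:
    """Hypothetical pool of physical extents shared by thin volumes."""

    def __init__(self, physical_extents):
        self.physical_extents = physical_extents
        self.used_extents = 0          # extents actually backed by disk
        self.provisioned_extents = 0   # extents promised to volumes

    def create_volume(self, virtual_extents):
        # Creating a volume reserves nothing; it only records the promise.
        self.provisioned_extents += virtual_extents
        return ThinVolume(self, virtual_extents)

    def overcommit_ratio(self):
        return self.provisioned_extents / self.physical_extents


class ThinVolume:
    def __init__(self, pool, virtual_extents):
        self.pool = pool
        self.virtual_extents = virtual_extents
        self.mapped = set()            # extents that have been written

    def write(self, extent):
        if not 0 <= extent < self.virtual_extents:
            raise IndexError("extent outside the volume")
        if extent in self.mapped:
            return                     # rewrite: no new space consumed
        if self.pool.used_extents >= self.pool.physical_extents:
            raise RuntimeError("thin pool exhausted")
        self.pool.used_extents += 1    # allocate on first write
        self.mapped.add(extent)

    def unmap(self, extent):
        # Releasing a deleted extent hands its space back to the pool.
        if extent in self.mapped:
            self.mapped.remove(extent)
            self.pool.used_extents -= 1
```

The RuntimeError path is the operational risk of thin provisioning: an overcommitted pool has to be monitored, because volumes can run out of physical space long before they look full to the host.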
Use vendor plug-ins
Almost all enterprise and midrange storage platforms offer plug-ins for centralized management tools like VMware vCenter. This provides a “single-pane-of-glass” view of both virtual servers and storage, in many cases allowing the storage to be configured directly from the vCenter console. In organizations without dedicated storage teams, this can significantly reduce the work of an IT administrator.
Storage built for virtual servers
A number of startup storage vendors have rolled out hardware and software storage solutions specifically designed for virtual server environments. These include Atlantis Computing Inc., SolidFire, Tintri Inc. and Virsto Software. In essence these products are architected to address the issues described here, including random I/O challenges.
Managing dynamically changing virtual environments to optimize capacity and performance can be a time-consuming process. As virtual environments scale and mature, there’s a need to move toward more automation of manual optimization processes. Hypervisor vendors are starting to include capabilities in their products that allow some of these features to be semi-automated, reducing the onus on the administrator to continually tune the storage environment. In vSphere 5, VMware introduced Storage Distributed Resource Scheduler (SDRS), which provides some degree of automation of storage allocations. SDRS provides automated initial placement of VMDKs, automated migrations of virtual machines to meet capacity and performance goals, as well as affinity rules, ensuring, for example, that high I/O virtual machines are placed on separate hardware.
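To make the idea of automated initial placement concrete, here is a minimal Python sketch. It is not VMware's SDRS algorithm: the Datastore class, the place_vmdk function, the utilization and latency thresholds, and the exclude parameter (standing in for an anti-affinity rule) are all assumptions chosen for illustration. It filters out datastores that would breach a capacity or latency goal, then prefers the lowest latency and, on a tie, the most free space.

```python
from dataclasses import dataclass


@dataclass
class Datastore:
    """Hypothetical view of a datastore as a placement engine might see it."""
    name: str
    capacity_gb: float
    used_gb: float
    latency_ms: float

    @property
    def free_gb(self):
        return self.capacity_gb - self.used_gb


def place_vmdk(datastores, size_gb, max_utilization=0.8,
               max_latency_ms=15.0, exclude=frozenset()):
    """Pick a datastore for a new virtual disk, or return None if none fits.

    The thresholds are illustrative values, not product defaults.
    'exclude' models an anti-affinity rule: datastores to avoid.
    """
    candidates = [
        ds for ds in datastores
        if ds.name not in exclude
        and (ds.used_gb + size_gb) / ds.capacity_gb <= max_utilization
        and ds.latency_ms <= max_latency_ms
    ]
    if not candidates:
        return None
    # Prefer the lowest latency; break ties with the most free space.
    best = min(candidates, key=lambda ds: (ds.latency_ms, -ds.free_gb))
    best.used_gb += size_gb   # record the allocation
    return best.name
```

Passing the name of the datastore that already holds one busy VM as exclude when placing a second one mimics the separation described above. A production scheduler would also weigh the cost of migrations and the history of I/O load, which this sketch ignores.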
The move to more automated storage management will be an absolute requirement as virtual infrastructures scale and become more service-oriented in their delivery. Already, storage vendors are coming to the market with new products that provide provisioning APIs to hook directly into virtual server automation.
Don’t forget backup
Backup always seems to be treated as a poor relation in storage management; however, it’s of vital importance for delivering high-availability storage environments. In virtual infrastructures, traditional backup solutions aren’t the most efficient way to back up and restore data, and other techniques can be used to optimize the backup and restore process.
In block storage deployments, traditional backups use the host itself to back up data. This is because the storage array has no awareness of the format of data on a LUN. The host places the file system onto the LUN, so the backup software relies on the host to provide a stream of files for backup.
On all virtual platforms, a VM is stored as a file or series of files, even when using block-based storage arrays. This makes the backup process easier, as backups can be taken simply by taking a copy of the files that make up the virtual machine.
Some hypervisor vendors, such as VMware, offer APIs that allow third-party software to see which blocks have changed within the virtual machine itself, providing a highly efficient way of backing up only the data that has changed since the last backup was taken. All hypervisor vendors also provide the ability to snapshot virtual machines. By default this results in a “crash consistent” copy; in some instances, agent software can coordinate the snapshot with quiescing the host file system so that consistent snapshots can be taken.
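The principle behind changed block backups can be sketched in a few lines of Python. This is a toy model, not the VMware API: TrackedDisk, incremental_backup and apply_backup are hypothetical names, and the disk is just an in-memory byte array. Every write records which blocks it touched, and an incremental backup copies only those blocks and then resets the tracking.

```python
class TrackedDisk:
    """Toy virtual disk that remembers which blocks were written."""

    def __init__(self, num_blocks, block_size=4096):
        self.block_size = block_size
        self.data = bytearray(num_blocks * block_size)
        self.changed = set()   # block indexes written since the last backup

    def write(self, offset, payload):
        if offset < 0 or offset + len(payload) > len(self.data):
            raise ValueError("write outside the disk")
        self.data[offset:offset + len(payload)] = payload
        if payload:
            first = offset // self.block_size
            last = (offset + len(payload) - 1) // self.block_size
            self.changed.update(range(first, last + 1))

    def read_block(self, index):
        start = index * self.block_size
        return bytes(self.data[start:start + self.block_size])

    def incremental_backup(self):
        """Return {block index: contents} for changed blocks, then reset."""
        blocks = {i: self.read_block(i) for i in sorted(self.changed)}
        self.changed.clear()
        return blocks


def apply_backup(image, blocks, block_size):
    """Apply an incremental backup on top of an earlier full copy."""
    for index, contents in blocks.items():
        start = index * block_size
        image[start:start + block_size] = contents
```

A three-byte write that straddles a block boundary marks two blocks as changed, so the incremental backup carries two blocks instead of the whole disk. Applying it to the previous full copy with apply_backup brings that copy up to date.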
Storage tools will evolve
Storage continues to be a key feature in deploying scalable virtual infrastructures. As these environments scale and mature, storage administrators will need to employ tools and techniques such as automation and visualization software that will allow them to meet the challenges of an ever more integrated IT world.
BIO: Chris Evans is a UK-based storage consultant. He maintains “The Storage Architect” blog.