Adoption of storage virtualization has been accelerating as some of the early obstacles to implementation have fallen by the wayside. There’s a wide choice of mature products whether you decide to deploy storage virtualization at the array or in the network.
While there may be some dispute over an exact definition, storage virtualization is generally considered technology that provides a flexible, logical arrangement of data storage capacity to users while abstracting the physical location from them. It’s a software layer that intercepts I/O requests to the logical capacity and maps them to the correct physical locations.
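The mapping described above can be illustrated with a minimal sketch. It assumes a volume is described by a sorted list of extents; the `Extent` and `VirtualVolume` names and the device labels are hypothetical and not taken from any product discussed in this article.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Extent:
    """A contiguous run of logical blocks backed by one physical device."""
    logical_start: int   # first logical block covered by this extent
    length: int          # number of blocks in the extent
    device: str          # hypothetical physical device label
    physical_start: int  # first physical block on that device


class VirtualVolume:
    """Maps logical block addresses to (device, physical block) pairs."""

    def __init__(self, extents):
        self.extents = sorted(extents, key=lambda e: e.logical_start)
        self._starts = [e.logical_start for e in self.extents]

    def resolve(self, logical_block):
        # Find the last extent whose start is <= the requested block.
        i = bisect_right(self._starts, logical_block) - 1
        if i < 0:
            raise ValueError(f"block {logical_block} is not mapped")
        ext = self.extents[i]
        offset = logical_block - ext.logical_start
        if offset >= ext.length:
            raise ValueError(f"block {logical_block} is not mapped")
        return ext.device, ext.physical_start + offset


# One logical volume whose capacity lives on two different arrays.
volume = VirtualVolume([
    Extent(0, 1000, "array-A", 5000),
    Extent(1000, 500, "array-B", 0),
])
print(volume.resolve(42))
print(volume.resolve(1200))
```

The host only ever sees logical block numbers; which array holds a given block is a detail of the extent table, which is what lets the physical layout change without the host noticing.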
The most basic implementation of storage virtualization is at the host level, where a logical volume manager allows the simple provisioning of storage capacity to apps and users. Although virtualization is also implemented with file storage systems, block storage virtualization is more common because of the complexity of LUN management and the need for flexible storage provisioning, especially in multi-user environments. This article covers storage virtualization technologies at the network and storage device level, not at the host level.
Goodbye to groups, LUNs and partitioning
The legacy process of creating array groups, allocating LUNs and partitioning volumes is a complicated and inefficient way to provision storage, particularly when it involves balancing performance and reliability of physical disks across drive shelves. Similarly, expanding an existing host’s volume can be a time-consuming process of concatenating LUNs and copying data. Storage virtualization provides a better way to keep up with the demands of provisioning storage to applications and servers while reducing time and resources expended by allowing the “brains” of the storage system to make most of the decisions. It can also improve utilization by replacing the guesswork of manual allocation while supporting technologies like thin provisioning.
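Thin provisioning, mentioned above, can be sketched in a few lines. This is an illustrative model only, assuming a shared pool that hands out physical blocks on first write; `ThinPool` and `ThinVolume` are hypothetical names, and real arrays allocate in larger chunks and track far more state.

```python
class ThinPool:
    """Shared pool of physical blocks, handed out only when first written."""

    def __init__(self, physical_blocks):
        self.free = list(range(physical_blocks))
        self.data = {}  # physical block -> payload

    def allocate(self):
        if not self.free:
            raise RuntimeError("pool exhausted: add physical capacity")
        return self.free.pop()


class ThinVolume:
    """A volume that advertises more capacity than it physically consumes."""

    def __init__(self, pool, logical_blocks):
        self.pool = pool
        self.logical_blocks = logical_blocks  # size the host sees
        self.block_map = {}                   # logical block -> physical block

    def _check(self, logical_block):
        if not 0 <= logical_block < self.logical_blocks:
            raise IndexError("block outside advertised volume size")

    def write(self, logical_block, payload):
        self._check(logical_block)
        # Physical space is consumed only on the first write to a block.
        if logical_block not in self.block_map:
            self.block_map[logical_block] = self.pool.allocate()
        self.pool.data[self.block_map[logical_block]] = payload

    def read(self, logical_block):
        self._check(logical_block)
        physical = self.block_map.get(logical_block)
        # Blocks that were never written read back as empty.
        return self.pool.data[physical] if physical is not None else b""


pool = ThinPool(physical_blocks=100)
vol_a = ThinVolume(pool, logical_blocks=1000)
vol_b = ThinVolume(pool, logical_blocks=1000)
vol_a.write(7, b"x")
vol_a.write(7, b"y")   # overwrite: no new allocation
vol_b.write(900, b"z")
print(len(vol_a.block_map), len(vol_b.block_map), len(pool.free))
```

Here two volumes together advertise 2,000 blocks against a 100-block pool, and only the blocks actually written consume physical space. The trade-off is that the pool can run out, so free capacity has to be monitored.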
What is scale-out storage?
“Scale-out” storage refers to modular systems that combine processors and storage capacity into discrete physical nodes. This clustered architecture lets processing power expand with capacity as nodes are added, and provides for more incremental, albeit non-heterogeneous, growth. While it could be called “device based,” virtualization in the scale-out space is more than a standard feature; it’s required. It enables these systems to scale non-disruptively while user volumes span nodes in the cluster.
Initially, virtualization was simply a tool used to provision and manage storage efficiently. But by isolating the host from physical storage, the technology also enabled storage capacity in different physical chassis (even from different manufacturers) to be logically combined into common pools that could be managed more easily. While some of these heterogeneous systems were used to create larger volumes than were physically present on any one disk array, most use cases employed storage virtualization as a common management platform. This enabled existing storage systems to be repurposed and reduced the overhead associated with managing multiple silos of storage, although the physical disk systems still needed to be maintained.
Virtualization can improve performance because host volumes are easily spread across larger numbers of disk drives, although doing so could negatively affect capacity utilization. Virtualization also allows storage tiering and data migrations between devices, such as moving older data to an archiving appliance or hot database indexes to a solid-state drive (SSD) cache. These activities are typically carried out based on policies set at the host, application or file level, and the same data movement mechanism can be used to migrate data offsite for disaster recovery (DR) purposes.
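A policy of the kind described above might look like the following sketch. The tier names, thresholds and the `DataItem` fields are all assumptions made for illustration; actual products expose their own policy criteria.

```python
from dataclasses import dataclass


@dataclass
class DataItem:
    name: str
    tier: str               # current tier: "ssd", "disk" or "archive"
    reads_last_day: int     # recent activity
    days_since_access: int  # age of last access


def choose_tier(item, hot_reads=1000, cold_days=90):
    """Pick a target tier from activity; thresholds are illustrative."""
    if item.reads_last_day >= hot_reads:
        return "ssd"
    if item.days_since_access >= cold_days:
        return "archive"
    return "disk"


def plan_migrations(items, **thresholds):
    """Return (name, from_tier, to_tier) for every item that should move."""
    moves = []
    for item in items:
        target = choose_tier(item, **thresholds)
        if target != item.tier:
            moves.append((item.name, item.tier, target))
    return moves


items = [
    DataItem("db-index", "disk", reads_last_day=5000, days_since_access=0),
    DataItem("old-logs", "disk", reads_last_day=0, days_since_access=200),
    DataItem("home-dirs", "disk", reads_last_day=20, days_since_access=1),
]
moves = plan_migrations(items)
print(moves)
```

Separating the decision (`choose_tier`) from the data movement keeps the policy easy to change, and the same list of planned moves could in principle feed a migration to a remote site.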
In the traditional scale-up architecture where the controllers are separate from the disk shelves, virtualization at the storage device level is typically built into the controller operating system. As a standard feature it essentially provides a workable solution for provisioning the tens or hundreds of terabytes that modern storage arrays can contain. Most systems include the ability to create tiers of storage within a single virtualized system or among discrete systems, using different storage types (performance drives, capacity drives or SSDs) and different RAID levels. Some also include a policy engine and the ability to move file or sub-file data blocks among the tiers based on activity, application and so on. Most systems allow data to be copied to a second chassis for high availability or moved to a second system at a remote site for DR. While the majority of storage systems include virtualization, most don’t support storage from other vendors. For a heterogeneous virtualization solution, one that can consolidate different vendors’ storage systems, most options are network based.
A number of years ago, the conventional storage wisdom was that storage services, like virtualization, and to an extent storage control, would eventually reside in “smart switches” on the storage-area network (SAN). While at least one storage virtualization product is moving in that direction, the network implementation of storage virtualization technology has commonly been in the form of appliances. These appliances are essentially storage controllers that connect to disk arrays or storage systems from certified vendors, or they’re software that’s installed on user-supplied servers or virtual machines (VMs). Storage virtualization appliances connect to heterogeneous storage arrays directly, or via Fibre Channel (FC) or iSCSI SANs, but most provide the option of using their own disk capacity as well. Most solutions include some storage services, like file sharing, snapshots, data deduplication, thin provisioning, replication, continuous data protection (CDP) and so on.
In-band and out-of-band virtualization
Early on in the lifecycle of storage virtualization technology, two primary architectures emerged: in-band and out-of-band virtualization. In-band implementations placed a controller between users and physical storage or the SAN, and passed all storage requests and data through that controller. Out-of-band products placed a metadata controller on the network that remapped storage requests to physical locations, but didn’t handle the actual data. That added complexity to the process but reduced the CPU load compared to in-band virtualization. Out-of-band storage virtualization also avoided the disruption associated with decommissioning an in-band device, during which users are disconnected from their data while storage is remapped. Most network-based virtualization solutions today use the in-band architecture, probably because CPU power is relatively plentiful compared to when storage virtualization first appeared. Another reason for the popularity of in-band solutions is that they’re easier to implement, which means faster time to market and fewer problems.
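The difference between the two architectures comes down to what passes through the virtualization layer. The sketch below is a toy model under simplifying assumptions (fixed-size backends, in-memory "arrays"); every class name is hypothetical.

```python
class Backend:
    """Stand-in for a physical array: a dict of block -> bytes."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}

    def write(self, block, data):
        self.blocks[block] = data

    def read(self, block):
        return self.blocks.get(block, b"")


class MappingTable:
    """Logical-to-physical map shared by both designs."""

    def __init__(self, backends, blocks_per_backend):
        self.backends = backends
        self.blocks_per_backend = blocks_per_backend

    def locate(self, logical_block):
        index, physical = divmod(logical_block, self.blocks_per_backend)
        return self.backends[index], physical


class InBandController:
    """In the data path: every request AND its payload passes through."""

    def __init__(self, table):
        self.table = table
        self.bytes_forwarded = 0

    def write(self, logical_block, data):
        backend, physical = self.table.locate(logical_block)
        self.bytes_forwarded += len(data)  # the controller handles the data
        backend.write(physical, data)


class MetadataController:
    """Out of band: answers only 'where does this block live?'"""

    def __init__(self, table):
        self.table = table
        self.lookups = 0

    def locate(self, logical_block):
        self.lookups += 1
        return self.table.locate(logical_block)


class OutOfBandHost:
    """Host-side agent: looks up the location, then talks to the array."""

    def __init__(self, metadata):
        self.metadata = metadata

    def write(self, logical_block, data):
        backend, physical = self.metadata.locate(logical_block)
        backend.write(physical, data)  # data bypasses the controller


arrays = [Backend("array-A"), Backend("array-B")]
table = MappingTable(arrays, blocks_per_backend=100)

inband = InBandController(table)
inband.write(150, b"abcd")

metadata = MetadataController(table)
host = OutOfBandHost(metadata)
host.write(20, b"xy")
print(inband.bytes_forwarded, metadata.lookups)
```

The in-band controller's load grows with the volume of data moved, while the metadata controller's load grows only with the number of lookups. The cost of the out-of-band design shows up in `OutOfBandHost`: each host needs an agent that understands the mapping, which is the added complexity noted above.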
Storage virtualization products
File storage virtualization
While many storage systems include file services, they virtualize data at the block level. However, there are network-attached products that can consolidate standalone network-attached storage (NAS) systems. These appliances provide a global namespace to users on the front end and map file requests to the right physical NAS on the back end. These systems can also provide file storage tiering and migration, some even to cloud storage providers. Examples of file virtualization products include the following:
AutoVirt Inc. markets an out-of-band file storage virtualization software product that runs on a pair of Windows servers or virtual machines (VMs). It also provides a global namespace and a policy engine for data tiering, migration and archiving. Being out-of-band, it can be taken out of the environment without disruption.
Avere Systems Inc.’s FXT is a heterogeneous, scale-out NAS appliance implemented in clusters of up to 25 2U modules, each containing primarily solid-state (DRAM and solid-state drive) storage. The FXT cluster supports a global, tiered file system, typically encompassing NAS systems from other manufacturers; it also provides file virtualization across platforms.
F5 Networks Inc.’s ARX products are a series of in-band file virtualization appliances that can consolidate multiple heterogeneous NAS devices behind a global namespace, supporting CIFS and NFS protocols. They also provide a policy engine that can automatically move files between NAS systems, locally or to the cloud, based on file attributes, activity or other criteria.
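The global namespace that these file virtualization products present can be pictured as a prefix table that maps user-visible paths to a physical NAS and export. The following is a rough sketch with made-up filer names and paths; it does not reflect how any of the products above are actually implemented.

```python
import posixpath


class GlobalNamespace:
    """Maps user-visible paths to (filer, path on that filer)."""

    def __init__(self):
        self.mounts = {}  # namespace prefix -> (filer, export path)

    def add_mount(self, prefix, filer, export):
        self.mounts[posixpath.normpath(prefix)] = (filer, export)

    def resolve(self, path):
        path = posixpath.normpath(path)
        best = None
        for prefix in self.mounts:
            # Match whole path components only; the longest prefix wins.
            if path == prefix or path.startswith(prefix.rstrip("/") + "/"):
                if best is None or len(prefix) > len(best):
                    best = prefix
        if best is None:
            raise KeyError(f"no NAS serves {path}")
        filer, export = self.mounts[best]
        remainder = path[len(best):].lstrip("/")
        if remainder:
            return filer, posixpath.join(export, remainder)
        return filer, export


ns = GlobalNamespace()
ns.add_mount("/corp/eng", "nas1", "/vol/eng")
ns.add_mount("/corp/eng/archive", "nas2", "/export/old")
print(ns.resolve("/corp/eng/specs/a.txt"))
print(ns.resolve("/corp/eng/archive/2009/b.txt"))

# After a migration, only the table entry changes; the user path does not.
ns.add_mount("/corp/eng/archive", "nas3", "/tier3/old")
print(ns.resolve("/corp/eng/archive/2009/b.txt"))
```

Because users address files by namespace path rather than by filer, files can be tiered or migrated between NAS systems by updating the table once the data has been copied.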
Virtualization has become an essential function for storage provisioning and is included in some form with most midsized and larger storage systems. While there are many differences between arrays and their virtualization technologies, the majority of these device-based implementations don’t support disk capacity from other manufacturers. Instead of listing the large number of these storage systems, we’ll focus on the smaller category of heterogeneous storage systems. The following are examples of heterogeneous storage virtualization as implemented in hardware and software products available from a variety of vendors.
DataCore Software Corp.’s SANsymphony is a network-based, in-band software product that runs on commodity x86 servers. It supports heterogeneous storage devices via FC, Fibre Channel over Ethernet (FCoE) or iSCSI, and connects to hosts as FC or iSCSI storage. Multiple-node clusters can be created to scale capacity and provide high availability. The system provides remote replication and storage services like synchronous mirroring, CDP, thin provisioning and tiered storage.
EMC Corp.’s Invista is an out-of-band software solution that runs on a pair of servers (called a Control Path Cluster or CPC) and interacts with “intelligent switches” from Brocade or Cisco. It can virtualize storage from most major vendors, connecting to storage and host servers via Fibre Channel. Invista provides mirroring, replication and point-in-time clones between storage arrays.
FalconStor Software Inc.’s Network Storage Server (NSS) is a network-based, in-band appliance that connects to heterogeneous storage systems via iSCSI, FC or InfiniBand, and supports host connectivity with Fibre Channel or iSCSI. Expansion and high availability are provided by connecting multiple controller modules. Besides WAN-optimized replication, NSS also provides synchronous mirroring, thin provisioning, snapshots and clones.
Hitachi Data Systems’ Universal Storage Platform V (USP V) is a tier 1 storage array system that also provides in-band heterogeneous connectivity to most major storage vendors’ arrays. It includes the kinds of features and services expected from a tier 1 solution, including thin provisioning of internal and externally attached storage.
IBM’s SAN Volume Controller (SVC) is a network-based, in-band virtualization controller that sits on the SAN and connects to heterogeneous storage systems via iSCSI or FC. Pairs of SVC units provide high availability, and up to eight nodes can be clustered to scale bandwidth and capacity. Each SVC module features replication between storage systems and a mirroring function between local or remote SVC units.
NetApp Inc.’s V-Series Open Storage Controller is an in-band virtualization solution that’s very similar to a NetApp filer controller, but configured to support heterogeneous storage arrays. It connects to an FC SAN on the back end to consolidate as much storage as desired from existing LUNs, and pools them into NetApp LUNs for block or file provisioning, just as a regular NetApp filer would.
NetApp recently acquired the Engenio Storage Virtualization Manager (SVM), a network-based, in-band virtualization controller that supports heterogeneous storage systems. Details of how NetApp will market this solution have yet to be announced.
Handle with care
Because most storage virtualization products are in-band, care should be taken to understand the effective performance of the virtualization appliance or cluster, as this will be the gating factor to capacity expansion. In addition, storage services or features will also consume CPU cycles, further reducing the appliance’s effective capacity to handle I/O.
Storage virtualization is a powerful tool to reduce Capex by improving capacity utilization or performance, but its biggest benefit may be on the Opex side. It can simplify storage management, even across platforms, and reduce administrative overhead. Virtualization can also make storage expansion a relatively simple operation, often done without taking storage systems down or disrupting users.
BIO: Eric Slack is a senior analyst at Storage Switzerland.