Storage virtualization is the pooling of physical storage from multiple storage devices into what appears to be a single storage device, or a pool of available storage capacity, that is managed from a central console. The technology relies on software to identify available capacity on physical devices and aggregate it into a pool that virtual machines (VMs) can use in a virtual environment.
The virtual storage software intercepts I/O requests from physical or virtual machines and redirects them to the appropriate physical location on the storage devices that make up the pool in the virtualized environment. To the user, accessing virtual storage looks like a standard read or write to a physical drive.
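The redirection described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular product works: it assumes the pool simply concatenates devices end to end, and the class name, device names and capacities are all hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate


class StoragePool:
    """Presents several devices as one contiguous range of logical blocks."""

    def __init__(self, devices):
        # devices: list of (name, capacity_in_blocks) pairs, all hypothetical.
        self.names = [name for name, _ in devices]
        # Running totals mark where each device's range ends in the pool.
        self.ends = list(accumulate(blocks for _, blocks in devices))

    @property
    def capacity(self):
        return self.ends[-1] if self.ends else 0

    def resolve(self, logical_block):
        """Translate a pool-wide block number to (device name, block on device)."""
        if not 0 <= logical_block < self.capacity:
            raise ValueError("block is outside the pool")
        index = bisect_right(self.ends, logical_block)
        start = self.ends[index - 1] if index else 0
        return self.names[index], logical_block - start


pool = StoragePool([("array-a", 1000), ("array-b", 500), ("jbod-c", 2000)])
print(pool.capacity)       # 3500
print(pool.resolve(1200))  # ('array-b', 200)
```

The caller only ever sees block numbers 0 through 3,499. Which device actually serves a request is decided inside resolve, which is the sense in which the physical layout is hidden.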
Even a RAID array can sometimes be considered a type of storage virtualization. Multiple physical disks in the array are presented to the user as a single storage device that, in the background, mirrors data or stripes it with parity across multiple disks so that the array can survive a single disk failure.
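As a rough illustration of the RAID case, the sketch below models only mirroring (RAID 1 style). Parity-based levels work differently and are not shown. The class and its methods are hypothetical, and each disk is simulated as an in-memory dictionary.

```python
class MirroredVolume:
    """RAID 1-style mirror: every write goes to all healthy member disks."""

    def __init__(self, disk_count=2):
        # Each simulated disk is a dict of block number -> bytes.
        self.disks = [{} for _ in range(disk_count)]
        self.failed = set()

    def write(self, block, data):
        for index, disk in enumerate(self.disks):
            if index not in self.failed:
                disk[block] = data

    def fail_disk(self, index):
        # Simulate losing a disk and everything stored on it.
        self.failed.add(index)
        self.disks[index].clear()

    def read(self, block):
        # Serve the read from the first healthy disk that holds the block.
        for index, disk in enumerate(self.disks):
            if index not in self.failed and block in disk:
                return disk[block]
        raise OSError("no healthy copy of the block")


volume = MirroredVolume(disk_count=2)
volume.write(7, b"payload")
volume.fail_disk(0)
print(volume.read(7))  # b'payload'
```

The caller addresses one volume throughout. The second copy and the failover to it stay behind the read and write calls.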
Types of storage virtualization
There are two basic methods of virtualizing storage: file-based and block-based. File-based storage virtualization is a specific use case, applied to network-attached storage (NAS) systems. Using the Server Message Block (SMB) or Network File System (NFS) protocols, file-based storage virtualization breaks the dependency in a normal NAS array between the data being accessed and its physical storage location. This enables the NAS system to migrate files in the background to improve performance without disrupting access.
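One way to picture that decoupling is a lookup table between the path a client uses and the place the file currently lives. The sketch below is an assumption-laden toy: the class, the filer names and the paths are hypothetical, and a real system would copy the data before updating the mapping.

```python
class GlobalNamespace:
    """Maps the path a client sees to the filer and path that hold the file."""

    def __init__(self):
        self.locations = {}  # logical path -> (filer, physical path)

    def add(self, logical_path, filer, physical_path):
        self.locations[logical_path] = (filer, physical_path)

    def resolve(self, logical_path):
        return self.locations[logical_path]

    def migrate(self, logical_path, new_filer, new_physical_path):
        # Only the mapping changes here; the data copy itself is not modeled.
        self.locations[logical_path] = (new_filer, new_physical_path)


ns = GlobalNamespace()
ns.add("/projects/report.doc", "filer-1", "/vol0/projects/report.doc")
ns.migrate("/projects/report.doc", "filer-2", "/vol3/archive/report.doc")
print(ns.resolve("/projects/report.doc"))  # ('filer-2', '/vol3/archive/report.doc')
```

The client keeps using /projects/report.doc before and after the move, which is why migration can happen in the background.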
Block-based, or block access, virtual storage is more widely applied in virtual storage systems than file-based storage virtualization. Block-based systems abstract the logical storage, such as a drive partition, from the actual physical storage blocks in a device, such as a hard disk drive (HDD) or solid-state drive. This enables the virtualization management software to collect the capacity of the available blocks and pool them into a shared resource that can be assigned to any number of VMs, bare-metal servers or containers.
For the user to access data on the physical storage devices, the virtualization software needs to either maintain a map, stored as metadata, from logical addresses to physical locations or, in some cases, use an algorithm to compute the location on the fly.
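The two lookup strategies can be contrasted with a short sketch. Both halves are illustrative assumptions rather than any vendor's design: MappedVolume keeps an explicit table and allocates a physical slot the first time a block is touched, while hashed_device derives the device from the address using a SHA-256 hash and a simple modulo. All names are hypothetical.

```python
import hashlib


class MappedVolume:
    """Metadata approach: a table records where each logical block was placed."""

    def __init__(self, free_extents):
        # free_extents: (device, physical_block) slots available in the pool.
        self.free = list(free_extents)
        self.table = {}  # logical block -> (device, physical block)

    def locate(self, logical_block):
        # Allocate on first use, then always return the recorded location.
        if logical_block not in self.table:
            if not self.free:
                raise RuntimeError("pool is out of free blocks")
            self.table[logical_block] = self.free.pop(0)
        return self.table[logical_block]


def hashed_device(volume_id, logical_block, devices):
    """Algorithmic approach: compute the device from the address, no table."""
    key = f"{volume_id}:{logical_block}".encode()
    digest = hashlib.sha256(key).digest()
    return devices[int.from_bytes(digest[:8], "big") % len(devices)]


volume = MappedVolume([("disk-a", 0), ("disk-b", 0)])
print(volume.locate(42))  # ('disk-a', 0)
print(volume.locate(42))  # ('disk-a', 0), read back from the table
print(hashed_device("vol1", 42, ["disk-a", "disk-b", "disk-c"]))
```

The table can place a block anywhere but must itself be stored and protected. The hash needs no stored state, but with plain modulo placement most blocks would land on a different device whenever the device list changes, so this form is only a starting point.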
An early version of block-based virtualization was IBM's SAN Volume Controller (SVC), now called IBM Spectrum Virtualize. The software runs on an appliance or storage array and creates a single pool of storage by virtualizing logical unit numbers (LUNs) attached to servers connected to storage controllers. Spectrum Virtualize also enables customers to tier block data to public cloud storage.
Another early storage virtualization product was Hitachi Data Systems' TagmaStore Universal Storage Platform, now known as Hitachi Virtual Storage Platform (VSP). Hitachi's array-based storage virtualization enabled customers to create a single pool of storage across separate arrays, even those from other leading storage vendors.
Storage virtualization today usually refers to capacity that is accumulated from multiple physical devices and then made available to be reallocated in a virtualized environment. Modern IT methodologies, such as hyper-converged infrastructure (HCI), take advantage of virtual storage, in addition to virtual compute power and often virtual network capacity.
Storage virtualization can be implemented at different levels of a virtualized environment, depending on where the virtualization takes place:
Host-based storage virtualization is seen in HCI systems and cloud storage. In this case, the host, or a hyper-converged system made up of multiple hosts, presents virtual drives of a set capacity to the guest machines, whether they are VMs in an enterprise environment or PCs accessing cloud storage. All of the virtualization and management are done at the host level via software, and the physical storage can be almost any device or array.
Array-based storage virtualization most commonly refers to the method in which a storage array presents different types of physical storage for use as storage tiers. How much of a storage tier is made up of solid-state drives (SSDs) or HDDs is handled by software in the array and is hidden at the guest machine or user level.
Network-based storage virtualization is the most common form used in enterprises today. A network device, such as a smart switch or purpose-built server, connects to all storage devices in a Fibre Channel (FC) storage area network (SAN) and presents the storage as a virtual pool.
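To make the array-based case more concrete, the sketch below shows one possible tiering policy. It assumes the array promotes the most frequently read blocks to a small SSD tier. That policy, the class name and the slot count are illustrative assumptions, not something the approaches above prescribe.

```python
from collections import Counter


class TieredArray:
    """Keeps the most frequently read blocks on a small SSD tier."""

    def __init__(self, ssd_slots):
        self.ssd_slots = ssd_slots
        self.reads = Counter()  # block id -> number of reads seen so far
        self.ssd = set()        # blocks currently placed on the SSD tier

    def read(self, block):
        # Record the access and report which tier served it.
        self.reads[block] += 1
        return "ssd" if block in self.ssd else "hdd"

    def rebalance(self):
        # Promote the hottest blocks; all other blocks live on the HDD tier.
        hottest = self.reads.most_common(self.ssd_slots)
        self.ssd = {block for block, _ in hottest}


array = TieredArray(ssd_slots=1)
for _ in range(3):
    array.read("b1")
array.read("b2")
array.rebalance()
print(array.read("b1"))  # ssd
print(array.read("b2"))  # hdd
```

The guest issues the same read call either way. Only the array knows, and decides, which tier holds a given block.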
Storage virtualization disguises the actual complexity of a storage system, such as a SAN, which helps a storage administrator perform the tasks of backup, archiving and recovery more easily and in less time.