
Purpose-built storage for virtual machines

Vendors are offering storage systems specifically engineered for virtual machines with an approach that's fundamentally different from SAN or NAS.

With apologies to John Lennon and his great song, "Imagine," here's a 21st-century twist on the tune with more than a passing nod to storage:

Imagine no RAID groups

It's easy if you try

No LUNs to mess with

Volumes gone bye-bye

Imagine all the admins

Sleeping well at night ...

While not nearly as catchy as the original, it does make a point: Purpose-built virtual server storage has some significant differences from SAN and NAS. Rather than using the familiar constructs of RAID, LUNs and volumes on external shared storage arrays, virtual server storage is predominantly characterized by federated direct-attached storage (DAS) or purpose-built appliances. With most implementations, methods other than RAID are used to ensure data integrity and the storage software manages the relationship between the application server (a virtual machine) and the related data. This new architecture may actually improve data availability while simplifying the storage administrator's life. At least, that's what VM-specific storage is supposed to do.

Certainly, any modern-day storage can be configured to serve virtual machines. As a result, industry messaging can get a bit confusing. Terms like VM-ready and VM-aware have no industry standard definition, so vendors are free to use those phrases to mean whatever they want. Moreover, just because a system, such as software-defined storage (SDS), is built on top of a hypervisor doesn't mean it's uniquely suited to a virtual environment. To get past the label, IT managers need to look for products that correlate an application server (VM) directly with the related data, not a LUN or volume. If a product provisions LUNs and volumes in the traditional manner, it doesn't strictly stand up as a VM purpose-built system as we're using the term here. Given that the majority of IT organizations are more than 50% virtualized in their Windows/Linux environments -- with many approaching 90% virtualized -- this is an emerging market that should attract more than passing interest from storage managers.

VM storage vs. traditional SAN/NAS: Five differences

1. No more RAID, LUNs or volumes required

2. Application servers are tied to associated data, not volumes

3. No "noisy neighbor" issues associated with shared volumes

4. Performance not tied to spindle count

5. Data integrity and recovery generally facilitated by a distributed data mechanism

Storage then and now

Some IT managers may question whether it makes sense to revisit the internal DAS architecture of yesteryear. SAN and NAS evolved from DAS architecture because managing storage siloes attached to servers was so difficult and typically very costly. This was principally driven by the evolution from relatively few mainframe-centric servers to distributed computing with hundreds of servers. SAN and NAS provided a way to centrally manage storage, improve utilization and enhance storage agility. Thus, SAN and NAS represented a significant revolution in storage management for distributed systems.

The shift to virtualized servers has had as much impact on storage as the move to distributed systems had earlier. Virtual computing has evolved faster than storage has been able to keep up. At first, accommodating VMs was no big deal. A LUN allocation was a LUN allocation, and the storage system didn't care if the server was physical or virtual. However, as VM migration evolved, the limits of SAN and NAS became apparent. While migrating the VM became trivial, having storage pinned to LUNs and volumes was a real anchor that dragged down the agility organizations wanted.

In addition, the ability to spin up VMs in a matter of minutes has contributed to significant performance problems. Adding VMs on the fly to a volume can quickly oversubscribe the available aggregate IOPS. A single busy VM can hog the performance of the disks, negatively impacting the other VMs assigned to the volume. This is called the "noisy neighbor" problem. Organizations typically respond by adding spindles, which are costly and may be poorly utilized as a result. To truly realize the benefits of virtual computing, storage solutions need to evolve beyond just SAN and NAS.
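The oversubscription effect can be illustrated with a toy model. The sketch below assumes a shared volume with a fixed IOPS budget and no per-VM quality of service, so that when demand exceeds the budget each VM receives a share proportional to what it asks for. The VM names, the IOPS figures and the proportional-sharing rule are all illustrative assumptions, not measurements from any product.

```python
from dataclasses import dataclass


@dataclass
class VM:
    name: str
    demand_iops: int  # IOPS the VM would consume if unconstrained (hypothetical figure)


def share_volume(vms, volume_iops):
    """Split a volume's IOPS budget among VMs.

    Assumption: with no per-VM QoS, an oversubscribed volume serves each VM
    in proportion to its demand. Real arrays behave in more complex ways.
    """
    total_demand = sum(vm.demand_iops for vm in vms)
    if total_demand <= volume_iops:
        # The volume is not oversubscribed: every VM gets what it asks for.
        return {vm.name: float(vm.demand_iops) for vm in vms}
    return {vm.name: vm.demand_iops * volume_iops / total_demand for vm in vms}


# Four well-behaved VMs on a volume assumed to deliver 1,500 IOPS in aggregate.
quiet_vms = [VM(f"vm{i}", 300) for i in range(4)]
before = share_volume(quiet_vms, 1500)

# A fifth VM with heavy demand is spun up on the same volume.
after = share_volume(quiet_vms + [VM("noisy", 1800)], 1500)

print(before["vm0"], after["vm0"], after["noisy"])
```

In this model, the four original VMs each drop from 300 to 150 IOPS as soon as the fifth VM arrives, even though nothing about their own workload changed. Managing performance per VM rather than per volume is meant to remove exactly this coupling.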

Virtual server storage architectures

The virtual server storage market is in its early stages and, as such, products are predominantly offered by emerging vendors, though established vendors are entering the market. To be truly successful, these products need to offer the best of both worlds: the direct relationship between the data and the application server, as with DAS, combined with the convenience of centralized storage management and the robust storage functionality found on SAN and NAS. These systems should also complement the agile nature of virtual computing without compromising performance or availability.

Given that this market segment is in its early evolution, it's characterized by highly differentiated products and dueling technologies. All have their particular strengths and target audiences, and give IT managers a wide range of solutions to choose from. Labels such as converged, hyper-converged and other monikers are bandied about, but without standard definitions, labels alone won't help IT managers to understand how products are positioned.

These products fall, more or less, into one of three groups:

  • Software-only
  • Integrated appliance
  • Storage appliance

Tintri's VMstore and Tegile Systems' HA-series and T-series arrays are examples of storage appliances, but they should not be lumped in with more traditional SAN/NAS arrays. Both have purpose-built operating systems (OSes) optimized for use in a VM environment. Tintri's OS allows all storage functions to be scheduled through the VM. Its internal file system treats each virtual machine as an individual entity and federates the storage into a single namespace. Storage in VMstore is a combination of flash and hard disk drives (HDDs), but Tintri guarantees that 99% of I/Os will be serviced by high-performance flash. Tegile offers a hybrid array as well as an all-flash array. Its IntelliFlash software optimizes the media and data movement within the device. Storage provisioning and monitoring are done by virtual machine rather than by volume, so capacity and IOPS performance are managed at the VM level.

EMC's ScaleIO and the Maxta Storage Platform (MxSP) are two software-only solutions in this market. ScaleIO is billed as "100% hardware agnostic." It can run in a hypervisor -- including VMware ESXi, Microsoft Hyper-V, Citrix XenServer or KVM -- or on a bare-metal OS such as Linux. While it can use storage arrays, EMC suggests the lowest total cost of ownership is achieved using DAS.

Although MxSP is a software product, Maxta provides reference architectures of servers, storage and network equipment. Users aren't limited to those configurations, but the reference architectures are pre-validated by the company. MxSP is designed for DAS, which can be a combination of solid-state drives (SSDs) and HDDs. The Maxta Distributed File System, which provides a global namespace and supports VMDKs, is a log-based file system that supports block data movement across tiers.

EMC ScaleIO is a block-based, scale-out system that doesn't use a file system. The product has two main components: a ScaleIO Data Client (SDC) and a ScaleIO Data Server (SDS). Each one can be installed on any server, but the SDC kernel module must be installed on any node that requires data access. The SDS can be installed on nodes with DAS capacity. EMC touts having demoed up to 11 million IOPS with ScaleIO, while using just 20% CPU overhead.

Nutanix's Virtual Computing Platform is an example of an integrated appliance that includes compute, storage and software in each node. The minimum configuration is three nodes to provide sufficient resilience across a pool of resources in a shared-nothing architecture. Nutanix offers its own appliance or pre-qualified configurations using Dell servers. The Nutanix Distributed File System (NDFS) aggregates all nodes. An SSD tier is required, where all data writes are logged. Every node has access to the metadata, which uses MapReduce to enhance reliability and recoverability. Like all of the other products in this category, storage is provisioned at the VM level and NDFS manages data locality relative to the VM for optimized performance. Best-practice guidelines recommend a 10 Gbps Ethernet network for connectivity between nodes.

Implementing VM-specific storage

Storage services such as deduplication, compression, thin provisioning and the like have become table stakes among storage products. It's no different among purpose-built virtual server storage systems, where storage managers can expect these capabilities to be built in. One major area of difference is how data is protected. Since RAID is not a part of these architectures, different products use various means to ensure data integrity and recoverability.

EMC ScaleIO, for example, uses a two-copy distributed "mesh" mirroring methodology to ensure recoverability and eliminate single points of failure. Each node has an authoritative mapping of system components to facilitate recovery. This map requires just 4 MB of memory to hold the metadata of up to 10 PB of actual data. In addition, data is striped across all available nodes, which significantly reduces rebuild times and reduces the risk of a double device failure.
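To see why spreading mirrored data across all nodes shortens rebuilds, consider a minimal sketch. It is not ScaleIO's actual placement algorithm; it simply assumes each chunk is stored on two different nodes and that the node pairs are cycled evenly. The function names, node names and chunk count are hypothetical.

```python
from collections import Counter
from itertools import combinations


def place_chunks(num_chunks, nodes):
    """Give every chunk two copies on two different nodes.

    Assumption: chunks cycle through all possible node pairs, so copies
    are spread evenly over the cluster.
    """
    pairs = list(combinations(nodes, 2))
    return {chunk: pairs[chunk % len(pairs)] for chunk in range(num_chunks)}


def rebuild_sources(placement, failed_node):
    """Count, per surviving node, how many chunks it must supply to re-mirror."""
    sources = Counter()
    for first_copy, second_copy in placement.values():
        if failed_node == first_copy:
            sources[second_copy] += 1
        elif failed_node == second_copy:
            sources[first_copy] += 1
    return sources


nodes = ["n1", "n2", "n3", "n4"]
placement = place_chunks(12, nodes)
work = rebuild_sources(placement, "n1")
print(dict(work))
```

With four nodes and 12 chunks, the six chunks that lost a copy on `n1` are re-mirrored from three different survivors, two chunks each. The rebuild work is shared rather than funneled through a single partner, which is the effect the striping described above relies on; with more nodes, each survivor's share shrinks further.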

Maxta MxSP always replicates data synchronously across nodes, even geographically; asynchronous capabilities are also available. Although the data may be replicated across geographically dispersed locations, the purpose is less disaster recovery (DR) than high availability (HA): keeping applications, not just data, available.

Nutanix recently announced its Metro Availability functionality across data centers. Systems within 400 km of each other can achieve a zero recovery point objective and near-zero recovery time objective with the feature. It is useful for maintenance operations, HA and DR.
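A zero recovery point objective generally implies that a write is not acknowledged until the remote site has it, so distance adds directly to write latency. The back-of-the-envelope sketch below estimates only the propagation delay. It assumes light travels roughly 200 km per millisecond in optical fiber and that the fiber path is as long as the stated distance; switching, protocol and storage overheads are ignored, and none of the figures come from Nutanix.

```python
FIBER_KM_PER_MS = 200.0  # assumption: roughly two-thirds of the speed of light in a vacuum


def round_trip_ms(distance_km, km_per_ms=FIBER_KM_PER_MS):
    """Minimum propagation delay for a write to reach the remote site and be acknowledged."""
    return 2 * distance_km / km_per_ms


delay = round_trip_ms(400)
print(f"{delay:.1f} ms added per synchronous write at 400 km")
```

Under these assumptions, the 400 km limit corresponds to about 4 ms of added round-trip delay on each write before any equipment overhead is counted.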

In some respects, purpose-built virtual server storage systems embody a disruptive technology because they change some fundamental architectural precepts. As such, they will initially be silos within the data center. But make no mistake: this is a key storage technology of the future. Traditional SAN and NAS will be predominant for some time, but an architecture that simplifies storage management and complements virtual computing is inevitable. Storage managers will do themselves a favor by learning about virtual server storage systems now.

About the author:
Phil Goodwin is a storage consultant and freelance writer.
