No matter how you look at it, the vSphere 6 release from VMware was a big deal. It was announced at VMware's Partner Exchange in February 2015, and the big news was the introduction of Virtual Volumes, or VVOLs. There were also serious improvements to VSAN, fault tolerance, vMotion, high availability, scalability, security, data protection, replication and more.
When viewed holistically, it is clear that VMware is pushing toward a completely software-defined data center in which all layers of the infrastructure are virtualized; the virtual machine (VM) is the center of attention; and provisioning, monitoring and management are all conducted by policy.
In this model, an application's importance dictates the level of resources it gets and how its SLA will be maintained, regardless of hardware or software failures and other calamities. The vSphere 6 release was a major step toward enabling this vision.
Fault Tolerance enhancements
Fault Tolerance (FT), first introduced with vSphere 4, keeps an application running with zero downtime and zero data loss in the face of a host failure. Unlike High Availability (HA), which requires an application to be restarted on another host after a failure, FT keeps two hosts working in lockstep, so the failure of one becomes a non-event and the application keeps running. No application-specific or OS-specific agents or configurations are needed. FT provided the ultimate in application protection but was always limited to simple applications that used only one vCPU. With vSphere 6, an application with up to 4 vCPUs can be FT-enabled. This brings FT to where it is needed most: mission-critical applications.
Until now, FT hosts needed to share a common data store and a shared VM disk (VMDK). This limitation is now removed and each host can have its own VMDK on different data stores. In vSphere 5.5, one could not take a snapshot of an FT-enabled VM and, therefore, the only way to back up the VM was to add an agent on it. Now that limitation is removed and vSphere API for Data Protection is supported.
Previous versions of FT required a very specific type of virtual disk: thick provision eager zeroed. This restriction is now removed and the virtual disk can be eager zeroed, thick or thin provisioned. The host compatibility list, which was extremely restricted before, has now been expanded to be the same as for vMotion.
Historically, vMotion was limited to moving a VM from one host to another, with both hosts managed by the same vCenter. This is no longer the case: VMs can now be moved across different vCenters. VMware also removed the distance restriction that existed in vSphere 5.5. Hosts are no longer limited to a metro area with distances of less than 100 miles, or round-trip times (RTT) of less than 10 ms. A vMotion can now take place across intercontinental distances as long as the RTT is less than 150 ms.
As a result, vMotion can genuinely be used for temporary or permanent migrations of VMs across data centers. Temporary migrations can be particularly useful for load balancing, moving applications close to where people will use them (call centers or international development groups, for instance) or as a precaution against impending weather events.
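The latency ceilings described above lend themselves to a simple pre-migration check. The following is a minimal sketch, assuming a round-trip time has already been measured between the two sites; the function and constant names are hypothetical and are not part of any VMware API.

```python
# Illustrative only: these names are hypothetical, not a VMware API.
# The two ceilings are the round-trip-time limits quoted in the article.
METRO_RTT_LIMIT_MS = 10.0           # vSphere 5.5
LONG_DISTANCE_RTT_LIMIT_MS = 150.0  # vSphere 6


def vmotion_rtt_ok(rtt_ms: float, vsphere_major: int) -> bool:
    """Return True if the measured RTT is below the quoted ceiling."""
    if rtt_ms < 0:
        raise ValueError("RTT cannot be negative")
    if vsphere_major >= 6:
        limit = LONG_DISTANCE_RTT_LIMIT_MS
    else:
        limit = METRO_RTT_LIMIT_MS
    return rtt_ms < limit


if __name__ == "__main__":
    for rtt in (5.0, 80.0, 200.0):
        print(rtt, vmotion_rtt_ok(rtt, 5), vmotion_rtt_ok(rtt, 6))
```

An 80 ms link, for example, passes the vSphere 6 check but would have failed under the 10 ms metro limit.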
Improvements to HA
VMware High Availability (HA) works on the principle of maintaining a heartbeat between the hosts that run the protected VMs (in the same cluster). Upon detection of a hardware or OS failure, the application is failed over and restarted on the working host. While there is a short period of "application downtime," there is no data loss and, in most cases, it is imperceptible to the user.
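The heartbeat principle can be illustrated with a toy model. This is a sketch of generic timeout-based failure detection, not VMware's actual HA implementation; the host names, the timeout value and both function names are assumptions made for illustration.

```python
from typing import Dict, List


def failed_hosts(last_heartbeat: Dict[str, float], now: float,
                 timeout_s: float) -> List[str]:
    """Hosts whose most recent heartbeat is older than the timeout."""
    return sorted(
        host for host, seen in last_heartbeat.items()
        if now - seen > timeout_s
    )


def plan_restarts(vms_by_host: Dict[str, List[str]], failed: List[str],
                  healthy: List[str]) -> Dict[str, str]:
    """Assign each VM from a failed host to a healthy host, round-robin."""
    if not healthy:
        raise RuntimeError("no healthy host available for restart")
    plan: Dict[str, str] = {}
    index = 0
    for host in failed:
        for vm in vms_by_host.get(host, []):
            plan[vm] = healthy[index % len(healthy)]
            index += 1
    return plan


if __name__ == "__main__":
    heartbeats = {"esx1": 100.0, "esx2": 88.0}  # hypothetical timestamps (s)
    down = failed_hosts(heartbeats, now=101.0, timeout_s=10.0)
    print(down)
    print(plan_restarts({"esx2": ["vm-a", "vm-b"]}, down, ["esx1"]))
```

The restart step is what separates HA from FT: the VM comes back on a surviving host after a brief outage rather than continuing in lockstep.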
Storage issues have typically been the most difficult to deal with in the context of HA. With the vSphere 6 release, VMware has added support for Virtual Machine Component Protection, which provides enhanced protection from All Paths Down (APD) and Permanent Device Loss (PDL) conditions in block (FC, iSCSI, FCoE) or file (NFS) storage.
Previously, vSphere had limited ability to detect PDL situations and no ability to deal with APD. Now these conditions are detected, vCenter is informed, and automatic failover is triggered, requiring no administrator involvement.
Now, vSphere HA supports VVOLs, vSphere Network I/O Control, IPv6, NSX and vMotion across vCenter Servers. One can also configure up to 8,000 virtual machines on up to 64 hosts in HA configurations, which matches the full 64-host/8,000-VM maximum cluster size.
Raising the ceiling
- A single VM can now support 128 vCPUs and 4 TB of vRAM.
- A host can now support up to 480 vCPUs, 1,000 VMs and 12 TB of RAM. A data store can be as large as 64 TB.
- A cluster can now be as large as 64 nodes and 8,000 VMs. This maximum now applies to standard clusters as well as VSAN clusters (more later).
- A single vCenter instance can now support up to 1,000 hosts and 10,000 powered-on VMs, up from 100 hosts/3,000 VMs in vSphere 5.5.
The end result from all these increases is the ability to virtualize applications that were previously thought to be un-virtualizable. The ability to use more powerful x86 CPUs implies a reduction in the number of clusters that need to be managed, which in turn should result in lower space, power and cooling costs.
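For capacity planning, the maximums listed above can be captured in a small lookup table and compared against a planned design. This is a minimal sketch: the figures are those quoted in this article, while the dictionary layout and the `over_limit` helper are hypothetical.

```python
# Figures are the vSphere 6 maximums quoted in the article. The structure and
# names below are hypothetical and do not correspond to any VMware API.
VSPHERE6_MAXIMUMS = {
    "vm": {"vcpus": 128, "vram_tb": 4},
    "host": {"vcpus": 480, "vms": 1000, "ram_tb": 12},
    "cluster": {"hosts": 64, "vms": 8000},
    "vcenter": {"hosts": 1000, "powered_on_vms": 10000},
}


def over_limit(scope: str, planned: dict) -> dict:
    """Return {key: (planned, maximum)} for every value above its maximum."""
    limits = VSPHERE6_MAXIMUMS[scope]
    return {
        key: (value, limits[key])
        for key, value in planned.items()
        if key in limits and value > limits[key]
    }


if __name__ == "__main__":
    print(over_limit("cluster", {"hosts": 48, "vms": 9000}))
```

An empty result means the planned figures fit within the quoted maximums for that scope.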
Microsoft WSFC integration
In the past, if you wanted to use Windows Server Failover Clustering (WSFC) for applications with vSphere, support was quite limited. With vSphere 6, support has been added for Windows Server 2012 R2 and SQL Server 2012, two key platforms that were missing before. AlwaysOn Availability Groups are also now supported. Paravirtual SCSI adapter support brings much better performance to the clustered environment compared to the use of standard SCSI adapters. Now, vMotion and Distributed Resource Scheduler (DRS) are fully supported with WSFC.
Data protection improvements
VMware made major enhancements to its data protection products in late 2013, with the advent of vSphere Data Protection Advanced (VDP-A) in vSphere 5.5. The new release, VDP 6.0, merges the functionality of VDP and VDP-A and is the only release available under vSphere 6 (it is free to all customers of Essentials Plus Kit 6.0, vSphere with Operations Management 6.0 editions, and all vCloud Suite 6.0 editions).
VDP 6.0 is based on EMC Avamar and uses variable-length data deduplication technology to perform disk-based backups for small to medium-sized businesses. It is integrated with vSphere and ESXi and is managed entirely by the VM administrator, using vCenter and the vSphere Web Client. It is designed to protect up to 800 VMs (using up to 10 VDP appliances, each of which can support up to 200 VMs and 8 TB of deduplicated data), even though realistically it works best for about 100 to 250 VMs. For larger configurations, a customer can integrate with Data Domain appliances. VDP 6.0 has built-in functionality for replication of backups, either to other VDP 6.0 appliances or to EMC Avamar appliances that may already be present in some larger accounts.
External proxies are now supported. These can be deployed in other vSphere clusters in the local site, or in remote sites for increased efficiency in network bandwidth utilization. Up to 24 concurrent streams of backup are feasible with external proxies. Red Hat Enterprise Linux Logical Volume Manager and Ext4 file system are supported.
But VDP 6.0 has limitations. It is designed for customers that find a recovery point objective (RPO) of 24 hours acceptable. VMs can be recovered within a range of five minutes to a few hours, according to VMware. Site Recovery Manager (SRM) is not supported. If better RPOs and recovery time objectives (RTOs) are required, VMware recommends using third-party backup products and vSphere Replication for VM replication (vs. backup replication, as in the case of VDP 6.0).
vSphere Replication updates
Full synchronizations are now more efficient for specific storage arrays, because vSphere Replication can interact with vSphere and get storage allocation information to reduce network traffic. Before the vSphere 6 release, moving a replica of a VM on the remote site using Storage vMotion required a full synchronization before the VM could be moved. This is no longer the case. The end result is that it is much easier to balance resources using Storage vMotion and DRS, without violating RPOs for VM recovery.
Up to 24 recovery points can be chosen per VM. RPOs as fine as 15 minutes may be set on a per-VM basis. vSphere Replication already used CBT to minimize network traffic, but now the admin may also choose compression as an option for even more network bandwidth efficiency.
It is important to understand that vSphere Replication is not associated with VDP 6.0, which has its own replication engine. Also, vSphere Replication is designed to replicate VMs, whereas the replication engine built into VDP is designed to replicate backup objects that contain VMs. No data deduplication technology is built into vSphere Replication. Unlike VDP 6.0, this product is designed to be used with third-party tools, not just VMware's own tools.
VSAN introduces all-flash configuration
The introduction of Virtual Volumes (VVOLs) and improvements to Virtual SAN (VSAN) are the most important aspects of the vSphere 6 release. Both of these products are designed to abstract and pool storage and storage services to allow provisioning, monitoring and management of storage on a policy basis, at a VM level of granularity.
The previous version of VSAN only supported a hybrid configuration in which flash was used exclusively for caching (read cache and write buffer) and hard disk drives (HDDs) served as the persistent capacity tier. VSAN 6.0 introduces an all-flash configuration where a portion of the flash capacity (solid-state drive- or PCIe-based) is used exclusively as write cache and the remaining capacity is used as a persistent tier. Scaling can be achieved across both performance and capacity by adding fully configured nodes (hybrid or all-flash), or independently by adding flash for performance or HDDs for capacity. In an all-flash configuration, additional capacity can be added with PCIe flash or solid-state drives by marking them for capacity rather than caching. VMware also increased the maximum capacity of a virtual disk to 62 TB.
The performance of both the hybrid and all-flash models was enhanced with a new disk format. In like configurations and workloads, hybrid performance doubled compared with VSAN 5.5, according to VMware. The all-flash version delivers 4x the performance of a similarly configured VSAN 5.5 (i.e., 2x the performance of a VSAN 6.0 hybrid).
The maximum cluster size was increased from 32 to 64 nodes. Both hybrid and all-flash models can support up to 200 VMs per node, subject to a maximum of 6,400 VMs per cluster. The new models allow VSAN-based configurations to support workloads exemplified by tier 1 mission-critical applications.
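The per-node and per-cluster VM figures interact, which a short calculation makes visible. This is a minimal sketch using only the limits quoted above; the function name is hypothetical.

```python
# Limits as quoted in the article for VSAN 6.0.
VMS_PER_NODE = 200
MAX_NODES = 64
MAX_VMS_PER_CLUSTER = 6400


def vsan_vm_capacity(nodes: int) -> int:
    """Upper bound on VMs for a VSAN cluster with the given node count."""
    if not 1 <= nodes <= MAX_NODES:
        raise ValueError("node count must be between 1 and 64")
    # The per-cluster cap takes over once nodes * 200 exceeds 6,400.
    return min(nodes * VMS_PER_NODE, MAX_VMS_PER_CLUSTER)


if __name__ == "__main__":
    for n in (16, 32, 64):
        print(n, vsan_vm_capacity(n))
```

With these figures the cluster-wide cap is reached at 32 nodes, so nodes added beyond that point contribute capacity and performance rather than additional VM count.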
In a 32-node cluster, VMware measured in excess of 4M IOPS for 100% reads and greater than 1.2M IOPS for mixed workloads of 70% reads and 30% writes, the latter working out to roughly 40K IOPS per host. In an all-flash version, the IOPS jump to 7M for read-only workloads, for an average of 90K IOPS per host. The 64-node clusters are expected to yield linear increases in performance.
Snapshot and clone functionality was improved as well. The system allows the creation of up to 32 snapshots/clones per VM, or 16K snapshots/clones per cluster.
Additional improvements relating to power failures or rack failures were added and blade infrastructures are now supported.
Vendors vouch for VVOLs
What VSAN does for direct attached storage, VVOLs achieve for external storage. I covered VVOLs extensively in an April 2015 article for Storage magazine. Since then, we have learned more about the wide variety of implementations from various vendors.
On the surface, most, if not all, storage vendors have pledged support for VVOLs. But under the covers, the differences in implementations are astounding. In a survey of 11 vendors (Dell, EMC, HDS, HP, IBM, Kaminario, NetApp, Nexenta, NexGen, Pure Storage and SolidFire) conducted in March 2015, Taneja Group asked 32 questions to understand these differences. We categorized the vendors into one of three types:
- Type 1 products deliver the most rudimentary support of VVOLs in which the user can carve out a number of static storage containers, each with a unique set of qualities (class of service). These could include the type of storage and the variety of storage services available (snapshots, compression and so on).
- Type 2 products are exemplified by the creation of a single storage container with a wide variety of storage types and services, any of which may be selected (or not) to produce a unique set of capabilities that can then be applied to a given VM. Quality of service (QoS) is also a hallmark of Type 2 products. That means minimum or maximum resources (capacity, IOPS, latency, throughput) can be assigned to a given VM and the SPBM policy engine would honor these.
- Type 3 extends Type 2 with the ability to deal with resource contention. In other words, not only does it offer QoS functionality but it also knows how to deal with multiple VMs vying for resources when the array functionality is maxed out.
Most vendor products fell in the Type 1 and Type 2 categories, with only NexGen showing up in the Type 3 category.
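The three categories can be summarized as a simple decision rule. This is a sketch that reduces the descriptions above to three yes/no capabilities; the field names are hypothetical, and the actual survey drew on 32 questions rather than three flags.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VvolImplementation:
    # Hypothetical flags condensed from the category descriptions.
    per_vm_service_selection: bool  # one container, services picked per VM
    qos_guarantees: bool            # min/max capacity, IOPS, latency, throughput
    handles_contention: bool        # arbitrates between VMs when the array is maxed out


def classify(impl: VvolImplementation) -> int:
    """Map a set of capabilities to Type 1, 2 or 3."""
    if impl.per_vm_service_selection and impl.qos_guarantees:
        return 3 if impl.handles_contention else 2
    return 1  # static storage containers with fixed classes of service


if __name__ == "__main__":
    print(classify(VvolImplementation(True, True, False)))
```

Because Type 3 is defined as an extension of Type 2, contention handling only counts once the Type 2 capabilities are present.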
Beyond the type of implementation, we discovered vast differences in the scalability of products. For instance, the number of Protocol Endpoints (PEs) per Storage Container (SC), SCs per array, VVOLs-based VMs per array, VVOLs per array, VVOL-snapshots per array, clones per VM and clones per array varied widely across vendor products and sometimes even between different products from the same vendor. For example, the total number of VVOLs per HDS array was listed as 400,000 (file) or 64,000 (block) plus 1 million snapshots (for either file or block products). This contrasted with only 1,024 supported VVOLs per array for Dell's EqualLogic product.
The number and type of data services that can be surfaced via VASA 2.0 to Storage Policy-Based Management also varied across the board. These differences point out several facts. First, implementing VVOLs support in existing arrays is nontrivial and the architecture plays a significant role in how fully VVOLs can be supported. Second, the specs will dictate how far one can scale a product. The number of VMs an array can support is directly related to the number of VVOLs it can support, given each VM uses up a minimum of three VVOLs and each snapshot costs one VVOL. These limits are not necessarily an indication of weakness, as many other factors dictate which array is right for a given job, but they do indicate how far the array can go in the dimension of VVOL support. The full list of questions we asked in the survey is available by sending us an email.
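The relationship between an array's VVOL limit and the number of VMs it can host is easy to work through. This is a back-of-the-envelope sketch using the three-VVOLs-per-VM minimum and one-VVOL-per-snapshot cost stated above; the function name is hypothetical, and it assumes snapshots count against the same VVOL limit, which is not true of every array (some list snapshot limits separately).

```python
MIN_VVOLS_PER_VM = 3    # minimum per VM, as stated in the article
VVOLS_PER_SNAPSHOT = 1  # each snapshot costs one VVOL


def max_vms(vvol_limit: int, snapshots_per_vm: int = 0) -> int:
    """Upper bound on VMs for an array with the given VVOL limit."""
    if vvol_limit < 0 or snapshots_per_vm < 0:
        raise ValueError("inputs must be non-negative")
    per_vm = MIN_VVOLS_PER_VM + snapshots_per_vm * VVOLS_PER_SNAPSHOT
    return vvol_limit // per_vm


if __name__ == "__main__":
    print(max_vms(1024))       # 1,024-VVOL array, no snapshots
    print(max_vms(1024, 5))    # same array, five snapshots per VM
    print(max_vms(64000))      # 64,000-VVOL block array, no snapshots
```

Under these assumptions a 1,024-VVOL array tops out at 341 VMs with no snapshots and 128 VMs with five snapshots each, while a 64,000-VVOL array allows about 21,333. Real VMs with more than one virtual disk consume more than the three-VVOL minimum, so actual counts would be lower.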
Make no mistake: vSphere 6 is a major release from VMware by any standard. It is loaded with storage functionality from top to bottom, with significant increases in configuration maximums and major enhancements in VSAN, HA, FT, WSFC, data protection, replication and vMotion. And of course, the introduction of VVOLs puts VM-centricity in the forefront and brings external storage within VMware's software-defined vision.