
vSphere Metro Storage Cluster: Cross-site HA, DRS from one cluster

Find out how vSphere Metro Storage Cluster delivers cross-site HA and load distribution, as well as which IT organizations are a good fit for it.

In most enterprises, vSphere works fine with multiple sites, with clusters at each site, storage at each site, vCenter at each site and replication in between sites for disaster recovery. But a small percentage of large IT organizations need much greater levels of resiliency and load distribution in their virtual environments. VMware's vSphere Metro Storage Cluster (vMSC) can deliver this.

With a vSphere Metro Storage Cluster, an IT organization essentially splits its vSphere infrastructure between two locations but treats it as if it were still one data center, managed as a single cluster. (It is possible -- but not recommended -- to manage two sites from a single vCenter server without vMSC, but with vMSC you are able to manage two sites inside a single vCenter cluster.)

Very few IT shops have the high-end business requirements and infrastructure to benefit from vSphere Metro Storage Cluster. For those that do (such as financial trading companies, federal governments and high-transaction Web sites), vMSC enables the use of VMware High Availability (HA) and VMware Distributed Resource Scheduler (DRS) across two data centers, delivering resiliency and load distribution that can't be accomplished any other way in a VMware environment.

Thanking Scott Lowe

I first heard about "stretched clusters" from VMware Certified Design Expert (VCDX) and author Scott Lowe when he spoke on the topic at the Carolinas VMware User Group in 2011. I videoed Scott's 50-minute presentation on vMSC and posted it here: Pros and Cons of Stretched Cluster Design by Scott Lowe. Since then, he has blogged about them and posted his vMSC presentation here. Much of what I know about vMSC, I owe to Scott's teachings. – David Davis

But there's a long list of requirements and caveats to providing such functionality: An IT shop needs two data centers with a minimum bandwidth of 622 Mbps between them and active/active SANs. Maximum network latency can't exceed 10 milliseconds round trip. And while vMSC carries some disaster recovery benefits, it is not a DR solution.

In addition, neither VMware HA nor VMware DRS is site-aware. Some HA behavior is therefore out of your control, and creating rules to steer DRS is challenging. Other advanced features, such as Storage Distributed Resource Scheduler (SDRS), are also not site-aware.

Network configuration for vMSC is complex and requires special networking solutions such as overlay transport virtualization (OTV). Finally, daily operational management of the VMs in the cluster can be problematic as traditional tasks like backup and recovery, vMotion, and disaster recovery need to take the vMSC into account.

To configure a vMSC, you need a single vSphere cluster, as created in vCenter, made up of multiple active/active ESXi hosts. Those ESXi hosts would be located in different physical locations, either across a metro area or a larger geographical area.

The required active/active synchronous storage is usually accomplished either with a "stretched SAN" or using distributed virtual storage. A stretched SAN involves the creation of multiple virtual SANs (VSANs) with inter-VSAN routing, where only the primary site is read/write (generally limited to about 100 kilometers in distance). Distributed virtual storage involves a clustered file system and caching.
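
To give a feel for how distance turns into latency, here is a rough sketch that estimates the best-case round-trip delay over a fiber link. The propagation speed of about 200,000 km per second in optical fiber is an assumption brought in for illustration and does not come from VMware's documentation. The function name is hypothetical.

```python
# Assumption: signals travel through optical fiber at roughly 200,000 km/s
# (about two-thirds of the speed of light in a vacuum), i.e. 200 km per ms.
FIBER_SPEED_KM_PER_MS = 200.0


def fiber_round_trip_ms(distance_km: float) -> float:
    """Best-case round-trip propagation delay over a fiber path of this length."""
    return 2 * distance_km / FIBER_SPEED_KM_PER_MS


# A 100 km path costs about 1 ms of round-trip time in propagation alone.
print(fiber_round_trip_ms(100))  # 1.0
```

Real links will measure higher than this figure, because fiber rarely runs in a straight line and switches, routers and the storage arrays themselves all add delay.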

The VMware vSphere Metro Storage Cluster Whitepaper (which I highly recommend using as your design guide if you want to implement vMSC) offers the following specific network requirements between the sites:

  • The maximum supported network latency between sites for the VMware ESXi management networks is 10 milliseconds round-trip time.
  • 10 milliseconds of latency for vMotion is supported only with VMware vSphere Enterprise Plus edition licenses, which include the Metro vMotion option.
  • The maximum supported latency for synchronous storage replication links is 5 milliseconds round-trip time.
  • A minimum of 622 Mbps network bandwidth, configured with redundant links, is required for the ESXi vMotion network.
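
As a quick sanity check, the requirements above can be sketched as a small validation routine. This is only an illustration: the `InterSiteLink` class and `check_vmsc_network` function are hypothetical names, the measurements are values you would collect yourself, and the 5-millisecond vMotion limit for editions without Metro vMotion is an assumption that is not stated in the list above.

```python
from dataclasses import dataclass
from typing import List

# Thresholds taken from the requirements list above.
MAX_MGMT_RTT_MS = 10.0
MAX_METRO_VMOTION_RTT_MS = 10.0  # needs Enterprise Plus (Metro vMotion)
MAX_STORAGE_RTT_MS = 5.0
MIN_VMOTION_BANDWIDTH_MBPS = 622.0

# Assumption (not from the list above): vMotion limit without Metro vMotion.
ASSUMED_STANDARD_VMOTION_RTT_MS = 5.0


@dataclass
class InterSiteLink:
    """Hypothetical container for measurements taken between the two sites."""
    mgmt_rtt_ms: float
    vmotion_rtt_ms: float
    storage_rtt_ms: float
    vmotion_bandwidth_mbps: float
    redundant_links: bool
    enterprise_plus: bool


def check_vmsc_network(link: InterSiteLink) -> List[str]:
    """Return a list of requirement violations; an empty list means none found."""
    problems = []
    if link.mgmt_rtt_ms > MAX_MGMT_RTT_MS:
        problems.append("management network RTT exceeds 10 ms")
    # The 10 ms vMotion tolerance applies only with Enterprise Plus licensing.
    vmotion_limit = (MAX_METRO_VMOTION_RTT_MS if link.enterprise_plus
                     else ASSUMED_STANDARD_VMOTION_RTT_MS)
    if link.vmotion_rtt_ms > vmotion_limit:
        problems.append(f"vMotion RTT exceeds {vmotion_limit:g} ms")
    if link.storage_rtt_ms > MAX_STORAGE_RTT_MS:
        problems.append("synchronous storage replication RTT exceeds 5 ms")
    if link.vmotion_bandwidth_mbps < MIN_VMOTION_BANDWIDTH_MBPS:
        problems.append("vMotion bandwidth is below 622 Mbps")
    if not link.redundant_links:
        problems.append("vMotion network links are not redundant")
    return problems


# Example with made-up measurements: storage latency is too high.
link = InterSiteLink(4.0, 4.0, 6.0, 1000.0, True, True)
for problem in check_vmsc_network(link):
    print(problem)
```

A check like this only covers the figures quoted here; the whitepaper remains the authority on what is actually supported.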

With VMware now offering its vSphere Metro Storage Cluster Whitepaper and certifying storage as vMSC-compatible, it's clear that VMware plans more support for vMSC. Undoubtedly, the company will enhance vSphere and vCenter to support vMSC in time, but a large financial barrier to entry still exists in the form of two or more data centers, two or more advanced SAN solutions, advanced networking gear and vSphere Enterprise Plus licenses.

To be candid, vSphere Metro Storage Clusters aren't the right solution for the majority of companies out there. But very large enterprises that need cross-site high availability and load distribution, and that have the funds to support at least two data centers connected by very low-latency, high-bandwidth links, should consider vSphere Metro Storage Clusters.

David Davis is the author of the best-selling VMware vSphere video training library from TrainSignal.
