Nothing in the storage world elicits more divergent opinions than the term "software-defined storage" (SDS). With no universally accepted definition, SDS is vendor-specific. Vendors shape the SDS definition to match their storage offerings. The result is that every storage vendor appears to offer SDS.
The closest the software-defined storage market has come to an SDS consensus is more marketecture than architecture.
Software-defined storage separates the data storage hardware from the software that manages it. The storage software is itself hardware-independent. The storage control plane is usually, but not always, separated from the data plane.
That broad definition covers just about every variation of storage currently available. So it's up to consumers in the software-defined storage market to determine which ones work best for them.
Driving forces behind the SDS trend
All storage systems have always been software-defined. What's changed is that the software has become portable.
Storage system software historically was tied to the hardware it managed. When the hardware ran out of capacity or performance, it had to be replaced and the software licensing was repurchased along with the hardware.
What made matters significantly worse was that storage system architectures created isolated silos. Unique infrastructures made everything from storage provisioning, data protection, disaster recovery, tech refresh and data migration to power and cooling more and more untenable. Compound that with the ongoing trend of rapid data growth and the need to store ever-increasing amounts of data, and the available architectures made storage systems management too complicated, difficult, expensive and ultimately unmaintainable.
Several technological factors contributed to the software-defined storage market phenomenon as well. The first is the direct result of continuous performance improvements in the x86 compute architecture. Those improvements, along with the availability of cores for specific storage functions, have led to x86 architectural standardization for storage systems.
An additional technological factor aiding SDS is the general acceptance of x86 virtualization of servers, desktops, applications and networking (SDN). That has helped condition IT into accepting separation of the data image from the hardware upon which it resides.
The popularity of cloud technologies has also had a major effect on driving the software-defined storage market. The cloud data centers needed a new and much lower-cost storage architecture based on industry standards and commodity hardware.
Other technological factors driving SDS include server-side flash storage and the software that allows memory and server storage to be transparently shared with other physical server hosts.
All of these technology changes eroded the differentiation between server and storage hardware while expediting storage software portability and flexibility, and, not inconsequentially, also radically reducing storage costs.
SDS categories pros and cons
With no working standard SDS definition, a variety of technologies have emerged in the software-defined storage market. For our purposes, the four categories of SDS include:
- Hypervisor-based SDS
- Hyper-converged infrastructure (HCI) SDS
- Storage virtualization SDS
- Scale-out object and/or file SDS
There are both significant differences and equally significant similarities between these categories, and several products may actually fit into multiple categories. Some products, such as PernixData or Saratoga Speed, are unique enough to be in their own category.
Since SDS is focused on flexibility, simplicity, scalability with performance and total cost of ownership (TCO), we'll use those criteria to evaluate the pros and cons of each SDS approach.
Hypervisor-based SDS
VMware invented this category with VMware vSphere Virtual SAN. This is the only category that is a specific product. Virtual SAN is architected as part of vSphere, operates as a vSphere feature, and works with all vSphere virtual machines and virtual desktops. Virtual SAN runs in the ESXi layer, which means it's not a virtual storage appliance and doesn't require a VM to execute.
Hypervisor-based SDS pros:
Flexibility. Virtual SAN works with both hard disk drives (HDD) and solid-state drives (SSD) including DIMM-based flash drives, PCIe, SAS, SATA and even NVMe. VMware Virtual SAN supports both HDDs and SSDs in a hybrid mode or all SSDs in all-flash mode.
Scalability and performance. Virtual SAN is highly scalable while delivering high levels of performance. It scales out through vSphere clustering and can support up to 64 vSphere hosts per cluster. Each vSphere host supports approximately 140 TB of raw storage capacity, which adds up to well north of 8 PB of raw storage capacity per cluster. On the performance side, each Virtual SAN host can supply up to 90,000 IOPS, yielding more than 5 million IOPS per cluster.
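The cluster maximums quoted above follow from multiplying the per-host figures by the host limit. A minimal Python sketch of that arithmetic, assuming decimal units (1 PB = 1,000 TB); the constant and function names are hypothetical and chosen only for illustration:

```python
# Figures quoted in the text; the names are illustrative, not a VMware API.
MAX_HOSTS_PER_CLUSTER = 64
RAW_TB_PER_HOST = 140        # approximate raw capacity per host
MAX_IOPS_PER_HOST = 90_000   # upper bound per host


def cluster_raw_capacity_pb(hosts: int = MAX_HOSTS_PER_CLUSTER,
                            tb_per_host: float = RAW_TB_PER_HOST) -> float:
    """Raw cluster capacity in PB, assuming 1 PB = 1,000 TB."""
    return hosts * tb_per_host / 1000


def cluster_max_iops(hosts: int = MAX_HOSTS_PER_CLUSTER,
                     iops_per_host: int = MAX_IOPS_PER_HOST) -> int:
    """Aggregate IOPS ceiling if every host delivers its maximum."""
    return hosts * iops_per_host


if __name__ == "__main__":
    print(f"Raw capacity: {cluster_raw_capacity_pb():.2f} PB")
    print(f"Peak IOPS:    {cluster_max_iops():,}")
```

With the quoted figures this works out to 8.96 PB of raw capacity and 5.76 million IOPS for a full 64-host cluster. Both are theoretical ceilings; usable capacity and delivered IOPS will be lower once data protection and network latency are taken into account.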
Simplicity. Virtual SAN is simple because it's natively integrated as part of the VMware stack. It feels and acts like all other vSphere features so it's intuitive for a vSphere administrator. Virtual SAN automates storage tasks on a per-VM basis such as provisioning, snapshots/data protection, high availability, stretch clusters, disaster recovery and business continuity. Even data migration to a Virtual SAN can be accomplished relatively simply via vSphere Storage vMotion.
Total cost of ownership (TCO). Compared to legacy storage architectures, its TCO should be less. The saving comes from the difference in the price of drives (HDDs and SSDs) in a storage system compared to the same drives in a server. Those drives are typically three times more expensive in the storage system. Other Virtual SAN cost advantages come from predictable pay-as-you-go scaling; unified storage management; unified data protection, disaster recovery and business continuity; and consolidated storage networking.
Hypervisor-based SDS cons:
Flexibility issues. Virtual SAN is a closed-loop SDS in that it only works with VMware vSphere 5.5 or later. Older ESXi implementations, other hypervisors, or physical machines don't work with Virtual SAN. It can't be used by virtual or physical machines that are not part of the vSphere cluster. There is an element of do-it-yourself (DIY) to Virtual SAN. For example, running on inexpensive commoditized hardware is somewhat limited to VMware's hardware compatibility list (HCL). If hardware isn't on the list, it's not supported.
Scalability and performance issues. Virtual SAN clusters cannot exceed 8.8 PB. If more capacity is required, it is not a good fit. If a VM requires more IOPS than the 90,000 available in its vSphere host, it can get them from other nodes in the cluster, but at a considerable latency penalty. Storage performance between hosts in the cluster is another issue. Most Virtual SAN clusters use 10 Gbps to 40 Gbps Ethernet and TCP/IP to interconnect the hosts. This architecture essentially replaces a deterministic system bus with a non-deterministic TCP/IP network, so latencies between hosts become highly variable. Unless the cluster uses more sophisticated and faster interconnections, its storage performance from one clustered host to another will be highly variable and inconsistent.
Some things are not so simple. Converting from a siloed storage environment to a pure Virtual SAN requires converting non-VM images to VMs first. It's a time-consuming process for non-vSphere environments.
TCO issues. Until the most recent release -- version 6.2 -- Virtual SAN lacked deduplication and compression capabilities. For earlier versions, that raises costs per usable TB considerably versus SDS products that include data reduction. In addition, making sure data and VMDKs on a specific clustered vSphere host remain available to the rest of the cluster in case that host fails currently requires multi-copy mirroring. Best practices require at least two copies of the original data and many administrators opt for three copies. This practice eliminates the drive price advantages. And because Virtual SAN is a vSphere-exclusive option, it has its own license costs that can be substantial.
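How mirroring erodes the drive price advantage can be worked through roughly. The sketch below assumes that "copies" means the total number of copies kept in the cluster, normalizes the server drive price to 1.0 per raw TB, and takes the storage-system drive price as three times that, per the figure cited earlier. The function names are illustrative, and the comparison ignores the storage system's own RAID overhead.

```python
def usable_tb(raw_tb: float, total_copies: int) -> float:
    """Usable capacity when every block is stored total_copies times."""
    if total_copies < 1:
        raise ValueError("total_copies must be at least 1")
    return raw_tb / total_copies


def cost_per_usable_tb(price_per_raw_tb: float, total_copies: int) -> float:
    """Drive cost per usable TB under multi-copy mirroring."""
    if total_copies < 1:
        raise ValueError("total_copies must be at least 1")
    return price_per_raw_tb * total_copies


# Assumption: prices normalized so a server drive costs 1.0 per raw TB.
SERVER_DRIVE_PRICE = 1.0
STORAGE_SYSTEM_DRIVE_PRICE = 3.0 * SERVER_DRIVE_PRICE  # "three times more expensive"

if __name__ == "__main__":
    for copies in (2, 3):
        cost = cost_per_usable_tb(SERVER_DRIVE_PRICE, copies)
        print(f"{copies} copies: {usable_tb(140, copies):.1f} TB usable per "
              f"140 TB host, drive cost {cost:.1f} per usable TB "
              f"(storage system drives: {STORAGE_SYSTEM_DRIVE_PRICE:.1f} per raw TB)")
```

Under these assumptions, three copies bring the server-drive cost per usable TB up to the storage-system drive price per raw TB, which is the sense in which mirroring cancels the price advantage.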
Hyper-converged infrastructure (HCI) SDS
HCI combines the server, storage, networking and hypervisor, and packages them into clustered nodes. HCI is designed to eliminate do-it-yourself integration headaches, expensive server hardware, the need to over-provision storage, high availability issues, complex storage management and hardware compatibility issues. There are many HCI options from server and other vendors, including: Atlantis, Cisco, Dell, EMC, Fujitsu, Gridstore, Hitachi, HPE, IBM, Lenovo, Maxta, NEC, Newisys, Nutanix, Quanta, Saratoga Speed, Scale Computing, SimpliVity, StarWind, StorMagic and SuperMicro.
Hyper-converged infrastructure (HCI) SDS pros:
Flexibility. As with VMware Virtual SAN, a VM administrator can control the storage. In fact, several HCI implementations are based on VMware vSphere and Virtual SAN including VMware's EVO:RAIL reference design. Several HCI vendors offer a choice of hypervisors from vSphere, Hyper-V, KVM or XenServer and some have "bare metal" offerings available with Linux using Docker containers or application virtualization (Saratoga Speed) without requiring a hypervisor. Many HCI implementations allow different capacity-sized nodes within the cluster. A few are software-only such as Maxta, StarWind and StorMagic. Maxta partners with most of the main server vendors including Dell, Quanta and SuperMicro.
Scalability and performance. Scaling HCI is as simple as adding a node to the cluster. Scaling storage capacity just requires adding drives (HDD or SSD) up to a node's maximum or adding additional nodes. Each HCI product has its own scalability and performance limitations; however, most scale well into the PBs and add performance linearly with each server node added to the cluster.
Simplicity. Plug it in, turn it on, configure and you're done. Few systems are simpler. No DIY, and there's just one throat to choke for support.
Total cost of ownership (TCO). Similar to VMware Virtual SAN. Many HCI vendors include inline deduplication and/or compression that can reduce total capacity requirements by as much as 83% to 90% depending on the data, which reduces TCO significantly.
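Capacity savings quoted as percentages can be easier to compare when converted to reduction ratios. A small sketch of that conversion, with a hypothetical function name; the 83% and 90% inputs are the figures from the paragraph above, and actual results depend on the data:

```python
def reduction_ratio(percent_saved: float) -> float:
    """Convert a capacity saving (e.g. 90 for 90%) to an N:1 reduction ratio."""
    if not 0 <= percent_saved < 100:
        raise ValueError("percent_saved must be in the range [0, 100)")
    return 100 / (100 - percent_saved)


def physical_tb_needed(logical_tb: float, percent_saved: float) -> float:
    """Physical capacity required to hold logical_tb after data reduction."""
    return logical_tb / reduction_ratio(percent_saved)


if __name__ == "__main__":
    for pct in (83, 90):
        print(f"{pct}% saved is roughly {reduction_ratio(pct):.1f}:1; "
              f"100 TB logical needs about {physical_tb_needed(100, pct):.0f} TB physical")
```

An 83% saving corresponds to roughly a 6:1 ratio and 90% to 10:1, so 100 TB of logical data would occupy about 17 TB or 10 TB of physical capacity, respectively.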
Hyper-converged infrastructure (HCI) SDS cons:
Flexibility issues. HCI systems are closed-loop SDS, so their storage only works with the server nodes in the cluster. Any physical or virtual host not in the HCI cluster cannot access the HCI storage. (There are exceptions: Saratoga Speed provides what it calls all-flash ultra-converged infrastructure (UCI), which can act as a target for physical or virtual hosts outside the UCI cluster.)
Cluster hardware is limited to what the HCI vendor provides, or hardware certified by the software-only HCIs. As with VMware Virtual SAN, there is vendor lock-in and replacing the vendor requires migrating everything from the old HCI to the new, which can be time-consuming and tedious.
Scalability and performance issues. HCI cluster capacity is limited by the number of nodes supported in the cluster and the amount of capacity supported per node. As with Virtual SAN, if a VM requires more IOPS than its host node can supply, it can get IOPS from other nodes, but with a considerable latency penalty. Storage performance between nodes in the cluster is another issue. Most HCI clusters use 10 Gbps to 40 Gbps Ethernet and TCP/IP to interconnect the hosts, so latencies between hosts can be highly variable.
Some things are not so simple. Converting from a siloed storage environment to an HCI cluster requires first converting both non-VM images and existing VMs to the HCI's VMs or Docker containers, a time-consuming process.
TCO issues. HCIs -- like Virtual SAN -- have an issue in making sure data, VM images, virtual desktop images and Docker container images on a specific HCI node remain available to the rest of the cluster should that node fail. Today, that requires multi-copy mirroring. Best practices require a minimum of two copies of the original data and more commonly three copies, which increases total capacity requirements and related costs.
Storage virtualization SDS
Storage virtualization SDS is the most mature variation of SDS in the software-defined storage market. It's been around since the early 2000s, when it was just called storage virtualization. Storage virtualization SDS is primarily the entire storage software stack, including all storage services, optimized to run on the x86 architecture and convert hosts into powerful full-featured storage controllers. It virtualizes server storage and external storage systems to create one or more virtual storage pools with different capacities, data protection policies and performance characteristics. Storage virtualization SDS essentially converts x86 servers into storage systems; some products can also run in a VM as a virtual storage appliance (VSA). Storage virtualization SDS is primarily a scale-up architecture, but some products can scale out as well. They're architected to eliminate costly proprietary hardware, take advantage of lower-cost server drives, repurpose older storage systems and simplify data migration. Some of the better-known players/products include: DataCore Software SANsymphony, EMC ViPR, IBM SVC, Microsoft Windows Server 2012 R2 (and higher), NetApp ONTAP Cloud, Nexenta Systems NexentaStor, QuantaStor and StarWind Software.
Storage virtualization SDS pros:
Flexibility. It works with most x86 physical hosts or VMs as long as the hardware or hypervisor is certified and supported by the vendor. It converts all storage that sits behind it into the virtual storage pool, enabling repurposing of older storage. The scale-out versions permit physical or VM access to any node. Multi-copy mirroring isn't necessary to protect against a single controller failure, although it's available. Storage virtualization SDS can be provided as software or bundled with server hardware similar to HCI.
Scalability and performance. Scaling is multi-dimensional, as each node in the cluster can scale up and more nodes can be added to scale out. Generally, storage virtualization SDS is equivalent to most active-active silo storage systems.
Simplicity. When bundled with hardware, storage virtualization SDS is a very simple storage system. It leverages commodity off-the-shelf hardware, has better scalability and in some cases provides both block (SAN) and file (NAS). But in the end, it's still a siloed storage system in an inexpensive container.
Total cost of ownership (TCO). The biggest cost savings in storage virtualization SDS comes from commodity hardware and server-based drives. Another cost saving comes from inline data reduction technologies. Compared to equivalent storage systems, most storage virtualization SDS will yield a much more favorable TCO.
Storage virtualization SDS cons:
Flexibility issues. Most storage virtualization SDS can only run on the specific commodity hardware certified and supported by the vendor. Products that can run as VSAs require hypervisors certified and supported by the vendor.
Scalability and performance issues. On paper, these systems support tremendous capacity scalability, but the pragmatic approach is a bit different. Storage virtualization SDS capacity is constrained by x86 server limitations. Each server can handle only so much capacity before performance declines below acceptable levels. Storage virtualization SDS scale-out is constrained by clustering because the number of storage controller nodes supported is limited. Performance may also be constrained by the same limitations.
Some things are not so simple. Storage virtualization SDS is primarily DIY system integration which requires testing, QA and efforts to make sure the software works correctly with the hardware. Implementation may require professional services or a systems integrator.
TCO issues. Licensing can be a bit pricey depending on the vendor. And not all storage virtualization SDS products provide inline deduplication and/or compression. These issues can have a deleterious impact on TCO.
Scale-out object and/or file SDS
Recently we've seen the introduction of scale-out object SDS. Object storage manages data as objects, each of which contains the data, metadata and a unique identifier. There are quite a few object storage players along with two open-source variations in OpenStack Swift and Ceph (distributed and supported by Red Hat).
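The object model described above (data, metadata and a unique identifier in a flat namespace) can be illustrated with a toy in-memory store. This is only a conceptual sketch: the class and method names are hypothetical, it is not the API of Swift, Ceph or any other product, and it leaves out everything that makes real object storage scale out, such as distribution across nodes and data protection.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class StoredObject:
    """One object: payload, descriptive metadata and a unique identifier."""
    object_id: str
    data: bytes
    metadata: dict


class FlatObjectStore:
    """Toy store with a flat namespace: objects are addressed by ID only."""

    def __init__(self) -> None:
        self._objects = {}  # maps object_id -> StoredObject

    def put(self, data: bytes, **metadata: str) -> str:
        """Store the payload with its metadata and return the generated ID."""
        object_id = uuid.uuid4().hex
        self._objects[object_id] = StoredObject(object_id, data, dict(metadata))
        return object_id

    def get(self, object_id: str) -> StoredObject:
        """Fetch an object by ID; raises KeyError if it does not exist."""
        return self._objects[object_id]


if __name__ == "__main__":
    store = FlatObjectStore()
    oid = store.put(b"quarterly report", content_type="text/plain", owner="finance")
    obj = store.get(oid)
    print(obj.object_id, obj.metadata, len(obj.data))
```

The point of the sketch is the addressing model: there is no directory hierarchy or block address, only an identifier that returns the data together with its metadata.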
Scale-out file SDS is a highly scalable NAS, often with special characteristics such as object storage resilience or unique metadata (Qumulo). Some scale-out file SDS products actually sit atop object storage (Exablox) and others are essentially clustered scale-out implementations of IBM's General Parallel File System (Spectrum Scale).
Scale-out object and/or file SDS pros:
Flexibility. Both scale-out SDS architectures are designed from the ground up for x86 servers. Some can be implemented as software on hardware certified by the vendor while others are bundled with server hardware. They are not designed to be VSAs and typically are intended for secondary or non-mission critical applications.
Many scale-out object or file SDS products can act as HDFS storage for Hadoop implementations. That can significantly lower the cost of HDFS storage by reducing the number of mirrored copies required and allowing re-purposing NFS or SMB data.
Scalability and performance. Scaling is multi-dimensional: each node can be scaled individually and generally the cluster itself can add nodes for capacity or performance. Performance for both will never approach that of high-performance block storage.
Simplicity. When bundled with hardware, scale-out object or file storage is very simple to set up, configure and manage. Implementing it as software requires DIY systems integration. Both types leverage commodity hardware, have exceptional scalability and -- in the case of scale-out object storage -- unmatched data resilience and longevity via erasure coding.
Total cost of ownership (TCO). Both types are designed to be low-cost and offer very few add-on functions, usually licensed on an annual subscription basis. Scale-out object storage with erasure codes can lower the overall cost per GB as it requires less overhead than traditional RAID and replication data protection.
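The overhead difference between erasure coding and replication comes down to how much raw capacity each usable TB consumes. The sketch below assumes a generic k+m scheme, in which data is split into k data fragments plus m parity fragments; the 10+4 layout and the function names are hypothetical examples, not figures from any particular product.

```python
def replication_overhead(total_copies: int) -> float:
    """Raw TB consumed per usable TB when keeping total_copies full copies."""
    if total_copies < 1:
        raise ValueError("total_copies must be at least 1")
    return float(total_copies)


def erasure_coding_overhead(data_fragments: int, parity_fragments: int) -> float:
    """Raw TB consumed per usable TB for a k+m erasure-coded layout."""
    if data_fragments < 1 or parity_fragments < 0:
        raise ValueError("need at least 1 data fragment and 0 or more parity fragments")
    return (data_fragments + parity_fragments) / data_fragments


if __name__ == "__main__":
    # Hypothetical comparison: triple replication versus a 10+4 layout.
    print(f"3 copies: {replication_overhead(3):.1f}x raw per usable TB")
    print(f"10+4 EC:  {erasure_coding_overhead(10, 4):.1f}x raw per usable TB")
```

In this example, three-way replication consumes 3.0 TB of raw capacity per usable TB while the 10+4 layout consumes 1.4 TB. The trade-off is extra computation and fragment reads on every access, which contributes to the latency noted elsewhere in this section.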
Scale-out object and/or file SDS cons:
Flexibility issues. Whether delivered as software or bundled with hardware, the hardware must be certified and supported by the vendors.
Scalability and performance issues. Scale-out file SDS generally doesn't scale as high as scale-out object storage, but object will have somewhat higher latencies. Object storage has significant additional latencies from the metadata and data resiliency functions. Both types are best suited for secondary applications where high performance is not a requirement.
Some things are not so simple. When scale-out file or object storage SDS is purchased as software, it's a DIY project, so special skills, professional services or a systems integrator may be required.
In addition, when these types of SDS are used for secondary applications such as archive, the data must be moved from its current location. Some vendors have products that can do this, but most rely on third-party software.
TCO issues. Data reduction -- deduplication or compression -- currently isn't available with scale-out object SDS and rarely with scale-out file SDS. That adds to the TCO.
Software-defined storage market bottom line
SDS is a broad marketing term with a variety of software-defined storage market implementations, each with its own pluses and minuses.
Selecting the right SDS for the job requires accurate understanding of the application, storage capacity and performance requirements, the organization's skill set, and what the software-defined storage market is capable of handling.
Marc Staimer asks:
What matters most to you when looking at the software-defined storage market?