Complete guide to server-based storage in its modern forms
Nothing in the storage world elicits more divergent opinions than the term "software-defined storage" (SDS). With no universally accepted definition, SDS is vendor-specific: each vendor shapes the definition to match its own storage offerings. The result is that every storage vendor appears to offer SDS.
The closest the software-defined storage market has come to an SDS consensus is more marketecture than architecture.
Software-defined storage separates the data storage hardware from the software that manages it. The storage software is itself hardware-independent. The storage control plane is usually, but not always, separated from the data plane.
That broad definition enables just about every variation of storage currently available. So it's up to the software-defined storage market consumer to determine which ones work best for them.
Driving forces behind the SDS trend
All storage systems have always been software-defined. What's changed is that the software has become portable.
Storage system software historically was tied to the hardware it managed. When the hardware ran out of capacity or performance, it had to be replaced, and the software license had to be repurchased along with it.
What made matters significantly worse was that storage system architectures created isolated silos. Unique infrastructures made everything from storage provisioning and data protection to disaster recovery, tech refreshes, data migration, and power and cooling increasingly untenable. Compound that with the ongoing trend of rapid data growth and the need to store ever-increasing amounts of data, and the available architectures made storage system management too complicated, difficult, expensive and, ultimately, unmaintainable.
Several technological factors contributed to the software-defined storage market phenomenon as well. The first is the direct result of continuous x86 compute architecture performance improvements. Those improvements, together with the availability of cores dedicated to specific storage functions, have led to x86 architectural standardization for storage systems.
An additional technological factor aiding SDS is the general acceptance of x86 virtualization of servers, desktops, applications and networking (SDN). That has helped condition IT into accepting separation of the data image from the hardware upon which it resides.
The popularity of cloud technologies has also had a major effect on driving the software-defined storage market. The cloud data centers needed a new and much lower-cost storage architecture based on industry standards and commodity hardware.
Other technological factors driving SDS include server-side flash storage and the software that allows memory and server storage to be transparently shared with other physical server hosts.
All of these technology changes eroded the differentiation between server and storage hardware while expediting storage software portability and flexibility, and, not inconsequentially, also radically reducing storage costs.
SDS categories pros and cons
With no working standard SDS definition, a variety of technologies have emerged in the software-defined storage market. For our purposes, the four categories of SDS include:
- Hypervisor-based SDS
- Hyper-converged infrastructure (HCI) SDS
- Storage virtualization SDS
- Scale-out object and/or file SDS
There are both significant differences and equally significant similarities between these categories, and several products may actually fit into multiple categories. Since SDS is focused on flexibility, simplicity, scalability with performance and total cost of ownership (TCO), we'll use those criteria to evaluate the pros and cons of each SDS approach.
Hypervisor-based SDS
VMware invented this category with VMware vSphere Virtual SAN, which is now simply called vSAN. There are other vendors, such as Scale Computing and Storidge, but VMware remains the dominant player.
The VMware product is architected as a feature of vSphere and works with all vSphere virtual machines and virtual desktops. The vSAN software runs in the ESXi kernel, so it's not a virtual storage appliance and doesn't require a VM to execute.
Hypervisor-based SDS pros:
Flexibility. VSAN works with both hard disk drives (HDDs) and solid-state drives (SSDs), including DIMM-based flash drives, PCIe, SAS, SATA and even NVMe. VMware vSAN supports both HDDs and SSDs in a hybrid mode or all SSDs in all-flash mode.
Scalability and performance. VSAN is highly scalable while delivering high levels of performance. It scales out through vSphere clustering and can support up to 64 vSphere hosts per cluster. Each vSphere host supports approximately 140 TB of raw storage capacity, which works out to well north of 8 PB of raw capacity per cluster. On the performance side, each vSAN host can supply 100,000 or more IOPS, yielding millions of IOPS per cluster.
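Those cluster-level figures follow directly from the per-host numbers. A quick back-of-the-envelope sketch (the per-host figures come from the text and are rough, generation-dependent numbers, not vSAN specifications):

```python
# Back-of-the-envelope check of the per-cluster figures.
# Assumed figures from the text: 64 hosts per cluster, ~140 TB raw and
# ~100,000 IOPS per host. Real limits vary by vSAN version and hardware.
HOSTS_PER_CLUSTER = 64
RAW_TB_PER_HOST = 140
IOPS_PER_HOST = 100_000

raw_pb_per_cluster = HOSTS_PER_CLUSTER * RAW_TB_PER_HOST / 1000  # TB -> PB (decimal)
cluster_iops = HOSTS_PER_CLUSTER * IOPS_PER_HOST

print(f"Raw capacity per cluster: {raw_pb_per_cluster:.2f} PB")  # 8.96 PB
print(f"Aggregate IOPS per cluster: {cluster_iops:,}")           # 6,400,000
```

At 64 hosts, 140 TB per host yields just under 9 PB raw, which matches the "well north of 8 PB" claim, and the per-host IOPS aggregate lands in the millions.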
Simplicity. VSAN is simple because it's natively integrated as part of the VMware stack. It feels and acts like all other vSphere features so it's intuitive for a vSphere administrator. The vSAN software automates storage tasks on a per-VM basis such as provisioning, snapshots/data protection, high availability, stretch clusters, disaster recovery and business continuity. Even data migration to vSAN can be accomplished relatively simply via vSphere Storage vMotion.
Total cost of ownership (TCO). Compared to legacy storage architectures, vSAN's TCO should be lower. The savings come from the difference in the price of drives (HDDs and SSDs) in a storage system compared to the same drives in a server; those drives are typically three times more expensive in the storage system. Other vSAN cost advantages come from predictable pay-as-you-go scaling; unified storage management, data protection, disaster recovery and business continuity; and consolidated storage networking.
Hypervisor-based SDS cons:
Flexibility issues. VSAN is a closed-loop SDS in that it only works with VMware vSphere 5.5 or later. Older ESXi implementations, other hypervisors and physical machines don't work with vSAN, and it can't be used by virtual or physical machines that aren't part of the vSphere cluster. There is also an element of do-it-yourself (DIY) to vSAN: running on inexpensive commodity hardware is limited to VMware's hardware compatibility list (HCL), and if hardware isn't on the list, it's not supported.
Scalability and performance issues. If a VM requires more IOPS than one physical vSphere host can provide, it can get them from other nodes in the cluster, but with a considerable latency penalty. Inter-cluster storage performance is another issue. Most vSAN clusters use 10 Gbps to 40 Gbps Ethernet and TCP/IP to interconnect the hosts. This architecture essentially replaces a deterministic system bus with a non-deterministic TCP/IP network so latencies between hosts become highly variable. Unless the cluster uses more sophisticated and faster interconnections, its storage performance from one clustered host to another will be highly variable and inconsistent.
Some things are not so simple. Converting from a siloed storage environment to a pure vSAN requires converting non-VM images to VMs first. It's a time-consuming process for non-vSphere environments.
TCO issues. Until version 6.2, vSAN lacked deduplication and compression capabilities. This raised costs per usable TB considerably versus SDS products that include data reduction. In addition, making sure data and VMDKs on a specific clustered vSphere host remain available to the rest of the cluster in case that host fails currently requires multi-copy mirroring. Best practices require at least two copies of the original data, and many administrators opt for three copies. This practice eliminates the drive price advantages. And because vSAN is a vSphere exclusive option, it has its own license costs that can be substantial.
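The way multi-copy mirroring erodes the server-drive price advantage can be shown with a rough cost model. The 3x server-versus-array drive price ratio comes from the text; the dollar figure and the single-copy treatment of RAID overhead are simplifying assumptions for illustration:

```python
# Hypothetical illustration: N-way mirroring erodes the server-drive price edge.
SERVER_DRIVE_COST_PER_TB = 100                           # hypothetical $/TB
ARRAY_DRIVE_COST_PER_TB = 3 * SERVER_DRIVE_COST_PER_TB   # ~3x markup per the text

def cost_per_usable_tb(raw_cost_per_tb: float, copies: int) -> float:
    """Cost per usable TB when every TB of data is stored `copies` times."""
    return raw_cost_per_tb * copies

# Legacy array (RAID overhead ignored, modeled as a single copy):
array = cost_per_usable_tb(ARRAY_DRIVE_COST_PER_TB, 1)
# vSAN-style SDS with two- and three-copy mirroring:
sds_2x = cost_per_usable_tb(SERVER_DRIVE_COST_PER_TB, 2)
sds_3x = cost_per_usable_tb(SERVER_DRIVE_COST_PER_TB, 3)

print(array, sds_2x, sds_3x)  # 300 200 300
```

Under these assumptions, two copies preserve some of the savings, but at three copies the cost per usable TB matches the legacy array, which is the sense in which mirroring "eliminates the drive price advantages."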
Hyper-converged infrastructure (HCI) SDS
HCI combines the server, storage, networking and hypervisor, and packages them into clustered nodes. HCI is designed to eliminate do-it-yourself integration headaches, expensive server hardware, the need to over-provision storage, high availability issues, complex storage management and hardware compatibility issues. There are many HCI options from server and other vendors, including: Atlantis, Cisco, Dell EMC, Fujitsu, Hitachi, HPE, HyperGrid, IBM, Lenovo, Maxta, NEC, Newisys, Nutanix, Pivot3, Quanta, Scale Computing, SimpliVity, StarWind, StorMagic and SuperMicro.
Hyper-converged infrastructure (HCI) SDS pros:
Flexibility. As with VMware vSAN, a VM administrator can control the storage. In fact, several HCI implementations are based on VMware vSphere and vSAN including VMware's EVO:RAIL reference design. Several HCI vendors offer a choice of hypervisors, including vSphere, Hyper-V, KVM or XenServer, and some have "bare metal" offerings available with Linux using Docker containers without requiring a hypervisor. Many HCI implementations allow different capacity-sized nodes within the cluster. A few are software-only such as Atlantis, Maxta, StarWind and StorMagic. Maxta partners with most of the main server vendors including Dell EMC, Quanta and SuperMicro.
Scalability and performance. Scaling HCI is as simple as adding a node to the cluster. Scaling storage capacity just requires adding drives (HDDs or SSDs) up to a node's maximum or adding additional nodes. Each HCI product has its own scalability and performance limitations; however, most scale well into the PBs and add performance linearly with each server node added to the cluster.
Simplicity. Plug it in, turn it on, configure and you're done. Few systems are simpler. No DIY, and there's just one throat to choke for support.
Total cost of ownership (TCO). Similar to VMware vSAN. Many HCI vendors include inline deduplication and/or compression that can reduce total capacity requirements and TCO significantly.
Hyper-converged infrastructure (HCI) SDS cons:
Flexibility issues. HCIs are closed-loop SDS systems, so their storage only works with the server nodes in the cluster. Any physical or virtual host not in the HCI cluster cannot access the HCI storage. (There are exceptions, such as Nutanix and Pivot3.) Cluster hardware is limited to what the HCI vendor provides or hardware certified by the software-only HCIs. As with VMware vSAN, there is vendor lock-in, and replacing the vendor requires migrating everything from the old HCI to the new, which can be time-consuming and tedious.
Scalability and performance issues. HCI cluster capacity is limited by the number of nodes supported in the cluster and the amount of capacity supported per node. If a VM requires more IOPS than a given host can provide, it can get IOPS from other nodes, but with a considerable latency penalty. Inter-cluster storage performance is another issue. Most HCI clusters use 10 Gbps to 40 Gbps Ethernet and TCP/IP to interconnect the hosts so latencies between hosts can be highly variable.
Some things are not so simple. Converting from a siloed storage environment to an HCI cluster requires first converting both non-VM images and VMs to the HCI VMs or Docker containers, a time-consuming process.
TCO issues. HCIs -- like vSAN -- have an issue in making sure data, VM images, virtual desktop images and Docker container images on a specific HCI node remain available to the rest of the cluster should that node fail. Today, that requires multi-copy mirroring. Best practices require minimally two copies of the original data and more commonly three copies, which increases total capacity requirements and related costs.
Storage virtualization SDS
Storage virtualization SDS is the most mature variation of SDS in the software-defined storage market. It's been around since the early 2000s, when it was simply called storage virtualization. Storage virtualization SDS comprises the entire storage software stack, including all storage services, optimized to run on the x86 architecture and convert hosts into powerful, full-featured storage controllers. It virtualizes server storage and external storage systems to create one or more virtual storage pools with different capacities, data protection policies and performance characteristics.
Storage virtualization SDS essentially converts x86 servers into storage systems; some products can also run as VMs as a virtual storage appliance (VSA). It's primarily a scale-up architecture, but some products can scale out as well. These products are architected to eliminate costly proprietary hardware, take advantage of lower-cost server drives, repurpose older storage systems and simplify data migration.
Some of the better-known players/products include: DataCore Software SANsymphony, Datrium, Dell EMC ViPR, ElastiFile, FalconStor, Formation Data Systems, Hitachi Data Systems (HDS), IBM SAN Volume Controller (SVC), ioFabric, Microsoft Windows Server 2012 R2 (and higher), NetApp ONTAP Cloud, Nexenta Systems NexentaStor, QuantaStor, StarWind Software and StorONE.
Storage virtualization SDS pros:
Flexibility. It works with most x86 physical hosts or VMs as long as the hardware or hypervisor is certified and supported by the vendor. It converts all storage that sits behind it into the virtual storage pool, enabling repurposing of older storage. The scale-out versions permit physical or VM access to any node. Multi-copy mirroring isn't necessary to protect against a single controller failure, although it's available. Storage virtualization SDS can be provided as software or bundled with server hardware similar to HCI.
Scalability and performance. Scaling is multi-dimensional as each node in the cluster can scale-up and more nodes can be added to scale out. Generally, storage virtualization SDS is equivalent to most active-active siloed storage systems.
Simplicity. When bundled with hardware, storage virtualization SDS is a very simple storage system. It leverages commodity off-the-shelf hardware, has better scalability and, in some cases, provides block (SAN), file (NAS) or object storage access.
Total cost of ownership (TCO). The biggest cost savings in storage virtualization SDS comes from commodity hardware and server-based drives. Another cost saving comes from inline data reduction technologies. Compared to equivalent storage systems, most storage virtualization SDS will yield a much more favorable TCO.
Storage virtualization SDS cons:
Flexibility issues. Most storage virtualization SDS can only run on the specific commodity hardware certified and supported by the vendor. Products that can run as VSAs require hypervisors certified and supported by the vendor.
Scalability and performance issues. On paper, these systems support tremendous capacity scalability, but the practical reality is a bit different. Storage virtualization SDS capacity is constrained by x86 server limitations: each server can handle only so much capacity before performance declines below acceptable levels. Scale-out is constrained by clustering because the number of storage controller nodes supported is limited. Performance may be constrained by the same limitations.
Some things are not so simple. Storage virtualization SDS is primarily DIY system integration which requires testing, QA and efforts to make sure the software works correctly with the hardware. Implementation may require professional services or a systems integrator.
TCO issues. Licensing can be a bit pricey depending on the vendor. And not all storage virtualization SDS products provide inline deduplication and/or compression. These issues can have a deleterious impact on TCO.
Scale-out object and/or file SDS
Recently we've seen the introduction of scale-out object SDS. Object storage manages data as objects, which contain the data, metadata and a unique identifier. Scale-out object SDS vendors include Caringo, Cloudian, Dell EMC, Fujitsu, HDS, IBM, Lenovo, NetApp, Quantum, Samsung, Scality and Western Digital.
There are also open source object storage variations in OpenStack Swift and Ceph. Commercial vendors offering distributed and supported products based on open source SDS software include Aquari, Huawei, RedHat, SUSE and SwiftStack.
Scale-out file SDS is a highly scalable NAS, often with special characteristics such as object storage resilience or unique metadata. Scale-out file SDS vendors include Caringo, Dell EMC, NetApp, OpenNAS and Qumulo. Some scale-out file SDS products, such as Exablox, actually sit atop object storage, and others are essentially clustered scale-out implementations of IBM's General Parallel File System (GPFS), now IBM Spectrum Scale. Vendors with products based on GPFS include Cray, DataDirect Networks, HPE and Newisys.
Scale-out object and/or file SDS pros:
Flexibility. Both scale-out SDS architectures are designed from the ground up for x86 servers. Some can be implemented as software on hardware certified by the vendor while others are bundled with server hardware. They are not designed to be VSAs and typically are intended for secondary or non-mission critical applications.
Many scale-out object or file SDS products can act as Hadoop Distributed File System (HDFS) storage. That can significantly lower the cost of HDFS storage by reducing the number of mirrored copies required and allowing the repurposing of existing NFS or SMB data.
Scalability and performance. Scaling is multi-dimensional: each node can be scaled individually and generally the cluster itself can add nodes for capacity or performance. Performance for both will never approach that of high-performance block storage.
Simplicity. When bundled with hardware, scale-out object or file storage is very simple to set up, configure, and manage. Implementing it as software requires DIY systems integration. Both types leverage commodity hardware, have exceptional scalability and -- in the case of scale-out object storage -- unmatched data resilience and longevity via erasure coding.
Total cost of ownership (TCO). Both types are designed to be low-cost and offer very few add-on functions, usually licensed on an annual subscription basis. Scale-out object storage with erasure codes can lower the overall cost per GB as it requires less overhead than traditional RAID and replication data protection.
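The erasure-coding savings can be quantified by comparing raw capacity needed per usable TB under each protection scheme. The 10+4 shard layout below is a hypothetical example; real products use a range of data/parity ratios:

```python
# Raw capacity required per TB of usable data under two protection schemes.

def replication_overhead(copies: int) -> float:
    """Replication stores every byte `copies` times, so overhead equals the copy count."""
    return float(copies)

def erasure_coding_overhead(data_shards: int, parity_shards: int) -> float:
    """Erasure coding stores data plus parity shards spread across nodes;
    any `data_shards` of the total suffice to rebuild the data."""
    return (data_shards + parity_shards) / data_shards

rep3 = replication_overhead(3)          # classic three-way replication
ec = erasure_coding_overhead(10, 4)     # hypothetical 10+4 layout

print(rep3)  # 3.0 -> 3 TB raw per usable TB
print(ec)    # 1.4 -> tolerates 4 lost shards at far less overhead
```

In this sketch, a 10+4 layout survives the loss of four shards while consuming 1.4 TB raw per usable TB versus 3 TB for three-way replication, which is the source of the lower cost per GB cited above.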
Scale-out object and/or file SDS cons:
Flexibility issues. Whether delivered as software or bundled with hardware, the hardware must be certified and supported by the vendors.
Scalability and performance issues. Scale-out file SDS generally doesn't scale as high as scale-out object storage, but object will have somewhat higher latencies. Object storage has significant additional latencies from the metadata and data resiliency functions. Both types are best suited for secondary applications where high performance is not a requirement.
Some things are not so simple. When scale-out file or object storage SDS is purchased as software, it's a DIY project, so special skills, professional services or a systems integrator may be required.
In addition, when these types of SDS are used for secondary applications such as archiving, the data must be moved from its current location. Some vendors have products that can do this; most rely on third-party software.
TCO issues. Data reduction -- deduplication or compression -- is only infrequently available with scale-out object SDS and scale-out file SDS. That adds to the TCO.
Software-defined storage market bottom line
SDS is a broad marketing term with a variety of software-defined storage market implementations, each with its own pluses and minuses.
Selecting the right SDS for the job requires accurate understanding of the application, storage capacity and performance requirements, the organization's skill set, and what the software-defined storage market is capable of handling.