Open source software has been a serious force for good in driving forward a collaborative, community-based software development model. The most obvious example of this is the development of Linux, various distributions of which have been adopted as the cloud operating system of choice and the go-to platform for modern application developers.
Higher up the stack, we see the same ethos applied to containerization in the form of Docker and a host of database platforms for structured SQL, NoSQL and analytics uses. But what about storage? Has the evolution of open software development passed storage by, or are there options available for those who want to implement open source in storage?
Open source technology defined
First, let's explain what we mean by open source. By definition, it implies that a product's source code is freely open to be accessed and read by anyone. The scope is much wider than that, however, with most open source technology projects making code available under the terms of a license. The license determines how code can be used or reused, what attributions must be made, how patents are covered and what commercial use is permitted.
Typically, licensing schemes such as the GNU General Public License (GNU GPL) use copyright rules to require that code developed for a project be freely distributed and used, and that licensees place no additional restrictions on the code they produce from it, a practice known as copyleft. The most recent additions, in GPL version 3, reinforce this and ensure that patents developed as a result of a project are made freely available for anyone to use.
What does this mean for storage software development? In reality, developing a storage platform is no different from developing any other piece of software. So open source makes a lot of sense for storage because it allows large-scale collaboration on a complex issue -- maintaining a 100% guarantee of the accuracy of data in a persistent model.
Open source has been used successfully as a model for developing operating systems and databases. It makes perfect sense to see the community development model applied to shared and persistent storage requirements.
Storage hasn't been the most obvious choice for open source development because most early shared storage platforms were developed on proprietary hardware. However, the commoditization of servers and storage media has grown to a point over the last 15 years where the cost is low enough and the reliability is high enough to build storage platforms from off-the-shelf components. With this rise in software-defined storage, open source storage has become just one aspect of a market composed of many commercial SDS products.
Why open source storage
As an end user, why would you choose open source storage technology? As with commercial SDS, open source storage separates the purchase of hardware from the purchase of software. This allows you to source, build and design your own hardware in order to gain cost and operational advantages, such as minimizing the number of hardware platforms you must support. Proprietary storage vendors, for instance, typically put a large markup on the hardware they sell. When hardware components were bespoke, this was understandable. In today's commodity world, however, the markup is hard to justify except to cover the costs of testing and validating configurations.
Open source storage platforms go a step further, eliminating storage software capital expenditure. All that remains is to decide whether and how to pay for support. In fact, getting support from a vendor or value-added reseller is the main problem most enterprises have to face when using open source storage software.
Thankfully, support models exist. Red Hat, for example, has a thriving business supporting its own Red Hat Enterprise Linux (RHEL), which is available commercially and derived from the Fedora distribution. RHEL is, in turn, available as open source technology in distributions such as CentOS.
Wisdom of the crowd
Using open source storage provides access to a wide range of developers testing on many different hardware platforms. In this case, the "wisdom of the crowd" can help test and debug many hardware corner-case problems.
Running open source storage provides the same level of flexibility as standard commercial storage products. You can run commercially supported versions of open source storage systems in production environments, while testing and development run on deployments of the same software that are supported in-house. This approach offers significant cost savings, especially with unstructured data that requires the likes of scale-out object storage.
Choosing a product
A range of open source technology on the market covers object-, file- and block-based storage requirements. Some products work with one protocol; others support multiple protocols through either emulation or protocol connectors.
The most common open source storage offerings fall under the category of object storage, typically used to store archive or backup data where the costs must be low.
- Ceph is an open source technology project started around 2007 and developed from a doctoral dissertation written by Sage Weil. Like most open source projects, it's available on GitHub, and it is licensed under the GNU Lesser General Public License (LGPL), version 2.1. Ceph is a scale-out, distributed storage platform built on an object store known as the Reliable Autonomic Distributed Object Store (RADOS). A cluster is built from multiple physical or virtual nodes that provide storage, metadata services, API services and cluster monitoring. In addition to object, Ceph supports block and file data, the former through the RADOS Block Device (RBD) and the latter through CephFS, a file system gateway. In 2014, Red Hat acquired Inktank, the company providing support for Ceph, and now sells a commercial version of Ceph that provides a more robust and enterprise-level implementation.
- OpenIO is a French company that's developing a scale-out object store to support a range of application uses, from email to backup and archiving. Parts of the software are licensed under LGPL, version 3, and others under the GNU Affero General Public License, version 3. Although OpenIO had been in development since 2006, it only became open source in 2012. Unlike most open source storage offerings, OpenIO supports both x86 and ARM processor architectures, which you can also mix within a single cluster.
- Minio is an object store server licensed under Apache License, version 2.0. The software is lightweight and can run as a Docker container; on macOS, using Homebrew; or under Windows or Linux, on both x86 and ARM. Minio relies on community rather than commercial support.
- S3 Server was released by Scality in 2016 as a Docker container image. The software has been pulled over 600,000 times since then. It's licensed under Apache 2.0. As a lightweight single-node object store, S3 Server delivers easy access to an object store that is compatible with the Amazon Web Services Simple Storage Service (S3) API. For large-scale production object store implementations, Scality expects customers to move from S3 Server to its commercially supported Ring product.
- Swift is the object storage component of OpenStack. It provides a scale-out node-based object store that can be run on commodity servers. Swift is also a protocol used to access data and is supported by a range of other object storage vendors. SwiftStack provides commercial support and leads Swift development efforts.
- Lustre is a parallel file system mainly used for high-performance computing requirements. It's licensed under GPL, version 2, managed by Open Scalable File Systems and designed to run on Linux. Until May 2017, Intel commercially supported software-only Lustre deployments, but it appears to have discontinued support. This has left companies such as DataDirect Networks to provide support as part of hardware bundles.
- FreeNAS is an open source storage appliance that is more than 10 years old. Its software is based on the highly scalable, open source Zettabyte File System (ZFS). The company iXsystems provides commercial support for FreeNAS with a hardware appliance called TrueNAS.
- GlusterFS, or Gluster File System, is a scale-out file system that is also available from Red Hat as a commercial storage platform. The company, Gluster, originally developed and supported GlusterFS until its acquisition by Red Hat in 2011. The software is licensed under GPL, version 3. GlusterFS consolidates storage resources from multiple servers or nodes into a single, parallel file system. Contributing servers can either be storage providers, called storage bricks, or storage consumers. As a storage product, GlusterFS is simple to implement. It uses a distributed metadata architecture, making it particularly suitable for large-scale file archives.
- Cinder, as part of the OpenStack project, delivers block-level access to store persistent data for virtual instances. Cinder provides access to local storage by using the Logical Volume Manager (LVM), or to traditional storage through plug-ins that enable it to be used with OpenStack. As such, support comes from either the storage vendor or an OpenStack distribution provider.
- OpenEBS is an open source project that's developing block-based storage for containerized applications. Like many open source storage efforts, OpenEBS is written in Go and is licensed under Apache 2.0. Following a trend we see with many open source storage offerings, OpenEBS is still in beta, with active development working toward production usage.
- Portworx is a scale-out storage product that itself runs in containers and delivers storage to containerized applications. The company, Portworx, offers a commercial edition, PX-Enterprise, and a free developer version, called PX-Developer. The developer edition offers limited scalability and no GUI, but it can be used in development environments in lieu of the commercial product. An added benefit: The software can be deployed in the public cloud.
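To make the "distributed metadata architecture" mentioned for GlusterFS a little more concrete, here is a minimal Python sketch of the general idea: every client computes a file's location from a hash of its path, so no central metadata server has to be consulted. This is an illustration only, not GlusterFS's actual placement algorithm; the `place` function and the brick names are hypothetical.

```python
import hashlib
from typing import Sequence


def place(path: str, bricks: Sequence[str]) -> str:
    """Pick a brick for a file path by hashing the path.

    Any client running this function with the same brick list gets the
    same answer, so no shared lookup table or metadata server is needed.
    """
    if not bricks:
        raise ValueError("at least one brick is required")
    digest = hashlib.sha256(path.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an unsigned integer.
    index = int.from_bytes(digest[:8], "big") % len(bricks)
    return bricks[index]


# Hypothetical brick names, for illustration only.
BRICKS = ["node1:/data/brick", "node2:/data/brick", "node3:/data/brick"]

if __name__ == "__main__":
    for i in range(6):
        path = f"/archive/2017/file{i}.dat"
        print(path, "->", place(path, BRICKS))
```

A simple modulo scheme like this one would move most files whenever a brick is added or removed, which is why production systems tend to use more elaborate hashing schemes. The sketch only shows why hash-based placement removes the need for a central metadata lookup.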
Contributors give back
Of course, open source technology is all about giving back to the community. So everyone is free to contribute to the development of the platforms we have discussed.
Scaling storage for high volumes of data can be expensive. With open source storage, IT organizations get the benefits of commodity storage, have no licensing charges and only pay where support is really needed.
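The cost argument can be expressed as simple arithmetic. The Python sketch below is a rough model, assuming total cost is hardware plus software licensing plus per-node support over a number of years. The function name, its parameters and every figure in the example are hypothetical placeholders, not real prices.

```python
def storage_cost(hardware: float,
                 software_license: float,
                 support_per_node_year: float,
                 supported_nodes: int,
                 years: int) -> float:
    """Total cost over `years` under a simple additive model (an assumption)."""
    support = support_per_node_year * supported_nodes * years
    return hardware + software_license + support


if __name__ == "__main__":
    # All figures below are made-up placeholders for illustration.
    # Open source: no license fee, support bought only for production nodes.
    open_source = storage_cost(
        hardware=100_000, software_license=0,
        support_per_node_year=2_000, supported_nodes=4, years=3)
    # Dev/test cluster supported in-house: no vendor support at all.
    dev_test = storage_cost(
        hardware=20_000, software_license=0,
        support_per_node_year=0, supported_nodes=0, years=3)
    print("production:", open_source)
    print("dev/test:", dev_test)
```

Plugging in quotes from your own suppliers, including a nonzero `software_license` for a commercial product, gives a like-for-like comparison. The model deliberately ignores staff time, which is often the real cost of in-house support.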
For large enterprises, it may make sense to have some developers involved in writing open source storage software. That way, they get the opportunity to internally support the software -- either fully without vendor support or for dev-test purposes -- and direct the integration of new features. For long-term deployments, such as archives and backup, getting involved in maintaining an open source platform helps mitigate the risk of a vendor discontinuing a commercial offering.
Build your own
Rather than use a commercially supported open source product, another option is to build your own from open source components, such as the Linux iSCSI Target or SMB on Linux. You could use these to provide file and block services to your storage infrastructure, especially in conjunction with the ZFS file system. However, you would be without any support other than the developer community. It's not for the faint of heart.
The open source storage market offers a wide range of products and tools covering the main storage data types and many different use cases. Typically, one company develops an open source product and provides support for it while keeping the code open to the community. Larger enterprises may find it difficult to get the level of support they have with commercial storage providers. Nevertheless, over time, open source could well become a major contributor to the storage landscape.