In the shift toward Web-scale computing, key technologies such as virtualization, the move to x86 architecture and the rapid adoption of the DevOps methodology have transformed the IT ecosystem. As the volume of systems deployed in IT shops continues to increase, the next challenge will be orchestrating and managing compute, storage and network resources in the most efficient and effective manner, delivering services to what has become known as the private cloud.
OpenStack is an open source cloud platform project, originally started by NASA and Rackspace Hosting as a joint project in 2010. The source code is managed by the OpenStack Foundation and distributed under the Apache License, which allows free distribution and modification of the code, subject to retaining the original copyright notices. OpenStack has gained popularity as a platform for deploying scale-out applications; it is used by many service providers to deliver public clouds and by large organizations looking to implement a private cloud infrastructure. It's important to point out that OpenStack is designed to work with scale-out applications and isn't particularly well suited to deployments of traditional monolithic applications such as Microsoft Exchange or those built on databases such as Oracle.
The OpenStack software includes many different modules addressing the various aspects of a cloud environment:
- Swift: Object storage
- Cinder: Block storage
- Nova: Virtual machines (VMs)/compute
- Neutron: Networking
- Horizon: Dashboard
- Keystone: Identity services
- Glance: Image service
- Ceilometer: Telemetry
- Heat: Orchestration
- Trove: Database as a Service (DBaaS)
With each release of the OpenStack code (currently the ninth, called Icehouse), new projects are started from scratch or “forked” from existing ones. Examples include Ironic for bare-metal provisioning and Sahara for Elastic MapReduce, the latter due in the Juno release of OpenStack.
Data services are provided by five of the components. Swift is the sub-project that delivers object storage for OpenStack infrastructure. Block storage is provided by Cinder using standard IP storage protocols like iSCSI and NFS. Glance provides a repository for VM images using the underlying storage from a basic file system or Swift. Trove provides DBaaS capabilities, while Sahara will deliver Elastic MapReduce capabilities, otherwise known as storage for Hadoop clusters. For this article, we'll focus on Cinder and Swift, the two main storage platforms.
Object and block storage
OpenStack deployments divide storage into object- and block-based systems. The Swift component delivers object storage, while Cinder delivers block storage. Both platforms can be implemented using commodity hardware or integrated with traditional vendor arrays.
Cinder for block storage
Block storage is an essential component in delivering virtual infrastructure and is the foundation for storing VM images and the data used by VMs. Until the development of Cinder, which was introduced in the 2012 Folsom release of OpenStack, VMs were transient and their storage lasted only for the lifetime of that virtual machine. Cinder provides support for managing persistent block storage, which is presented to the compute (Nova) layer using iSCSI, Fibre Channel or NFS protocols, as well as a number of proprietary protocols that deliver back-end connectivity.
The Cinder interface provides a number of standard functions that allow for the creation and attaching of block devices to VMs, such as “create volume,” “delete volume” and “attach volume.” More advanced functions support the ability to extend volumes, take snapshots and create clones from a VM image.
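The lifecycle implied by these functions can be sketched as a small in-memory model. This is an illustrative toy rather than the Cinder API: the class `ToyVolumeService`, its method names and the status strings are all assumptions chosen to mirror the operations named above.

```python
import itertools
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Volume:
    volume_id: str
    size_gb: int
    status: str = "available"          # "available" or "in-use"
    attached_to: Optional[str] = None  # ID of the VM using the volume, if any


class ToyVolumeService:
    """Hypothetical in-memory stand-in for a block storage service."""

    def __init__(self) -> None:
        self._volumes: Dict[str, Volume] = {}
        self._ids = itertools.count(1)

    def create_volume(self, size_gb: int) -> Volume:
        if size_gb <= 0:
            raise ValueError("size must be positive")
        volume = Volume(volume_id=f"vol-{next(self._ids)}", size_gb=size_gb)
        self._volumes[volume.volume_id] = volume
        return volume

    def attach_volume(self, volume_id: str, vm_id: str) -> None:
        volume = self._volumes[volume_id]
        if volume.status != "available":
            raise RuntimeError("volume is already attached")
        volume.status, volume.attached_to = "in-use", vm_id

    def detach_volume(self, volume_id: str) -> None:
        volume = self._volumes[volume_id]
        volume.status, volume.attached_to = "available", None

    def extend_volume(self, volume_id: str, new_size_gb: int) -> None:
        volume = self._volumes[volume_id]
        if new_size_gb <= volume.size_gb:
            raise ValueError("a volume can only grow")
        volume.size_gb = new_size_gb

    def delete_volume(self, volume_id: str) -> None:
        # Refuse to delete storage that a VM is still using.
        if self._volumes[volume_id].status == "in-use":
            raise RuntimeError("detach the volume before deleting it")
        del self._volumes[volume_id]


service = ToyVolumeService()
data_disk = service.create_volume(size_gb=10)
service.attach_volume(data_disk.volume_id, vm_id="vm-1")
service.extend_volume(data_disk.volume_id, new_size_gb=20)
```

The point of the model is that the volume exists independently of the VM: detaching it returns it to the available state with its size and contents intact, which is the persistence that transient VM storage lacked.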
Many vendors provide Cinder block support with their existing hardware platforms through the use of a Cinder driver that translates the Cinder API into commands on the vendors' particular hardware. Vendors that offer Cinder support include EMC (with VMAX and VNX), Hewlett-Packard (3PAR StoreServ and StoreVirtual), Hitachi Data Systems, IBM (across all storage platforms), NetApp, Pure Storage and SolidFire. There are also software-based solutions from the likes of EMC (ScaleIO) and Nexenta.
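The driver idea can be illustrated with a short sketch in which one generic request is translated into different back-end commands. Everything here is hypothetical: the `BlockDriver` interface, the two example drivers and the command strings they emit are invented for illustration and do not correspond to any vendor's tooling or to Cinder's actual driver classes.

```python
from abc import ABC, abstractmethod
from typing import List


class BlockDriver(ABC):
    """Hypothetical driver interface: one generic call per operation."""

    @abstractmethod
    def create_volume(self, name: str, size_gb: int) -> str:
        """Return the backend-specific command for creating a volume."""


class ExampleArrayDriver(BlockDriver):
    # Invented command syntax for an imaginary storage array.
    def create_volume(self, name: str, size_gb: int) -> str:
        return f"array-cli lun create --name {name} --size {size_gb}GiB"


class ExampleFileDriver(BlockDriver):
    # Invented command syntax for an imaginary file-backed store.
    def create_volume(self, name: str, size_gb: int) -> str:
        return f"make-sparse-file /exports/{name}.img {size_gb * 1024}M"


def create_everywhere(drivers: List[BlockDriver], name: str, size_gb: int) -> List[str]:
    # The caller issues the same generic request; each driver translates it.
    return [driver.create_volume(name, size_gb) for driver in drivers]


commands = create_everywhere([ExampleArrayDriver(), ExampleFileDriver()], "db01", 10)
```

Because callers depend only on the generic interface, supporting a new storage platform means adding a driver, not changing the code that requests volumes.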
In addition, many software storage implementations, including open source platforms, can be used to provide Cinder support; these include Red Hat with Ceph and GlusterFS. Ceph's client support has been integrated into the Linux kernel, making it one of the easiest ways to provide block storage to an OpenStack deployment.
NFS support was introduced with the seventh release of OpenStack in 2013, known as Grizzly, although “experimental” support was available previously with Folsom. With NFS, VM volumes are treated as individual files in a similar way to the method used within the VMware ESXi hypervisor or VHDs on Microsoft Hyper-V. Encapsulation of VM volumes as files allows the implementation of features such as snapshots and cloning.
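The file-per-volume approach can be sketched with nothing more than the local file system. In this minimal example a temporary directory stands in for a mounted NFS export, and the helper names `create_file_volume` and `snapshot_file_volume` are assumptions made for illustration. The snapshot is a naive full copy; real implementations generally rely on more space-efficient techniques.

```python
import os
import shutil
import tempfile


def create_file_volume(directory: str, name: str, size_bytes: int) -> str:
    """Create a file of the requested size to act as a volume."""
    path = os.path.join(directory, f"{name}.img")
    with open(path, "wb") as handle:
        handle.truncate(size_bytes)  # sets the length without writing data
    return path


def snapshot_file_volume(volume_path: str, snapshot_name: str) -> str:
    """Take a naive snapshot by copying the backing file."""
    snapshot_path = os.path.join(os.path.dirname(volume_path), f"{snapshot_name}.img")
    shutil.copyfile(volume_path, snapshot_path)
    return snapshot_path


# A temporary directory stands in for a mounted NFS export.
export_dir = tempfile.mkdtemp()
volume = create_file_volume(export_dir, "vm-disk", 1024 * 1024)

# Write some data, snapshot the volume, then keep changing the original.
with open(volume, "r+b") as handle:
    handle.write(b"boot sector")
snapshot = snapshot_file_volume(volume, "vm-disk-snap1")
with open(volume, "r+b") as handle:
    handle.write(b"overwritten")
```

After the second write the snapshot file still holds the earlier contents, which is what makes file encapsulation a convenient basis for snapshots and clones.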
Storage features have been introduced into Cinder with successive releases and subsequently have been supported by storage vendors. A comprehensive list of vendor platforms and features supported can be found on the OpenStack Wiki page covering OpenStack Block Storage Drivers.
OpenStack's support network
There are now more than 200 companies involved in the OpenStack project, providing funds and personnel to develop code in their respective areas of interest. Traditional vendors are also starting to develop and release their own versions of OpenStack, such as Hewlett-Packard with its Helion software. Other vendors are offering software to support OpenStack deployments or the Cinder and Swift components with their hardware offerings.
Swift supports object storage
Object storage within OpenStack is delivered via Swift, which implements a scale-out object store distributed across the nodes of an OpenStack cluster. Object stores store data as binary objects, with no specific reference to a format. Objects are stored and retrieved using simple commands such as PUT and GET, issued over HTTP (the protocol of the Web) through what is known as a RESTful API.
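To make the PUT and GET pattern concrete, the sketch below builds (but does not send) the two HTTP requests using only the Python standard library. The host name, account, container, object name and token are placeholders, and the helper functions are assumptions for illustration; the path layout follows the account/container/object hierarchy that Swift exposes under `/v1`.

```python
import urllib.request
from urllib.parse import quote


def object_url(endpoint: str, account: str, container: str, name: str) -> str:
    """Build <endpoint>/v1/<account>/<container>/<object>."""
    parts = [endpoint.rstrip("/"), "v1", quote(account), quote(container), quote(name)]
    return "/".join(parts)


def build_put(url: str, token: str, body: bytes) -> urllib.request.Request:
    """Prepare a request that would store the object."""
    return urllib.request.Request(
        url, data=body, method="PUT", headers={"X-Auth-Token": token}
    )


def build_get(url: str, token: str) -> urllib.request.Request:
    """Prepare a request that would retrieve the object."""
    return urllib.request.Request(url, method="GET", headers={"X-Auth-Token": token})


# Placeholder values; no network traffic is generated by this example.
url = object_url("https://swift.example.com", "AUTH_demo", "photos", "cat.jpg")
put_request = build_put(url, token="example-token", body=b"image bytes go here")
get_request = build_get(url, token="example-token")
```

Passing either request to `urllib.request.urlopen()` would send it to a real cluster. The object is addressed entirely by its URL, so any HTTP client can store or fetch it.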
The Swift architecture is divided into a number of logical services, including object servers, proxy servers, container servers and account servers, which together are classified as a ring. Data is stored on object servers with other components used to track metadata relating to each stored object and to manage data access.
Data resiliency is managed within Swift using the concept of zones. A zone represents the subcomponent of a ring used to provide one copy of data, with multiple zones used to store redundant copies of data known as replicas (three by default). A zone can be as small as a single disk drive or server, or as large as an entire data center, which allows data to be dispersed geographically.
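The relationship between objects, zones and replicas can be sketched as a simplified placement function. This is loosely modeled on the idea of hashing an object's path to a partition and is not Swift's actual ring-building algorithm: the partition count, the zone layout and the round-robin choice of zones are assumptions made to keep the example small.

```python
import hashlib
from typing import Dict, List

PARTITION_POWER = 8  # 2**8 = 256 partitions; an assumed toy value
REPLICAS = 3         # three copies, matching the default noted above

# Hypothetical devices, grouped by the zone each one lives in.
ZONES: Dict[int, List[str]] = {
    1: ["z1-disk-a", "z1-disk-b"],
    2: ["z2-disk-a", "z2-disk-b"],
    3: ["z3-disk-a"],
    4: ["z4-disk-a", "z4-disk-b"],
}


def partition_for(path: str) -> int:
    """Map an object path to a partition using the top bits of its MD5 hash."""
    digest = hashlib.md5(path.encode("utf-8")).digest()
    top_32_bits = int.from_bytes(digest[:4], "big")
    return top_32_bits >> (32 - PARTITION_POWER)


def devices_for(partition: int) -> List[str]:
    """Choose one device in each of REPLICAS distinct zones."""
    zone_ids = sorted(ZONES)
    chosen = []
    for replica in range(REPLICAS):
        # Consecutive offsets guarantee distinct zones while REPLICAS <= zone count.
        zone_id = zone_ids[(partition + replica) % len(zone_ids)]
        disks = ZONES[zone_id]
        chosen.append(disks[partition % len(disks)])
    return chosen


partition = partition_for("/AUTH_demo/photos/cat.jpg")
placement = devices_for(partition)
```

Because each replica lands in a different zone, losing one disk, server or site (whatever a zone represents in a given deployment) removes at most one copy of any object.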
In common with many object stores, Swift uses the idea of eventual consistency to implement data resiliency. This means data isn't replicated synchronously across an OpenStack cluster, as it could be with block storage. Instead, data is replicated between zones as a background task, which may be suspended or fail if systems are under high load.
Compared to block storage, where synchronous replication is a feature used to provide a high level of availability, eventual consistency may seem to be more risky. However, there's a tradeoff to be made among scalability, performance and resiliency. Eventual consistency allows an archive to scale much more easily than a block storage-based system; in the case of Swift, the proxy servers ensure the most recent copy of data is retrieved even if some servers in the cluster are inaccessible.
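The behavior described above can be illustrated with a toy model of three replicas, one of which misses a write while it is offline. This is a simplified sketch rather than Swift's implementation: the `Replica` class, the integer timestamps and the helper functions are assumptions chosen to show how a read can return the newest copy before background replication has caught up.

```python
from typing import Dict, List, Optional, Tuple

Record = Tuple[int, bytes]  # (write timestamp, object data)


class Replica:
    """One copy of the data, as held in a single zone."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.online = True
        self.store: Dict[str, Record] = {}


def newest_record(replicas: List[Replica], key: str) -> Optional[Record]:
    """Return the most recently written record among reachable replicas."""
    records = [r.store[key] for r in replicas if r.online and key in r.store]
    return max(records, key=lambda record: record[0]) if records else None


def write(replicas: List[Replica], key: str, data: bytes, timestamp: int) -> int:
    """Store the object on every reachable replica; return how many took it."""
    written = 0
    for replica in replicas:
        if replica.online:
            replica.store[key] = (timestamp, data)
            written += 1
    return written


def read_latest(replicas: List[Replica], key: str) -> Optional[bytes]:
    """Serve the newest copy that any reachable replica holds."""
    record = newest_record(replicas, key)
    return record[1] if record else None


def replicate(replicas: List[Replica], key: str) -> None:
    """Background pass: bring lagging reachable replicas up to date."""
    newest = newest_record(replicas, key)
    if newest is None:
        return
    for replica in replicas:
        if replica.online and replica.store.get(key) != newest:
            replica.store[key] = newest


zones = [Replica("zone-1"), Replica("zone-2"), Replica("zone-3")]
write(zones, "report", b"v1", timestamp=1)
zones[2].online = False                                    # zone-3 drops out
acknowledged = write(zones, "report", b"v2", timestamp=2)  # only two copies updated
zones[2].online = True                                     # zone-3 returns, still stale
stale_copy = zones[2].store["report"][1]
served = read_latest(zones, "report")                      # newest copy wins
replicate(zones, "report")                                 # background repair
repaired_copy = zones[2].store["report"][1]
```

The write succeeds without waiting for the missing zone, which is the scalability side of the tradeoff; the cost is a window in which one copy is out of date until the background pass repairs it.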
As with all the OpenStack projects, Swift continues to be developed with new features and enhancements with each version released. OpenStack Grizzly introduced more granular replica controls, allowing rings to have adjustable replica counts. Object read performance was also improved through the idea of timing-based sorting for object servers. This allows data to be delivered by the fastest responding object server and is important for scaling over wider area networks.
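Timing-based sorting amounts to ordering candidate servers by how quickly they have responded. The sketch below is a minimal illustration under assumed inputs: the server names and timings are made up, and the function is not taken from the Swift code base.

```python
from typing import Dict, List


def sort_by_response_time(servers: List[str], timings: Dict[str, float]) -> List[str]:
    """Order servers fastest first; servers with no measurement go last."""
    return sorted(servers, key=lambda server: timings.get(server, float("inf")))


# Hypothetical response times in seconds; obj-4 has not been measured yet.
timings = {"obj-1": 0.120, "obj-2": 0.015, "obj-3": 0.045}
read_order = sort_by_response_time(["obj-1", "obj-2", "obj-3", "obj-4"], timings)
```

With these figures the read is attempted on `obj-2` first and on the unmeasured `obj-4` last. Over a wide area network, where a remote replica may be an order of magnitude slower than a local one, this ordering keeps most reads on the nearest copy.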
As Swift uses the HTTP protocol, it would be perfectly practical to use third-party storage solutions for object storage within OpenStack, including products from Cleversafe, Scality or public clouds such as Amazon Web Services Simple Storage Service (S3).
Swift or Cinder? Make the right choice
It's obvious that Swift and Cinder provide for very different types of data requirements. Object storage (delivered through Swift) was designed for highly scalable stores of object-based data such as media, images and files. The focus for these systems is their ability to scale to large quantities of data without the dependence on traditional storage features such as RAID. However, their eventual consistency model means Swift isn't suitable for storing data such as virtual machines.
Although Swift uses metadata to track objects and their versions, object stores still require additional logic to track user metadata on the objects being stored. This would need to be built into applications by users.
Cinder delivers the block storage component used to store persistent objects such as VMs and data that's regularly updated in place like databases. Block storage features can be implemented across an OpenStack cluster with commodity components using built-in tools such as server Logical Volume Managers or NFS to deliver storage resources. Alternatively, open source solutions such as Ceph and GlusterFS provide the ability to package the delivery of OpenStack storage separately from the main OpenStack code, while still retaining the flexibility to use open source software.
With widespread Cinder support, existing traditional storage solutions can be used to provide storage services into an OpenStack deployment. This may be preferable when an IT group already has the skills and hardware platforms in place. Existing storage platforms are well developed and already support advanced features for storage optimization such as thin provisioning, data deduplication and compression. Many now offer quality of service (HP 3PAR StoreServ and SolidFire, for example), making them suitable for use in mixed workloads rather than purely dedicated to an OpenStack deployment. As a result, there's still a significant benefit in offloading the “heavy lifting” tasks to an external storage array.
In choosing a particular platform, system architects need to weigh the risks and costs of using “free” OpenStack solutions (which still need hardware) against the benefits of the features offered by dedicated hardware.
Backing up OpenStack storage
As a final thought, we should consider the need to back up data in OpenStack. The details on backing up the critical configuration components of the OpenStack environment are well documented; however, backup of data within an OpenStack cluster is seen as the responsibility of the user. Backup could be implemented easily through the use of an external storage provider; SolidFire, for instance, offers the ability to back up an entire cluster to an Amazon S3 or Swift-compatible object store. Alternatively, users will need to look at existing backup products that support their OpenStack hypervisor.
Raksha is a new project proposal that will integrate backup-as-a-service functionality into the OpenStack framework. This would include both full and incremental backups of VMs to a Swift “endpoint” with the ability to be application consistent. Raksha is currently a standalone project and not part of the core OpenStack distribution. It will need some significant work to integrate into common hypervisor platforms such as vSphere and Hyper-V, but could provide a more integrated solution to delivering data protection within OpenStack environments.
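The difference between full and incremental backups can be shown with a generic chunk-comparison sketch. This illustrates the general technique only and makes no claim about how Raksha is designed: the chunk size, the function names and the assumption that an image never shrinks between backups are all choices made for the example.

```python
import hashlib
from typing import Dict, List

CHUNK_SIZE = 4  # bytes; deliberately tiny so the example is easy to follow


def split_chunks(image: bytes) -> List[bytes]:
    return [image[i:i + CHUNK_SIZE] for i in range(0, len(image), CHUNK_SIZE)]


def full_backup(image: bytes) -> Dict[int, bytes]:
    """Copy every chunk, keyed by its position in the image."""
    return dict(enumerate(split_chunks(image)))


def incremental_backup(previous: bytes, current: bytes) -> Dict[int, bytes]:
    """Copy only chunks that are new or whose hash differs from the previous image."""
    old_hashes = [hashlib.sha256(chunk).digest() for chunk in split_chunks(previous)]
    changed: Dict[int, bytes] = {}
    for index, chunk in enumerate(split_chunks(current)):
        is_new = index >= len(old_hashes)
        if is_new or hashlib.sha256(chunk).digest() != old_hashes[index]:
            changed[index] = chunk
    return changed


def restore(full: Dict[int, bytes], increments: List[Dict[int, bytes]]) -> bytes:
    """Rebuild an image by layering increments over the full backup in order."""
    chunks = dict(full)
    for increment in increments:
        chunks.update(increment)
    return b"".join(chunks[index] for index in sorted(chunks))


monday = b"AAAABBBBCCCC"
tuesday = b"AAAAXXXXCCCCDDDD"  # one chunk modified, one chunk appended

base = full_backup(monday)
delta = incremental_backup(monday, tuesday)
recovered = restore(base, [delta])
```

Here the incremental backup carries two chunks instead of four, and layering it over the full backup reproduces Tuesday's image exactly.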