One of the most useful features of the public cloud is the ability to store vast quantities of information at a consistent price ($/TB). Handing over the responsibility for managing data growth (and all the incumbent issues) would be a dream for many storage administrators. However, for various reasons, placing data off-site is not an attractive proposition for many organizations.
There are a number of reasons for this, including legislation (requiring certain data to be kept in-country), risk, compliance and the cost of storing and accessing many petabytes of archive or active data in the cloud. At scale, the cloud can become expensive, and moving data in and out of the cloud can also incur a significant cost.
One alternative to public cloud or traditional on-site storage is to look at building an on-premises cloud. In other words, building a storage system that meets all of the key requirements of cloud storage, while avoiding the major objections to public deployments.
As a starting point, let's review the features required of cloud computing. These are a well-defined set of parameters that include:
- Elasticity (the ability to scale up or down the resources as necessary)
- Service catalog (a clearly defined set of services, their features and cost)
- Multi-tenancy (the ability to support multiple customers with security and performance isolation)
- Management (the capability to manage resources using native tools)
- Reporting/billing (the capability to report on resource consumption)
Within typical on-premises storage environments, some of these features may already be implemented and fairly mature, whereas others may not be implemented at all.
A great place to start is in establishing a service catalog. A service catalog is a list of capabilities offered by the IT department, described as services rather than hardware terms. For on-premises cloud storage, this means capacity, performance, reliability and availability metrics. When creating the service catalog, there are a number of good reasons for avoiding discussion of the underlying hardware. For example, it allows users to accurately match the storage offerings to their business requirements. Also, it allows IT to replace the hardware supporting the service without a requirement to change the service definitions. This allows you to switch vendors, products or opt for lower-cost hardware as long as the capabilities outlined in the service catalog can still be met.
Developing a good service catalog requires conversation with storage users, who may need guidance to fully understand the service offerings. Service catalogs should be reviewed regularly, to refresh existing capabilities and add new ones when there is demand.
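As a rough illustration of describing storage in service terms rather than hardware terms, a catalog could be modeled as below. This is a minimal sketch: the tier names, the fields and every figure are hypothetical and are not taken from any real product or price list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceTier:
    """One catalog entry, expressed as a service, with no hardware details."""
    name: str
    max_capacity_tb: float      # largest single allocation offered
    min_iops_per_tb: int        # performance floor the tier commits to
    availability_pct: float     # availability target
    price_per_tb_month: float   # internal chargeback rate


# Hypothetical tiers; all numbers are invented for illustration.
CATALOG = [
    ServiceTier("archive", 500.0, 50, 99.9, 8.0),
    ServiceTier("general", 100.0, 500, 99.95, 25.0),
    ServiceTier("performance", 50.0, 5000, 99.99, 90.0),
]


def cheapest_match(capacity_tb, iops_per_tb, availability_pct):
    """Return the lowest-priced tier meeting every requirement, or None."""
    candidates = [
        tier for tier in CATALOG
        if tier.max_capacity_tb >= capacity_tb
        and tier.min_iops_per_tb >= iops_per_tb
        and tier.availability_pct >= availability_pct
    ]
    return min(candidates, key=lambda t: t.price_per_tb_month, default=None)
```

Because nothing in a tier names a vendor or a product, the hardware behind a tier can change without the catalog entry changing, provided the stated metrics are still met.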
With a service catalog in place, technology can be matched to each service level. Choosing the right technology requires matching up the other requirements of cloud-based services.
Multi-tenancy describes the ability of a product to support multiple separate customers or tenants, with the logical isolation of resources from one tenant to another. Physical isolation would, of course, be easy to implement but isn't cost-efficient or scalable. Instead, the architecture of the storage platform needs to deliver consistent performance to multiple users, through features like Quality of Service (QoS) that address the "noisy neighbor" problem -- a situation where one user can affect the performance of another in a shared platform. Multi-tenancy also means implementing security isolation. The storage resources of one tenant should be distinct and inaccessible to any other tenant on the shared system.
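One common way to limit a noisy neighbor is a per-tenant token bucket, sketched below. This is an illustration of the general technique, not how any particular storage product implements QoS; the class names, tenant names and limits are assumptions. Time is passed in explicitly so the behavior is easy to follow.

```python
class TokenBucket:
    """Per-tenant I/O budget: refills at `rate` ops/second, capped at `burst`."""

    def __init__(self, rate, burst):
        self.rate = rate
        self.burst = burst
        self.tokens = float(burst)
        self.last = 0.0

    def allow(self, now, cost=1):
        """Return True if `cost` operations may be issued at time `now`."""
        elapsed = now - self.last
        self.last = now
        # Refill for the elapsed time, never beyond the burst allowance.
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class QosGate:
    """Keeps one independent bucket per tenant (hypothetical limits)."""

    def __init__(self, limits):
        self.buckets = {
            tenant: TokenBucket(rate, burst)
            for tenant, (rate, burst) in limits.items()
        }

    def admit(self, tenant, now, cost=1):
        return self.buckets[tenant].allow(now, cost)


gate = QosGate({"tenant-a": (100, 10), "tenant-b": (100, 10)})

# tenant-a floods the system at t=0; tenant-b issues a single request.
noisy = [gate.admit("tenant-a", now=0.0) for _ in range(50)]
quiet = gate.admit("tenant-b", now=0.0)
```

Since each tenant draws from its own bucket, the flood from `tenant-a` is cut off once its burst allowance is used up, while the request from `tenant-b` is still admitted.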
One barrier to scaling traditional storage environments is in the manual effort required to configure, provision and make available storage for users. Hard-pressed storage administrators have spent many years developing scripting and other processes to make the job of provisioning as easy as possible. However, delivering storage cost efficiently at scale requires taking the manual process out of deployments.
Self-service allows users to request storage, either through a portal or automated via APIs. In a public storage world, customers can consume resources on demand, with no limits to the storage being requested. The surprise comes at the end of the billing period when the customer's credit card gets charged for the storage used. This post-consumption billing model doesn't work within businesses that consume resources from a central IT department because existing on-premises storage resources are finite. Storage resources, at some stage, must be attributed to budgets, which are not infinitely scalable. This means that self-service and billing functions need to be integrated into budget workflows and authorization mechanisms to ensure any requested storage capacity can be paid for (both on internal budgets and as real hardware acquisitions).
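The difference from post-consumption billing can be sketched as a self-service request that is checked against both the remaining physical capacity and an approved budget before anything is provisioned. Everything here is hypothetical (class names, the cost-center name, the flat monthly rate); a real deployment would call into existing budget and authorization systems instead.

```python
from dataclasses import dataclass


class ProvisioningDenied(Exception):
    """Raised when a request cannot be funded or physically satisfied."""


@dataclass
class Budget:
    limit: float            # amount approved for the period
    committed: float = 0.0  # amount already allocated to storage


class SelfServicePortal:
    def __init__(self, free_capacity_tb, price_per_tb_month, budgets):
        self.free_capacity_tb = free_capacity_tb
        self.price_per_tb_month = price_per_tb_month
        self.budgets = budgets  # cost center -> Budget

    def request(self, cost_center, capacity_tb):
        """Approve and record a request, or raise ProvisioningDenied."""
        cost = capacity_tb * self.price_per_tb_month
        budget = self.budgets[cost_center]
        if capacity_tb > self.free_capacity_tb:
            raise ProvisioningDenied("insufficient physical capacity")
        if budget.committed + cost > budget.limit:
            raise ProvisioningDenied("request exceeds approved budget")
        budget.committed += cost
        self.free_capacity_tb -= capacity_tb
        return cost


portal = SelfServicePortal(100.0, 25.0, {"finance": Budget(limit=1000.0)})
first_cost = portal.request("finance", 20.0)

try:
    portal.request("finance", 30.0)
    second_outcome = "approved"
except ProvisioningDenied as exc:
    second_outcome = str(exc)
```

The second request is refused before any capacity is consumed, which is the opposite of the public cloud model where the bill arrives after the fact.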
Automation allows the unattended provisioning of (approved) resources by users. Without automation, on-premises cloud storage deployments will be difficult to manage.
From an operational perspective, implementing on-premises cloud storage means deploying a scalable, standard and transformable design. The ideas of scalability and standardization are easy to understand. They typically mean building storage infrastructure from components that are easy to deploy and configure in measurable building blocks. The transformable component refers to the ability to decommission and replace parts of the storage infrastructure with the minimum of disruption or outage.
Public storage users have become accustomed to the idea of 24x7 operations with few or no planned outages. Most IT departments are already skilled at delivering high levels of planned availability, but many will still need outages for data migration, code upgrades and replacements to other parts of the infrastructure. Eliminating these dependencies is a major challenge in designing an on-premises cloud infrastructure.
On-premises cloud fundamentals
To successfully create an on-premises cloud storage environment, it is essential to meet these three conditions:
- On-premises cloud storage needs a strong and well-defined service catalog. Internal customers need to know what they can buy, how it performs and what it will cost. Getting the service catalog right is the foundation of a good cloud storage strategy.
- On-premises cloud storage is as much about operational change as it is about technology. This means interfacing into existing processes for budgets, billing and reporting.
- At a management level, cloud storage is all about APIs. APIs allow provisioning of resources to be managed in an automated process, taking the storage administrator out of the loop and allowing systems to scale efficiently.
Build it or buy it
So is it a good idea to buy an on-premises cloud system "off the shelf" or should IT departments look to build their own infrastructure? To answer that question, it is worth discussing exactly what services private cloud storage will be used for. In today's data center, we see three main storage types -- block-based (SAN), usually for production workloads; file-based (NAS), for file storage and some production workloads like virtualization; and object storage, large-capacity highly scalable platforms for storing binary objects.
Object storage has proven popular for both on- and off-premises cloud storage due to its highly scalable nature. Data is stored in a "flat namespace" that tracks objects through a unique object reference with metadata that describes characteristics of the content. Object stores scale easily (as they are scale-out by design) and are less concerned with performance than other forms of storage. The interface to object stores is through REST-based APIs using Web-based protocols. All of these features make object stores simple to implement as an on-premises cloud infrastructure.
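The flat namespace idea can be illustrated with a small in-memory model: there are no directories, only unique object IDs, each carrying metadata that describes the content. This is a conceptual sketch only. The class and method names are invented here, and a real object store would expose equivalent operations over a REST API instead of Python method calls.

```python
import hashlib
import uuid


class FlatObjectStore:
    """Toy object store: no hierarchy, only object IDs plus metadata."""

    def __init__(self):
        self._objects = {}  # object ID -> (data, metadata)

    def put(self, data, **metadata):
        """Store bytes and return the unique reference for the new object."""
        object_id = uuid.uuid4().hex
        full_metadata = dict(
            metadata,
            size=len(data),
            sha256=hashlib.sha256(data).hexdigest(),
        )
        self._objects[object_id] = (data, full_metadata)
        return object_id

    def get(self, object_id):
        return self._objects[object_id][0]

    def head(self, object_id):
        """Return a copy of the metadata without the object body."""
        return dict(self._objects[object_id][1])

    def find(self, **criteria):
        """Return IDs of objects whose metadata matches every criterion."""
        return [
            object_id
            for object_id, (_, metadata) in self._objects.items()
            if all(metadata.get(k) == v for k, v in criteria.items())
        ]


store = FlatObjectStore()
report_id = store.put(b"quarterly figures", content_type="text/plain")
```

Objects are located by ID or by metadata, never by path, which is what lets such a namespace grow without a directory tree to maintain.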
Traditional storage, where administrators have always managed configuration and provisioning through graphical user interfaces (GUIs), poses more of a challenge to deploy in cloud environments. Some vendors have implemented "cloud interfaces" to their existing products by providing API or command line interface (CLI) wrappers around the provisioning process. But this is nothing more than "cloud washing", an attempt to put a cloud spin on their products, because the APIs are not natively implemented. In most cases, these offerings fail to cater for some of the key requirements of cloud, such as QoS and multi-tenancy.
The key question to ask of on-premises cloud storage vendors is how their products can be managed through a CLI or API for automation purposes. On-premises cloud storage will in most instances need to be integrated with virtual servers and virtual server provisioning. For storage, this means handling multiple concurrent requests to configure and map storage to the required hosts. Legacy storage products are very much "single-threaded" in this respect, handling one request at a time (and expecting requests to be batched up for execution as a single task). More modern systems have native support for APIs, accepting multiple concurrent requests from different sources at the same time (e.g. via an API or CLI). Good examples of products here are Pure Storage FlashArray products and SolidFire's SF series of arrays. Both can be fully driven from native REST APIs and, in the case of SolidFire, the API is the basis for all other management features such as the web GUI.
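To make the concurrency requirement concrete, the sketch below fires a batch of create-and-map requests at a stand-in array from several worker threads at once. `FakeArray` and its `create_and_map` method are hypothetical; this code does not use, and makes no claim about, the real APIs of any product named above.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class FakeArray:
    """Stand-in for an array whose provisioning API accepts concurrent calls."""

    def __init__(self):
        self._lock = threading.Lock()
        self.mappings = {}  # volume name -> host it is mapped to

    def create_and_map(self, volume, host):
        # The lock protects shared state so parallel requests cannot collide.
        with self._lock:
            if volume in self.mappings:
                raise ValueError(f"volume {volume} already exists")
            self.mappings[volume] = host
        return volume


array = FakeArray()
requests = [(f"vol-{i:03d}", f"host-{i % 4}") for i in range(20)]

# Eight workers submit requests in parallel, as a virtual server
# provisioning workflow might, instead of queueing one batch job.
with ThreadPoolExecutor(max_workers=8) as pool:
    created = list(pool.map(lambda r: array.create_and_map(*r), requests))
```

A "single-threaded" legacy interface would force the caller to serialize these twenty requests; a natively concurrent API lets the array sort out ordering itself.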
Storage hardware vendors don't offer the only route to building private cloud storage platforms. There is now a range of open-source storage offerings, including Ceph, GlusterFS, OpenStack Cloud, SUSE and a number of startups offering software-based storage that is deployed on the customer's own hardware. These startups include Maxta and Springpath. All of these platforms offer a scale-out approach with CLI/API-style management, allowing them to be integrated into a cloud storage framework. The software-defined route offers the possibility of reducing costs by removing the storage provider's hardware margin. It also enables end users that need to be consistent in their hardware choices to use their vendor of choice and avoid creating multiple server management policies.
Of course, IT departments could decide to take on the task of developing scripting to automate the process of storage provisioning. The platform drivers for the Cinder, Swift and Manila projects of the OpenStack platform, covering block, object and file-based storage respectively, are free to download from OpenStack, as is documentation on the supported functions. If the idea of developing a provisioning framework seems a little ambitious, then there is also ViPR from EMC, a recently open-sourced tool for managing storage provisioning across heterogeneous systems.
Best of both worlds
While on-premises cloud storage provides the ability to overcome privacy and security issues associated with public cloud, there may be times when having the ability to push some data into public cloud offerings is still a desirable option. This process has been made easier through the use of cloud storage gateway products from the likes of Avere Systems, CTERA Networks, Nasuni, StorSimple (part of Microsoft) and TwinStrata (now part of EMC). Some of these are full-featured NAS appliances, providing the illusion of having all data on-site, while using public cloud as the main data repository. The advantage of using the appliance-based model in this instance is the management of data translation between NAS and object protocols that are typically used to store data in the cloud. The appliance takes care of mapping the file to object components on the public cloud infrastructure, while retaining information such as access control lists and other metadata on the file(s).
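The file-to-object translation a gateway performs can be sketched roughly as follows: the file's path, owner and access control list travel with the object as metadata, while the object itself gets a flat key. This is an assumption-laden illustration, not the scheme used by any of the products listed; the key derivation, the metadata field names and the `archive` bucket are all invented for the example.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class FileEntry:
    """A file as a NAS client sees it."""
    path: str
    data: bytes
    owner: str
    acl: dict = field(default_factory=dict)  # principal -> permission string


def to_object(entry, bucket="archive"):
    """Translate a file into (key, body, metadata) for an object store."""
    # A flat key derived from the path; the hierarchy is kept only as metadata.
    digest = hashlib.sha256(entry.path.encode("utf-8")).hexdigest()
    metadata = {
        "original-path": entry.path,
        "owner": entry.owner,
        "acl": ",".join(
            f"{who}:{perm}" for who, perm in sorted(entry.acl.items())
        ),
    }
    return f"{bucket}/{digest}", entry.data, metadata


entry = FileEntry(
    path="/projects/q3/report.docx",
    data=b"example contents",
    owner="alice",
    acl={"bob": "r", "alice": "rw"},
)
key, body, metadata = to_object(entry)
```

On a read, the appliance would reverse the mapping, using the stored metadata to present the object back to NAS clients as the original file with its permissions intact.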
On-premises cloud storage is as much an evolution in the practice of deploying storage on-site as it is a matter of the technology involved. This means putting an operational framework in place covering service catalog, billing/reporting and workflow. From there, the choice of technology is diverse, with options available for all private storage needs.