Data storage considerations for a DevOps environment
The term DevOps, a contraction of development and operations, represents a new way of working to deliver enterprise applications using Agile development methodologies. DevOps transfers responsibility for some of the operational functions of IT to development teams, allowing them to create, develop, amend and deploy applications in a rapid fashion, typically without need for any interaction with the operations teams.
To deliver an Agile or DevOps environment, the way in which resources, including storage, are consumed and deployed changes to a more cloud-focused approach. DevOps development depends on the agility of the IT infrastructure to deliver resources for creating and deploying applications as needed. So developers expect certain features from a DevOps infrastructure that are different from the way the developer community worked in the past. Typically, these differences include the following:
- On-demand availability of resources: infrastructure resources available on demand for consumption when required in the development process. This may mean, for example, the ability to create a new development environment, complete with seed data, based on both container and virtual machine (VM) components.
- Automation and workflow: development environments are built on demand, with the build process automated as far as possible. In most cases, an application development framework will be built from a master template that is used to deploy the application and that contains the components it needs (e.g., database server, web server, and so on).
- Scale and transience: DevOps developments will often use multiple environments to test many application changes at the same time. Each developer may want their own environment, but only need it for a short length of time. This means DevOps environments should provide the capability to spin up an application and tear it down again quickly and repeatably.
- Support for VMs and containers: Almost all DevOps processes rely on developing applications either within VMs or as container instances. Storage platforms that offer native VM and container support provide an easier management and integration experience.
The use of DevOps as a methodology has introduced a range of new tools and frameworks for implementing a continuous development process. These include release management systems like Jenkins and Computer Science Corp.'s Agility; orchestration tools like Kubernetes and Mesosphere; and, of course, virtualization frameworks such as Docker, OpenStack and Vagrant. We are starting to see these platforms integrate storage in order to provide the degree of automation and security required for continuous development. Docker, for example, has extended its platform with a volume API plug-in that provides orchestration for persistent external storage. Kubernetes implements support for persistent volumes that can be provisioned from a range of sources, including traditional block and file interfaces (iSCSI, FC, NFS) or cloud and open source storage.
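As a rough illustration of the Kubernetes persistent volume support described above, the following Python sketch builds a PersistentVolumeClaim manifest as a plain dictionary and prints it as JSON. The claim name, size and storage class (`dev-seed-data`, `10Gi`, `fast-dev`) are assumptions made for this example, not values from any real cluster, and the manifest would still have to be submitted to Kubernetes separately.

```python
import json


def build_pvc(name, size, storage_class, access_mode="ReadWriteOnce"):
    """Return a PersistentVolumeClaim manifest as a plain dict."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "accessModes": [access_mode],
            "storageClassName": storage_class,
            "resources": {"requests": {"storage": size}},
        },
    }


# Hypothetical values for a short-lived developer environment.
pvc = build_pvc("dev-seed-data", "10Gi", "fast-dev")
print(json.dumps(pvc, indent=2))
```

Generating the claim from code rather than by hand is what lets an orchestration pipeline request storage for each new environment without involving a storage administrator.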
We should also recognize that the public cloud plays a big part in DevOps, with platforms like AWS offering the capability to create and destroy development environments very easily. Storage is typically managed by the cloud platform and not exposed to the developer. As we will discuss later, one problem with using public cloud for continuous development is seeding environments with test data.
Challenges for implementing storage within a DevOps environment parallel the issues seen with creating a private cloud. Storage resources must be provisioned on a much more dynamic basis, with the ability to create and destroy resources on demand. In practice, storage needs to be consumable by the orchestration and management frameworks that create a DevOps environment, such as Kubernetes or Mesosphere. This means having automation APIs capable of creating LUNs and volumes, and mapping them to the application as required.
Within application deployment frameworks such as OpenStack, the consumption of storage at a low level is achieved using plug-ins that allow vendors to expose their products to automation. The Cinder project of OpenStack covers the ability to dynamically create block-based storage and map it to an instance. There are similar projects for file (Manila) and object (Swift) storage as well. Most storage appliance and software-defined storage (SDS) vendors provide support for Cinder by offering a middleware plug-in to manage the process of orchestration. The middleware driver translates Cinder commands (like Create Volume or Delete Volume) into those for the storage platform, keeping track of these resources and their associated instances.
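The translation role of such a middleware driver can be sketched in Python as follows. This is not the real Cinder driver interface: `FakeArray`, `SketchDriver` and all of their method names are hypothetical, chosen only to show generic create, attach and delete commands being mapped onto array-specific calls while the driver keeps track of which volume belongs to which LUN.

```python
class FakeArray:
    """Hypothetical stand-in for a vendor storage array API."""

    def __init__(self):
        self.luns = {}       # lun_id -> size in GB
        self.mappings = {}   # lun_id -> host or instance name
        self._next_id = 1

    def make_lun(self, size_gb):
        lun_id = f"lun-{self._next_id}"
        self._next_id += 1
        self.luns[lun_id] = size_gb
        return lun_id

    def remove_lun(self, lun_id):
        self.mappings.pop(lun_id, None)
        del self.luns[lun_id]

    def map_lun(self, lun_id, host):
        self.mappings[lun_id] = host


class SketchDriver:
    """Translates generic volume commands into array-specific calls."""

    def __init__(self, array):
        self.array = array
        self.volumes = {}    # volume name -> lun_id (resource tracking)

    def create_volume(self, name, size_gb):
        self.volumes[name] = self.array.make_lun(size_gb)

    def attach_volume(self, name, instance):
        self.array.map_lun(self.volumes[name], instance)

    def delete_volume(self, name):
        self.array.remove_lun(self.volumes.pop(name))


array = FakeArray()
driver = SketchDriver(array)
driver.create_volume("build-42-db", 20)
driver.attach_volume("build-42-db", "instance-a")
print(array.luns, array.mappings)
driver.delete_volume("build-42-db")
print(array.luns, array.mappings)
```

The orchestration layer only ever sees the generic commands, so the same workflow could in principle target a different storage platform by swapping the object behind the driver.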
The continuous nature of DevOps integration means the automation of storage provisioning is essential. DevOps replaces the human element of storage workflow with automated processes.
Clearly the rate of change in a development environment is considerably higher than in production. The turnover of resources will be high and any storage platform should be capable of managing a high rate of configuration change. This can create a problem for legacy storage systems, where configurations were expected to be relatively static. Storage vendors increasingly recognize the need to drive their products "programmatically" using code rather than command line interfaces and GUIs, and so API support has become an expected feature. These APIs should be capable of processing multiple requests in parallel (even if the resource changes are internally serialized).
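One way to picture an API that accepts requests in parallel while serializing the underlying changes is the minimal Python sketch below. `VolumeService` is a hypothetical in-memory stand-in, not any vendor's API: many threads call `create_volume` at once, and a lock ensures that only one configuration change is applied at a time.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class VolumeService:
    """Hypothetical provisioning endpoint: accepts calls concurrently,
    but applies configuration changes one at a time."""

    def __init__(self):
        self._lock = threading.Lock()
        self.volumes = {}    # volume name -> size in GB

    def create_volume(self, name, size_gb):
        # Many threads may call this at once; the lock serializes the change.
        with self._lock:
            if name in self.volumes:
                raise ValueError(f"volume {name} already exists")
            self.volumes[name] = size_gb
        return name


service = VolumeService()
with ThreadPoolExecutor(max_workers=8) as pool:
    names = list(
        pool.map(lambda i: service.create_volume(f"env-{i}", 10), range(50))
    )

print(len(service.volumes))  # prints 50
```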
In general, developers aren't concerned with how their resources are delivered. The developer is concerned that their environment is available for use and working within agreed service levels. This means focusing on delivering multi-tenancy capabilities when providing storage within a DevOps environment. Multi-tenancy defines the ability to provide multiuser access to shared resources without any one user or "tenant" impacting another. Critical for DevOps environments, the multi-tenancy aspect ensures no one application environment can consume too many storage resources, either from a capacity or performance perspective. In fact, admins should limit the amount of resources consumed per environment, especially where the underlying hardware is shared with production.
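The per-environment limits described above might be enforced along the lines of this Python sketch. The `Quota`, `TenantLedger` and `QuotaExceeded` names, and the figures used, are assumptions made for illustration; a real platform would typically enforce such limits inside the storage system or through its quality-of-service features.

```python
from dataclasses import dataclass


@dataclass
class Quota:
    max_capacity_gb: int
    max_iops: int


class QuotaExceeded(Exception):
    pass


class TenantLedger:
    """Tracks capacity and IOPS allocations per environment (tenant)."""

    def __init__(self, quotas):
        self.quotas = quotas
        self.used = {t: {"capacity_gb": 0, "iops": 0} for t in quotas}

    def allocate(self, tenant, capacity_gb, iops):
        quota, used = self.quotas[tenant], self.used[tenant]
        if used["capacity_gb"] + capacity_gb > quota.max_capacity_gb:
            raise QuotaExceeded(f"{tenant}: capacity limit reached")
        if used["iops"] + iops > quota.max_iops:
            raise QuotaExceeded(f"{tenant}: IOPS limit reached")
        used["capacity_gb"] += capacity_gb
        used["iops"] += iops


ledger = TenantLedger({"dev-alice": Quota(100, 2000)})
ledger.allocate("dev-alice", 60, 1000)
try:
    ledger.allocate("dev-alice", 60, 500)   # would exceed the 100 GB limit
except QuotaExceeded as exc:
    print(exc)
```

A rejected request leaves the ledger unchanged, so one environment asking for too much cannot eat into the headroom reserved for others.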
Secondary storage lets you use once-static backup data as the source for seeding DevOps environments. And, by reusing existing hardware, enterprises can significantly save money while simultaneously solving the challenge of seeding development systems.
Data optimization represents one area that significantly impacts delivering an efficient development environment. The need for data optimization is clear; you build most development environments from master images or, perhaps, copies of production data. That means features like data deduplication can significantly save on storage capacity.
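To make the capacity argument concrete, here is a minimal Python sketch of fixed-size, hash-based deduplication. It is a simplified model rather than how any particular product works: `DedupStore`, the 4 KiB chunk size and the made-up "master image" are all assumptions. Ten environments are written, each differing from the master in a single chunk, and the logical and physical byte counts are compared.

```python
import hashlib


class DedupStore:
    """Stores each unique fixed-size chunk once, keyed by its SHA-256 digest."""

    def __init__(self, chunk_size=4096):
        self.chunk_size = chunk_size
        self.chunks = {}    # digest -> chunk bytes
        self.images = {}    # image name -> ordered list of digests

    def write(self, name, data):
        digests = []
        for i in range(0, len(data), self.chunk_size):
            chunk = data[i:i + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)   # keep the first copy only
            digests.append(digest)
        self.images[name] = digests

    def read(self, name):
        return b"".join(self.chunks[d] for d in self.images[name])

    def physical_bytes(self):
        return sum(len(c) for c in self.chunks.values())


# Made-up "master image": 100 distinct 4 KiB chunks.
master = b"".join(bytes([i]) * 4096 for i in range(100))
store = DedupStore()
logical = 0
for n in range(10):
    # Each environment changes only its first chunk.
    image = bytes([200 + n]) * 4096 + master[4096:]
    store.write(f"env-{n}", image)
    logical += len(image)

print(logical, store.physical_bytes())
```

In this toy case the ten images share 99 of their 100 chunks, so the store holds 109 unique chunks instead of 1,000.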
Developers don't care how their data gets to the test application -- they just want it available when they need it. For DevOps, it's about having storage and data on demand all the time.
Deduplication used in tandem with features like snapshots lets you create many test environments quickly and efficiently. Snapshots are particularly beneficial because they allow the cloning of VM instances with the minimum amount of overhead. Cloning can be much more practical than creating individual VM instances from scratch (and then configuring them), especially where lots of custom configurations have been applied.
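The low overhead of snapshot-based cloning can be illustrated with a small copy-on-write model in Python. The `Snapshot` and `Clone` classes are hypothetical and greatly simplified compared with a real storage system: each clone records only the blocks it has changed and reads everything else from the shared snapshot.

```python
class Snapshot:
    """Read-only view of a set of blocks (block number -> bytes)."""

    def __init__(self, blocks):
        self._blocks = dict(blocks)

    def read(self, block_no):
        return self._blocks[block_no]


class Clone:
    """Writable clone that stores only blocks changed since the snapshot."""

    def __init__(self, snapshot):
        self._base = snapshot
        self._delta = {}

    def read(self, block_no):
        if block_no in self._delta:
            return self._delta[block_no]
        return self._base.read(block_no)

    def write(self, block_no, data):
        self._delta[block_no] = data   # the snapshot itself is never modified

    def changed_blocks(self):
        return len(self._delta)


golden = Snapshot({n: b"base" for n in range(1000)})
clones = [Clone(golden) for _ in range(5)]
clones[0].write(7, b"test")
print(clones[0].read(7), clones[1].read(7), clones[0].changed_blocks())
```

Creating the five clones copies no block data at all, which is why cloning a configured image can be so much quicker than building each instance from scratch.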
Accurate application testing requires using real-world data that reflects as closely as possible the production environment. In most development scenarios, it is typical to take a regular copy or snapshot of production data and use this as the seed for testing. Data does, of course, have to be suitably anonymized to ensure customer information is adequately protected.
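As one possible approach to protecting customer information in seed data, the Python sketch below replaces sensitive fields with keyed, deterministic tokens. The field names, sample rows and key are assumptions made for this example. Strictly speaking this is pseudonymization rather than full anonymization, so whether it is adequate depends on the data and the regulations that apply.

```python
import hashlib
import hmac

SECRET_KEY = b"example-key-not-for-real-use"   # assumption: kept outside dev
SENSITIVE_FIELDS = ("name", "email")           # assumption: illustrative schema


def pseudonymize(value):
    """Replace a value with a keyed, deterministic token."""
    digest = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def anonymize_record(record):
    """Return a copy of the record with sensitive fields tokenized."""
    return {
        key: pseudonymize(val) if key in SENSITIVE_FIELDS else val
        for key, val in record.items()
    }


production_rows = [
    {"id": 1, "name": "Ada Example", "email": "ada@example.com", "plan": "gold"},
    {"id": 2, "name": "Bob Example", "email": "bob@example.com", "plan": "free"},
]
seed_rows = [anonymize_record(row) for row in production_rows]
print(seed_rows[0])
```

Because the tokens are deterministic, the same customer maps to the same token in every table, which keeps joins in the test data working.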
In a private cloud environment, creating an image copy of production can be relatively easy and achieved through the use of snapshots, clones or replication. These techniques assume the development platform is compatible with production, allowing you to move a copy of a snapshot to another platform. Alternatively, both production and development could run on the same hardware, with quality of service ensuring the right level of performance for production data.
Software-defined storage offers a great opportunity to deliver resources for DevOps environments. Products are typically cheap (or open source), run on commodity hardware, and scale out on demand and in a granular fashion.
Sourcing data into the public cloud poses more of a problem, both in the cost of storing the data and in the time taken to replicate that data into the cloud environment. Products such as Avere Systems' vFXT can run on public cloud platforms and extend access from on-premises data into the cloud while improving accessibility to development data. The advantage of these products is that they only access active data, optimizing storage and networking costs.
A word about monitoring
In a high-turnover environment where resources are created and destroyed on a regular basis, there is always the risk of storage going unused or being overconsumed. Enterprises often create development environments and then abandon or, most typically, forget about them, especially when it is easy to spin up environments on demand. At the other end, it's easy to get storage sprawl, where many development environments are created rather than reused. Monitoring and maintenance keep capacity and performance growth in check and identify environments that are no longer in use. Monitoring is also important for implementing chargeback and needs to be granular enough to match the rate at which environments are being created (e.g., daily).
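A monitoring job along these lines could be sketched in Python as follows. The record layout, the 14-day idle threshold and the per-gigabyte rate are all assumptions made for illustration; in practice the inputs would come from the storage platform's own reporting.

```python
from datetime import datetime, timedelta

RATE_PER_GB_DAY = 0.01   # assumption: illustrative chargeback rate


def find_idle(environments, now, max_idle_days=14):
    """Return names of environments not accessed within max_idle_days."""
    cutoff = now - timedelta(days=max_idle_days)
    return [e["name"] for e in environments if e["last_access"] < cutoff]


def daily_charge(environments):
    """Return the per-owner charge for one day of allocated capacity."""
    charges = {}
    for env in environments:
        cost = env["allocated_gb"] * RATE_PER_GB_DAY
        charges[env["owner"]] = charges.get(env["owner"], 0.0) + cost
    return charges


now = datetime(2024, 1, 31)
envs = [
    {"name": "env-a", "owner": "alice", "allocated_gb": 200,
     "last_access": datetime(2024, 1, 30)},
    {"name": "env-b", "owner": "bob", "allocated_gb": 500,
     "last_access": datetime(2023, 12, 1)},
    {"name": "env-c", "owner": "alice", "allocated_gb": 100,
     "last_access": datetime(2024, 1, 2)},
]
print(find_idle(envs, now))
print(daily_charge(envs))
```

Running such a report daily matches the granularity at which environments are created, so abandoned capacity shows up in both the idle list and the owner's charge.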
The rise of DevOps has seen the emergence of new storage technologies that offer specific features appropriate for Agile development. These include the following:
- Hyper-convergence: Storage is delivered from the same physical hardware used to run applications (either in a hypervisor or as containers). The hyper-convergence management software hides the view of storage and removes the management work associated with provisioning storage to new VM instances and containers. A hyper-converged product makes the DevOps process easier because the focus is on creating logical objects like VM instances, rather than physical resource management.
- VM-aware secondary storage: The term secondary storage applies to all data stored by an enterprise for nonproduction use, including backups and archive. Storage hardware vendors have taken the opportunity to use VM backup interfaces to build disk-based data protection systems that can be used for purposes other than backup and restore. The flexible nature of a VM image allows you to clone VMs and entire applications from backup images and run them directly from the secondary storage platform, saving on building out a separate DevOps environment.
- Software-defined storage: SDS evolved from the first platforms to separate traditional dual-controller storage software from the hardware. Today, there are lots of scale-out SDS offerings for block, file and object. Many of these are also open source, and can be deployed relatively cheaply using commodity hardware. In development environments not focused on high levels of performance, a "self-build" storage product can offer significant savings over purchasing hardware from a vendor.
Build or buy?
In summary, the requirement for storage in a DevOps environment follows the path being forged by private cloud. Storage is becoming less visible, with automation doing the work previously done by storage administrators and removing the human factor from resource consumption.
Traditional storage is probably the least appealing option for DevOps environments, with modern scale-out products offering more attractive alternatives. You can also choose to build rather than buy, which offers significant cost savings over vendor hardware. Open source products, meanwhile, can reduce the overall cost and -- with the pace of feature development -- be a good match to the DevOps mantra of continuous development.