Small World Big Data
Published: 06 Sep 2017
The pace of change in IT is staggering. Fast-growing data, cloud-scale processing and millions of new internet of things devices are driving us to find more efficient, reliable and scalable ways to keep up. Traditional application architectures are reaching their limits, and we're scrambling to evaluate the best new approaches for development and deployment. Fortunately, the hottest prospect -- containerization -- promises to address many, if not all, of these otherwise overwhelming challenges.
In containerized application design, each individual container hosts an isolatable, and separately scalable, processing component of a larger application web of containers. Unlike monolithic application processes of the past, large, containerized applications can consist of hundreds, if not thousands, of related containers. The apps support Agile design, development and deployment methodologies. They can scale readily in production and are ideally suited for hosting in distributed, and even hybrid, cloud infrastructure.
Unfortunately, containers weren't originally designed to implement full-stack applications or really any application that requires persistent data storage. The original idea for containers was to make it easy to create and deploy stateless microservice application layers on a large scale. Think of microservices as a form of highly agile middleware with conceptually no persistent data storage requirements to worry about.
Persistence in persisting
Because the container approach has delivered great agility, scalability, efficiency and cloud-readiness, and is lower-cost in many cases, people now want to use it for far more than microservices. Container architectures provide such a better way to build modern applications that we see many commercial software and systems vendors transitioning internal development to container form and even deploying them widely, often without explicit end-user or IT awareness. It's a good bet that most Fortune 1000 companies already host third-party production IT applications in containers, especially inside appliances, converged approaches and purpose-built infrastructure.
You might find large, containerized databases and even storage systems. Still, designing enterprise persistent storage for these applications is a challenge, as containers can come and go and migrate across distributed and hybrid infrastructure. Because data needs to be mastered, protected, regulated and governed, persistent data storage acts in many ways like an anchor, holding containers down and threatening to reduce many of their benefits.
Container architectures need three types of storage. The first is image storage. This can be provided with existing shared storage and has requirements much like platforms already built for distributing and protecting virtual machine (VM) images in server virtualization. A benefit is that container images are much smaller than golden VM images because they don't duplicate operating system code. Also, running container images are immutable by design, so they can be stored and shared efficiently. There is a consequence, though: The container image cannot store dynamic application data.
The second required data store is for container management. Again, you can readily provide this with existing storage. Whether you use Docker, Kubernetes, Tectonic, Rancher or another flavor of container management, it will need management storage for things like configuration data and logging.
It's the third type of storage, container application storage, that provides the most difficult challenge. When only supporting true microservice-style programming, container code can write directly over image directories and files. But containers use a type of layered file system that corrals all newly written data into a temporary, virtual layer. The base container image isn't modified. Once a container goes away -- and containers are designed to be short-lived compared with VMs -- all its temporary storage disappears with it.
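To make the layering behavior concrete, here is a minimal Python sketch that models a container's file system as a writable layer stacked on a read-only image. It is a toy model, not how any container runtime is implemented; the paths, file contents and the `start_container` helper are all assumptions made for illustration.

```python
from collections import ChainMap

# Read-only image layer: path -> file contents (illustrative data only).
image_layer = {"/app/config.ini": "mode=default", "/app/VERSION": "1.0"}

def start_container(image):
    """Return a container view: a fresh, empty writable layer over the image."""
    # ChainMap sends every write to its first mapping and leaves the rest alone.
    return ChainMap({}, image)

c1 = start_container(image_layer)
c1["/app/config.ini"] = "mode=custom"   # shadows the image's copy
c1["/data/orders.db"] = "order-1"       # new file lives only in the top layer

print(c1["/app/config.ini"])            # value as seen inside the container
print(image_layer["/app/config.ini"])   # the image itself is unchanged

# The container goes away; its replacement starts with an empty top layer.
c2 = start_container(image_layer)
print("/data/orders.db" in c2)          # the earlier write did not survive
```

The second container sees only what the image holds, which is why anything that must outlive a container has to be written somewhere outside that temporary layer.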
If a containerized application needs to persist data, the first option is to explicitly mount a specific system data volume -- or persistent volume in Kubernetes -- into the container's namespace. This gives the container direct access to read/write into a host system directory or file share. If that container is killed and restarted, it can access any of the persisted data that it had previously written. However, this can be a tough way to share data between containers as the application developer must take care of all sharing, locking, contention and restart concerns. And it's not clear how a storage admin can discern and protect -- through snapshots, backup and disaster recovery (DR) -- thousands of programmer-controlled data volumes at scale.
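As a rough illustration of the Kubernetes variant of that first option, the sketch below builds a persistent volume claim and a pod that mounts it, then prints them as a JSON manifest. The claim name, pod name, image reference, mount path and size are hypothetical placeholders; only the general manifest layout follows Kubernetes conventions.

```python
import json

CLAIM_NAME = "orders-data"  # hypothetical claim name

# Request 1 GiB of storage that a single node can mount read/write.
pvc = {
    "apiVersion": "v1",
    "kind": "PersistentVolumeClaim",
    "metadata": {"name": CLAIM_NAME},
    "spec": {
        "accessModes": ["ReadWriteOnce"],
        "resources": {"requests": {"storage": "1Gi"}},
    },
}

# A pod whose container sees the claimed volume at /var/lib/orders.
pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "orders"},
    "spec": {
        "containers": [{
            "name": "orders",
            "image": "example.com/orders:1.0",  # placeholder image
            "volumeMounts": [{"name": "data", "mountPath": "/var/lib/orders"}],
        }],
        "volumes": [{
            "name": "data",
            "persistentVolumeClaim": {"claimName": CLAIM_NAME},
        }],
    },
}

manifest = {"apiVersion": "v1", "kind": "List", "items": [pvc, pod]}
print(json.dumps(manifest, indent=2))
```

Saved to a file, output like this could be handed to `kubectl apply -f`. Note that nothing in it addresses sharing, locking or backup; those concerns stay with the developer and the storage admin.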
In addition, if that container is brought up on a different host within a container cluster, then the storage admin has the challenge of ensuring that a shared or distributed file system -- for example, NFS -- remains configured identically on all cluster hosts, and even then, application programmers will probably need to add more I/O-related code to ensure reliable cluster-level sharing. The good news is that expert storage admins can potentially bring existing enterprise storage, such as NAS and SAN, to bear in this new container world. And if they work closely with developers, they can realistically configure high-end enterprise production environments.
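The kind of extra I/O code this implies can be as simple as a lock file on the shared volume. The sketch below is one possible approach, not a recommendation: the `exclusive_lock` helper and file names are hypothetical, a temporary directory stands in for the shared mount, and whether exclusive file creation is truly atomic on a network file system depends on the protocol version and client. It also does not handle stale locks left by a crashed container.

```python
import os
import tempfile
from contextlib import contextmanager

@contextmanager
def exclusive_lock(lock_path):
    """Hold a lock file for the duration of the block; raise if it is taken."""
    # O_CREAT | O_EXCL makes os.open fail with FileExistsError if the file exists.
    fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    try:
        os.write(fd, str(os.getpid()).encode())  # record the holder for debugging
        yield
    finally:
        os.close(fd)
        os.remove(lock_path)

shared_dir = tempfile.mkdtemp()  # stand-in for a mounted shared volume
lock_path = os.path.join(shared_dir, "orders.lock")

with exclusive_lock(lock_path):
    # A second writer arriving while the lock is held is turned away.
    try:
        with exclusive_lock(lock_path):
            second_writer_got_in = True
    except FileExistsError:
        second_writer_got_in = False

print(second_writer_got_in)
```

Even a sketch this small shows why the burden grows at scale: every application that shares a volume needs to agree on the same convention.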
The best practice for the container world, however, is to enable Agile DevOps with identical sandbox, test and production environments. From the container's perspective, this approach offers end users dynamic provisioning and ensures free container movement and migration. The more static and fragile the system's storage configuration, the less you can realize the benefits of containerization.
How containers affect storage
Containers create several challenges for existing storage, including the following:
- Dynamic provisioning of lots of containers. Anything other than cloud-scale storage can be strained by container-scale dynamic provisioning demands.
- Lost, isolated and unknown usage. Ongoing container development and DevOps can end up creating lots of fragmented islands of wasted capacity. Thin provisioning, snapshots and automatic recovery of capacity policies might help.
- Difficult to isolate contention. With so much going on -- thousands of containers dancing happily away -- resolving deadlocks and contention can be challenging, especially with legacy systems and traditional management tools.
- Data migration, keeping up with container migration and avoiding performance degradation. Containers move a lot, and their data should migrate with them to maintain top performance.
- Network issues. It's not just east-west container talk, but sharing remote storage across a cluster of thousands of containers could bring the network to a crawl. Consider affordable 40 Gigabit Ethernet and greater cluster interconnects.
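For a sense of scale on the network point, the back-of-the-envelope calculation below divides one link evenly among containers. It assumes every container shares a single link equally, ignores protocol overhead and uses decimal units; the figure of 1,000 containers is illustrative only.

```python
def per_container_mbytes_per_s(link_gbits_per_s, containers):
    """Even share of one link, in megabytes per second (decimal units)."""
    link_mbytes_per_s = link_gbits_per_s * 1000 / 8  # gigabits -> megabytes
    return link_mbytes_per_s / containers

for link in (10, 40, 100):
    share = per_container_mbytes_per_s(link, containers=1000)
    print(f"{link} GbE shared by 1,000 containers: {share:.2f} MB/s each")
```

Under these assumptions, even a 40 Gigabit Ethernet link leaves each of 1,000 containers with about 5 MB/s, which is why keeping data close to where it is consumed matters.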
To be sure, there will be more challenges. For example, the wide variety of containerized applications in your future will probably require choosing from a broader catalog of storage services than just the traditional three medal levels of bronze, silver and gold.
Docker and other container management products offer pluggable volume systems. Flocker, for example, has been a popular open source volume plug-in replacement for Docker, intelligently managing and moving data volumes with their containers across a cluster. Although the now-defunct ClusterHQ was the primary vendor sponsor of Flocker, we expect this kind of functionality to continue to evolve and become increasingly native within baseline container platforms; Rancher Labs' Convoy project is moving in this direction. Most, if not all, legacy storage vendors and cloud storage service providers produce various container system volume plug-ins for their storage arrays, and these are a good way to continue investing in storage.
Storage as software
Instead of trying to force legacy storage into new container environments, a growing alternative enlists a new wave of software-defined storage (SDS) to do the job. SDS consists of a storage operating system and services that are fully deployed as a software layer, often as VMs, but now, they're increasingly deployed as containers. It is easy to imagine containerized software storage quickly evolving to align with how containerized applications consume storage services.
While traditional production server virtualization environments often end up on clusters of large and expensive host servers, container hosting architectures can easily use a much more open, expansive and cheaper commodity server cloud made up of a more dynamic mix of private, public and hybrid infrastructure. This is somewhat analogous to how big data projects like Hadoop and Spark take advantage of commodity infrastructure and inherently use SDS and memory services to free us up from proprietary and expensive platforms.
Another key benefit of SDS, especially for distributed containerized approaches such as Ceph, Torus and GlusterFS, is that it brings storage right into container clusters. While managing something like GlusterFS can be a big departure for traditional SAN administrators, containerized storage naturally gains container-world benefits like agility, scalability and availability, while increasing application performance by keeping persistent data storage local to consumption.
If this sounds too complex, pre-converged and hyper-converged container appliances, such as those from Datera and Diamanti, make it much simpler with native container storage capabilities built in. These inherently employ SDS to gain the flexibility and agility necessary to converge everything into a platform appliance format. We haven't heard much yet from enterprises wanting to seriously invest in converged container hosting for production, but the future of IT infrastructure is going to continue down the convergence path while building out more cloud-like services.
Of course, the trick for IT folks is to judge whether paying for a vendor's unique intellectual property and value on top of free open source is worth the investment and additional lock-in. To benefit from pre-integration, proven enterprise-class features and round-the-clock support, it's often worth making a longer-term commitment to a specific vendor open source distribution or pre-converged stack. In other words, it's not just old school IT vendor vs. vendor, but variations of intellectual property-laden vendors vs. open source vendors vs. do-it-yourself.
Cloud-scale object storage
Containerized applications tend to be cloud-oriented by nature, with architectures allowing for independent scaling of different internal services depending on changing external workload profiles or growth. This same cloud approach pervades how modern application developers think of, and want to work with, storage. It's natural that many new containerized applications are written for object storage I/O instead of traditional file or block.
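To show what writing for object storage I/O means in practice, here is a toy in-memory stand-in for an object store. It does not represent any vendor's API; the `ToyObjectStore` class, its method names and the sample keys are assumptions made for illustration. The point is the access pattern: flat keys, whole-object reads and writes, and metadata carried with each object.

```python
import hashlib

class ToyObjectStore:
    """In-memory stand-in for an object store: flat keys, whole-object I/O."""

    def __init__(self):
        self._objects = {}

    def put(self, key, data, metadata=None):
        # Objects are replaced whole; there is no seek or partial rewrite.
        checksum = hashlib.sha256(data).hexdigest()
        self._objects[key] = (bytes(data), dict(metadata or {}), checksum)
        return checksum

    def get(self, key):
        data, metadata, _ = self._objects[key]
        return data, metadata

    def list(self, prefix=""):
        # "Directories" are only a naming convention on a flat key space.
        return sorted(k for k in self._objects if k.startswith(prefix))

store = ToyObjectStore()
store.put("orders/2017/09/0001.json", b'{"total": 42}',
          {"content-type": "application/json"})
store.put("orders/2017/09/0002.json", b'{"total": 7}')

print(store.list("orders/2017/"))
data, meta = store.get("orders/2017/09/0001.json")
print(data, meta)
```

Because the application never assumes a mounted file system or a particular host, a container using this style of I/O can be restarted anywhere that can reach the store.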
Even if most current container environments are fairly modest in practice -- except, of course, in public clouds -- web-scale object storage from the likes of Hedvig, Qumulo and Scality aligns well with web-scale container visions. Public clouds already offer object storage, such as Amazon Web Services Simple Storage Service (S3), as the persistent storage layer when implementing or migrating container applications.
We have yet to see the ultimate in persistent data storage for containers. From past experience in storage evolution, expect to see "container-aware" storage that knows it's being provisioned and used by containers and manages itself appropriately. Much like virtual machine-aware storage, we should see a container storage service that persists container data and follows -- or even leads -- the container across a cluster and maybe even across clouds. We fully expect to eventually see container-aware caching using server-side flash, attached through interfaces such as nonvolatile memory express (NVMe), and emerging persistent memory, hopefully integrated with tiers of persistence.
Expect future container-aligned storage that will be provisionable in all important requirements from within a container's manifest or application blueprint. We also hope for future overall storage management of multiple container environments that will track, predict and optimize storage subscriptions just enough to meet ongoing container operational needs. And, of course, container-aligned storage must make sure that all data protection, high availability and DR bases are covered everywhere at all times using simple policy mechanisms.
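One way to picture storage that is provisionable from a manifest is a policy match between declared requirements and a catalog of storage services. The sketch below is purely hypothetical: the `StorageOffering` and `StorageRequest` types, the catalog entries and the cheapest-match rule are all invented for illustration and do not describe any existing product.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class StorageOffering:
    name: str
    max_latency_ms: float
    replicas: int
    snapshots: bool
    cost_per_gb: float

@dataclass(frozen=True)
class StorageRequest:
    size_gb: int
    max_latency_ms: float
    min_replicas: int
    needs_snapshots: bool

def choose_offering(request, catalog):
    """Return the cheapest offering that meets every requirement, or None."""
    candidates = [
        o for o in catalog
        if o.max_latency_ms <= request.max_latency_ms
        and o.replicas >= request.min_replicas
        and (o.snapshots or not request.needs_snapshots)
    ]
    return min(candidates, key=lambda o: o.cost_per_gb, default=None)

# Hypothetical catalog, broader than a fixed bronze/silver/gold ladder.
catalog = [
    StorageOffering("archive", 50.0, 2, False, 0.01),
    StorageOffering("general", 10.0, 2, True, 0.05),
    StorageOffering("fast-replicated", 1.0, 3, True, 0.20),
]

# Requirements as they might be declared in a container's manifest.
request = StorageRequest(size_gb=100, max_latency_ms=10.0,
                         min_replicas=2, needs_snapshots=True)
chosen = choose_offering(request, catalog)
print(chosen.name if chosen else "no match")
```

Choosing the cheapest offering that still satisfies the request is one reading of subscribing "just enough" storage; a real service would also need to track usage over time and enforce data protection policy.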
Server virtualization in the form of VMs took more than a decade to replace most application-dedicated physical servers in the common enterprise data center. Now, containerized applications appear to be set to replace many of those full virtual machine apps within a year or two. The biggest challenge is going to be how fast we can stand up enterprise-class persistent data storage for containers.