Published: 23 Oct 2013
In the future, data storage will shed its role as a passive technology layer as it integrates more closely with applications and workloads.
In a previous column, Arun Taneja wrote that the concept of LUNs is dead or at least dying as the primary way storage will be managed in the future. This has become evident with the emergence and increased adoption of products offering advanced virtual machine (VM)-centric storage. Shifting the focus from LUNs to VMs changes the storage game for VM administrators who can continue to work with constructs they understand directly, storage folks who have to elevate their service offerings, and even those pesky end users who might benefit from increased performance and availability (and hopefully lower costs).
You could view the end of the LUN as a consequence of industry commoditization of low-level array functionality as storage vendors compete to offer better, higher-level products; or you might chalk it up to a highly competitive marketplace where the most efficient and effective IT can help win the day. Either way, we think it's inevitable that storage solutions will keep evolving up the stack. The big question is what comes next. What are the next valuable levels as one climbs the storage stack? Let's start with familiar storage types and work up to some possible future storage solutions.
Block, file and object storage
I'm going to oversimplify a bit here, but at the foundation we might find our beloved block storage. Block storage is about bit handling -- storing and protecting raw data. At scale, responsibility and focus separate, often splitting into IT domains or silos. A storage manager takes care of storing and protecting whatever data is put into arbitrary containers (LUNs), while the storage client is free to manipulate and arrange the data in any way needed. The SAN effectively offers disk virtualization. Performance can be high, but the service is basic and the client needs to deal with many low-level issues. Handoff at the LUN level ensures specific storage is defined and allocated, but there's little ability for the storage manager to optimize any data-level services for their clients.
File systems add a layer of utility to block storage. Instead of raw bit storage, the storage service delivers a "virtual" file system to its clients and keeps track of predetermined metadata about the files put into the system. The storage manager can create, optimize and tune this file service to the client's benefit while attempting to optimize underlying infrastructure resources. At the same time, the client enjoys higher-level file services while relinquishing lower-level control. Fundamentally, a big burden has been shifted from "many" clients back to a more efficient central storage service.
File systems are great for applications that don't have extensive abilities to organize raw disk, but they still present a mainly human-oriented interface of directory hierarchies, ownerships, permissions and sharing facilities. Object storage is a more natural persistence target for automation and programming with its simplified protocols to read and write arbitrary chunks of data and metadata. Object stores are famous for being able to independently manage their stored objects using metadata with policies, for example, to modify data protection levels as objects age, ensure geo-location compliance and even delete objects past a retention period.
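To make the idea concrete, here is a minimal, hypothetical sketch (not any vendor's actual API) of an object store interface: each object carries arbitrary metadata, and the store can act on that metadata independently of any client -- in this case, deleting objects whose retention period has passed.

```python
import time

# Hypothetical in-memory object store, for illustration only.
# Real object stores persist data durably and enforce policies continuously;
# this sketch just shows the shape of the put/get-plus-metadata model.
class ObjectStore:
    def __init__(self):
        self._objects = {}  # key -> (data, metadata)

    def put(self, key, data, metadata=None):
        # Store an arbitrary chunk of data along with its metadata.
        self._objects[key] = (data, dict(metadata or {}))

    def get(self, key):
        return self._objects[key][0]

    def metadata(self, key):
        return self._objects[key][1]

    def enforce_retention(self, now=None):
        # The store itself acts on object metadata: any object whose
        # 'expires_at' timestamp has passed is deleted, with no client involved.
        now = now if now is not None else time.time()
        expired = [k for k, (_, md) in self._objects.items()
                   if "expires_at" in md and md["expires_at"] <= now]
        for k in expired:
            del self._objects[k]
        return expired
```

A policy sweep like `enforce_retention()` is the key difference from a file system: the metadata is structured enough that the storage layer can manage object lifecycles on its own, rather than waiting for a human or application to issue deletes.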
While most object stores are built internally on file system components, it can be argued that file systems can be built over object storage. However, we still think the evolution of object storage that provides for in-storage data management (e.g., automated lifecycle management) is effectively a step up the storage services stack.
As I've noted, one way up the stack from object storage is to engineer storage solutions for specific application data objects such as a VM or database. To avoid confusion with the well-used term object, I'll call these things application "constructs."
Storage intentionally designed for specific application constructs can not only be highly functional for client needs, but internally optimized to offer significant performance and cost/capacity advantages. For example, the Oracle ZFS Storage Appliance is "application engineered" with the Oracle Database. It can store and unilaterally apply storage-side processes to database data (e.g., in the Hybrid Columnar Compression format). It also supports a specific protocol that enables the database to directly tune storage-side parameters affecting application performance.
In Arun's column, he noted that many VM-centric solutions, such as Tintri, work on VMs as the primary construct. Going a bit further out, we might consider Atlantis ILIO as a focused storage service for virtual desktop infrastructure desktops. Actifio could be defined as a storage service aimed at managing "copy data" constructs. We might even think of something like Maginatics' MagFS as delivering "file system" constructs where the primary client is provisioning file systems ("file systems as a service"), rather than end users accessing files.
In all those cases, the storage solution is application-aware. It has intimate knowledge of the application construct and can provide improved management, performance and efficiencies over application-blind storage.
Which way do we go?
Looking farther out, I can see the next evolution, where storage embeds not just an application construct, its metadata and static policies, but also specific dynamic behaviors, programs or functions. Think of database stored procedures, or object-oriented programming where methods can be "attached" to individual program objects. Certainly, storage infrastructure is becoming more loaded with compute and memory power these days, and soon there could be abundant storage-side capacity to execute functions embedded in the data. In the future, data at rest might not be so easy to separate from the dynamic application, and might not actually ever come to rest.
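The stored-procedure analogy can be sketched in a few lines. This is a purely illustrative, hypothetical construct (no real product works exactly this way): a stored object that carries not just its data but an attached function the storage side could execute in place, without shipping the data back to the client.

```python
# Hypothetical "active data" object, for illustration only: the stored
# item bundles its payload with named behaviors, analogous to a database
# stored procedure or a method attached to a program object.
class ActiveObject:
    def __init__(self, data, behaviors=None):
        self.data = data
        self.behaviors = behaviors or {}  # name -> function(data) -> result

    def invoke(self, name):
        # The storage side runs the attached behavior where the data lives,
        # returning only the (small) result rather than the whole payload.
        return self.behaviors[name](self.data)

doc = ActiveObject(
    "the quick brown fox",
    behaviors={"count_words": lambda d: len(d.split())},
)
print(doc.invoke("count_words"))  # -> 4
```

The point of the sketch is the locality argument: if the function travels with the data, a storage platform with spare compute can answer questions about the data directly, instead of acting as a dumb byte server.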
At a larger scale, many vendors have toyed with running VMs in their storage arrays to provide more direct access to data or to run things like antivirus scans. If you think about virtual storage arrays that already run storage as a VM, the trend of converging compute and storage seems inevitable.
Please define software-defined storage
We see three storage vendor approaches to future infrastructure. The first is converged, unified storage: some vendors may continue to accrete broad functionality onto a large core platform. The second is application-specific storage, designed and optimized for a particular application; many startups with focused visions fall into this category. The third is programmable or software-definable storage that can be dynamically shaped as needed. Despite announcements such as EMC ViPR, this category has yet to prove itself effective.
Which approach will win out? I think the next-generation storage platform will have a powerful, general-purpose, scale-out core with application-specific templates available to dynamically program it to support various application constructs. This platform will have plenty of horsepower for data-centric compute tasks, and may look like a virtualized big data cluster when seen from above.
Inevitably, storage will move up the stack, becoming more intelligent and moving closer to the applications that use it.
About the author:
Mike Matchett is a senior analyst and consultant at Taneja Group.