Despite all the hype around concepts such as the internet of things and big data analytics, the practical application of such ideas has yet to truly materialize.
I know, this seems to throw shade on all those top-secret algorithms developed by Facebook or Twitter or Amazon and others that collect information on consumer behavior, locational data, catalog browsing records and so on. Leveraging that information allows the vendor to perform interesting marketing tasks, such as upsell. I get that. But to my way of thinking, the success of these applications -- measured in terms of greater brand recognition and improved sales -- isn't that important in the long term. I want more. What I want (and we'll get to this later) is copy data management.
Ziya Aral, the former chairman and chief scientist of DataCore Software who passed away earlier this year, certainly sought to make his company prosperous with the technology he built. But he also wanted to accomplish something more meaningful, something he called computer science.
A greater purpose
Aral said virtualizing storage with DataCore Software's SANsymphony, while important to DataCore's bottom line and to customers' need to simplify heterogeneous storage infrastructure, was less important in the grander scheme of things than the goal of enabling "atomic units" of computing. He wanted to contribute to computer science by creating building blocks of infrastructure that could be rolled out in a truly agile way to facilitate an efficient IT service. His innovations in I/O acceleration (creating a parallel I/O processing engine from idle CPU cores) made commodity servers into supercomputers and, together with storage virtualization, enabled a new kind of hyper-converged infrastructure. That infrastructure was based on an extended model for software-defined storage, which Lenovo and others are now beginning to productize.
Before the end, Aral's team was working to add storage resource management functionality to their kit, thereby completing the architecture for atomic units. When their work is complete, consumers will be able to buy any hardware they want, support any workload they want and manage it all efficiently at the hardware level. They will then be able to create infrastructure on the fly, managing the allocation of storage capacity, performance and services in whatever way they prefer. The status of all component parts will be constantly monitored and updated so the IT pro can manage all of the storage kit proactively, detecting and resolving issues before they create downtime or data loss. When that comes to market, the underlayment for true clouds, true SANs, true hyper-converged storage and true in-memory database platforms using inexpensive hardware and Intel architecture CPUs will be complete.
Aral, and the computer scientists who follow him, will have made a significant contribution to computing, worthy of a chapter in the textbooks.
But the job of IT will not be complete.
Turning the page
The next chapter in the story should be all about copy data management. Managing data is the core function of IT, as the internet of things (IoT) and big data folks suggest, however orthogonally.
Over and above requiring Aral's building block infrastructure, and making that architecture purposeful, is the need to manage data itself over its useful life. The elements required to realize such a data management capability are simple.
First, we need what Aral focused on: manageable storage resources, with value-add services separated from hardware, and with both resources and services capable of being allocated in a granular way to any data that requires them. In addition to storage resource management and storage service management, we will also require a copy data management policy framework. This framework is a way to classify data in a global namespace, regardless of the file system or object system used to store it, and to store the myriad policies for handling the data throughout its useful life based on business criteria, regulatory requirements and so forth.
The system references the copy data management policy to identify what must happen to data over time. Policies are applied and appropriate storage services and resources are selected from inventories of kits that are maintained in real time. Then, data is directed to virtual storage containers that are either equipped with certain services or provide a hosting environment where services can be applied to data directly.
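As a rough illustration of that flow, here is a minimal Python sketch. Everything in it is hypothetical: the names Policy, Container, classify and place, the path prefixes, the service names and the tier numbers are assumptions made up for this example, not anything from DataCore or any shipping product. It simply shows data being classified by its place in a global namespace, a policy being looked up, and a container being chosen from an inventory that offers the required services.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """Hypothetical handling policy for one class of data."""
    data_class: str
    required_services: frozenset  # e.g. {"encryption", "replication"}
    max_tier: int                 # slowest acceptable tier (1 = fastest)


@dataclass
class Container:
    """Hypothetical virtual storage container listed in a live inventory."""
    name: str
    services: frozenset
    tier: int
    free_gb: int


def classify(path: str) -> str:
    """Toy classifier keyed on an assumed global-namespace path prefix."""
    if path.startswith("/finance/"):
        return "regulated"
    if path.startswith("/scratch/"):
        return "transient"
    return "general"


def place(path, size_gb, policies, inventory):
    """Return the name of the first container that satisfies the policy."""
    policy = policies[classify(path)]
    for c in inventory:
        if (policy.required_services <= c.services   # all services offered
                and c.tier <= policy.max_tier        # fast enough
                and c.free_gb >= size_gb):           # enough capacity
            c.free_gb -= size_gb                     # keep inventory current
            return c.name
    return None  # nothing suitable: a real system would alert or provision


# Illustrative data only; the values are invented for the example.
POLICIES = {
    "regulated": Policy("regulated", frozenset({"encryption", "replication"}), 1),
    "transient": Policy("transient", frozenset(), 3),
    "general": Policy("general", frozenset({"replication"}), 2),
}
INVENTORY = [
    Container("bulk-01", frozenset(), 3, 500),
    Container("std-01", frozenset({"replication"}), 2, 200),
    Container("secure-01", frozenset({"encryption", "replication"}), 1, 100),
]
```

Calling place("/finance/q3.xlsx", 10, POLICIES, INVENTORY) walks the inventory and settles on the only container offering both encryption and replication. A real policy engine would weigh far more than this (retention periods, cost, locality), but the shape of the decision is the same.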
Considering the coming data deluge -- the unprecedented growth of data into the 10 to 60 zettabyte range by 2020 -- there will soon be too much data and too many files and objects for us to manage by hand. Indeed, in large cloud services and business data centers, manual data administration is already a herculean task, explaining why better copy data management dominated IT initiatives in just about any serious survey of IT planning intentions in 2016.
The alternative to managing the placement, preservation, protection and privacy of data by hand is to automate it. The vision of many IT planners is to use cognitive computing to do the job. The management of data, storage services and storage resources is a great IoT application. The application of management policy to data, based on its status and the status of services and infrastructure, is a very cool application for big data analytics and artificial intelligence or cognitive computing. It is time to make it so.
Cognitive data management will stand on the shoulders of guys like Ziya Aral, who toil not just to make a quick buck, but to create real computer science as well.