Let's face it: storage is dumb today. For the most part, it is a dumping ground for data: as we produce more data, we simply buy more storage and fill it up. We don't know who is using which storage at any given time, which applications are hogging capacity or have gone rogue, or what sensitive information is stored, moved or accessed, how much of it there is, and by whom. In short, we are blind to whatever is happening inside the storage array.
Am I exaggerating? Of course I am, but only to a degree. Overall, these statements are true.
Can we extract information from the storage array today? Yes, we can. But we have to use a myriad of tools from a variety of vendors and do a lot of heavy lifting to get meaningful information out of storage. The process is so cumbersome that most users simply don't bother unless the law requires it. In those cases (compliance or governance, for instance), external software is used to pull the relevant information, at great cost in both money and time.
What if all data were catalogued and indexed upon creation, analytics were built in and ran in real time, data protection were an inherent part of storage, and search and discovery were integral to the array? That kind of awareness would be a paradigm shift, one that could fundamentally change how we manage, protect and use data.
Data-aware storage explained
So why is it possible to develop data-aware storage today? The answer is simple: flash technology, virtualization and the availability of "free" CPU cycles. In the past, the processes outlined above would have slowed primary storage to the point of uselessness. Now, we can build in a great deal of intelligence without affecting performance or quality of service.
If implemented correctly, data-aware storage could reduce the risk of non-compliance and improve governance. It could automate many storage management processes that are manual today, provide insight into how well storage is being utilized, and flag impending problems, whether related to compliance, capacity, performance or SLAs, before they occur.
Data-aware storage must offer most, if not all, of the following key attributes:
Increased awareness. The system must store and understand more about the content and attributes of the data on the device. Examples include enhanced metadata about quality of service, file attributes and application-aware metrics. The system could also scan data in real time for contextual patterns or keywords relevant to security and regulatory compliance.
Real-time analytics. Gathering enhanced metadata is not enough; the system must make it useful in real time. It must update its analytics instantaneously so that administrators or policy engines can react before issues arise. For example, a sudden spike in IOPS from Application X could be detected and mitigated immediately, or Application Y could be granted the highest QoS when it needs it.
Advanced data services. Beyond reporting analytics on what is being stored, the system must use its increased awareness to offer additional data services that enable better business outcomes. Examples include archiving dormant data or balancing QoS across application workloads.
Open and accessible APIs. The system must also expose its data awareness and capabilities through open APIs. Over time, de facto industry-standard APIs will emerge for the most popular enhanced capabilities, much as the Amazon S3 API became a de facto standard for object storage.
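To make the real-time analytics attribute more concrete, here is a minimal Python sketch of per-application IOPS spike detection: each new sample is compared against a rolling baseline of that application's recent samples. It is a hypothetical illustration, not how any particular vendor implements analytics; the class name `IopsSpikeDetector`, the 30-sample window, the three-standard-deviation threshold and the 5% noise floor are all assumptions chosen for the example.

```python
from collections import defaultdict, deque
from statistics import mean, pstdev


class IopsSpikeDetector:
    """Hypothetical per-application IOPS spike detector (illustrative only).

    Each application's recent IOPS samples form a rolling baseline. A new
    sample counts as a spike when it exceeds the baseline mean by more than
    k standard deviations. The window, k and noise floor are assumed values.
    """

    def __init__(self, window=30, k=3.0, min_samples=5, noise_floor=0.05):
        self.k = k                        # standard deviations that count as a spike
        self.min_samples = min_samples    # samples needed before judging an app
        self.noise_floor = noise_floor    # minimum spread, as a fraction of the mean
        # One bounded history per application; old samples fall off automatically.
        self.history = defaultdict(lambda: deque(maxlen=window))

    def observe(self, app, iops):
        """Record one IOPS sample for `app` and return True if it is a spike."""
        past = self.history[app]
        spike = False
        if len(past) >= self.min_samples:
            baseline = mean(past)
            # Floor the spread so a perfectly flat history doesn't flag tiny noise.
            spread = max(pstdev(past), self.noise_floor * baseline)
            spike = iops > baseline + self.k * spread
        # The sample joins the history whether or not it was a spike.
        past.append(iops)
        return spike
```

For example, after at least five steady samples of around 1,000 IOPS for `"app_x"`, `observe("app_x", 5000)` returns `True`, while the same reading for a brand-new application returns `False` because no baseline exists yet. A policy engine could call `observe` on every sample and throttle the application or raise an alert when it returns `True`. A rolling mean with a standard-deviation band is about the simplest detector that adapts to each application's normal load; a production system would likely need baselines that account for daily and weekly cycles.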
Several vendors, including Data Gravity, Qumulo and Tarmin, are selling data-aware storage systems today that meet many of these criteria.
Each of these companies has taken a unique approach to applying data-aware methods to business problems, while also creating business value through data analytics.
For instance, Data Gravity is more focused on the mid-market and is perhaps solving a broader set of problems for those customers. Qumulo, on the other hand, focuses on the largest media companies (its initial market), which hold petabytes of data, and emphasizes scalability to many billions of files. Tarmin concentrates on application-specific capabilities; one example is a data-aware storage system built for Microsoft Exchange compliance and archiving.
I fully expect that each will add more data-aware capabilities as they evolve their products to meet unique customer demands.
Storage has been dumb long enough, and the technology to make it smart is readily available. As Data Gravity, Qumulo and Tarmin demonstrate, data-aware storage is not only possible but already delivering benefits to customers: lower management costs, better business insight and reduced risk. Data-aware storage is still in its earliest stage of development, but I believe the technology is important and worthy of consideration.
BIO: Jeff Kato is a senior storage analyst at Taneja Group with a focus on converged and hyper-converged infrastructure and primary storage.