Managing and protecting all enterprise data

It's time to shift from data-dumb to data-aware storage

With flash and virtualization technologies now readily available, the time has come for storage to evolve from data-dumb to data-aware.

Let's face it: Storage is dumb today. Mostly, it is a dumping ground for data. As we produce more data, we simply buy more storage and fill it up. We don't know who is using what storage at a given point in time, which applications are hogging storage or have gone rogue, what and how much sensitive information is stored, moved or accessed by whom, and so on. Basically, we are blind to whatever is happening inside that storage array.

Am I exaggerating? Of course I am, but only to a degree. Overall, these statements are true.

Can we extract information from the storage array today? Yes, we can. But we have to use a myriad of tools from a variety of vendors and do a lot of heavy lifting to get meaningful information out of storage. This activity is generally so cumbersome that most users simply don't do it unless it is required by law. In such cases (compliance or governance, for instance), external software is used to pull the relevant information, at great cost in both time and money.

What if all data were catalogued and indexed upon creation; analytics were built in and ran in real time; data protection were an inherent part of storage; and search and discovery were an integral part of the array? This kind of awareness would be a paradigm shift and could fundamentally change how we manage, protect and use data.
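To make the idea concrete, here is a minimal sketch of a store that catalogs and indexes data at the moment it is written, so that search and usage reports need no external tool. Everything in it is hypothetical: the `DataAwareStore` class, the `CatalogEntry` fields and the simple word index are illustrative assumptions, not any vendor's design, and a real array would do this work in its own data path rather than in memory.

```python
import re
import time
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class CatalogEntry:
    """Metadata captured at write time (all fields are illustrative)."""
    path: str
    owner: str
    size_bytes: int
    created_at: float


class DataAwareStore:
    """Toy in-memory store that indexes every object as it is written."""

    def __init__(self):
        self._blobs = {}                      # path -> raw bytes
        self.catalog = {}                     # path -> CatalogEntry
        self._word_index = defaultdict(set)   # word -> set of paths

    def write(self, path, data, owner):
        # Drop stale index entries if the path is being overwritten.
        if path in self._blobs:
            self._unindex(path)
        self._blobs[path] = data
        self.catalog[path] = CatalogEntry(path, owner, len(data), time.time())
        for word in self._words(data):
            self._word_index[word].add(path)

    def search(self, word):
        """Return the sorted paths whose content contains the word."""
        return sorted(self._word_index.get(word.lower(), set()))

    def usage_by_owner(self):
        """Answer 'who is using what storage' straight from the catalog."""
        totals = defaultdict(int)
        for entry in self.catalog.values():
            totals[entry.owner] += entry.size_bytes
        return dict(totals)

    def _unindex(self, path):
        for word in self._words(self._blobs[path]):
            self._word_index[word].discard(path)

    @staticmethod
    def _words(data):
        text = data.decode("utf-8", errors="ignore").lower()
        return set(re.findall(r"[a-z0-9]+", text))


store = DataAwareStore()
store.write("/hr/a.txt", b"Salary review for Alice", owner="hr")
store.write("/eng/b.txt", b"Design review notes", owner="eng")
print(store.search("review"))
print(store.usage_by_owner())
```

The point of the sketch is the placement of the work: indexing happens inside `write`, so the catalog is always current and nothing has to crawl the data after the fact.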

Data-aware storage explained

So, why is it possible to develop data-aware storage today? The answer is simple: flash technology, virtualization, and the availability of "free" CPU cycles. In the past, the processes outlined above would have slowed down the performance of primary storage to a point where it would be useless. Now, we can build in a lot of intelligence without impacting performance or quality of service.

If implemented correctly, data-aware storage could reduce the risk of non-compliance and improve governance. It could automate many of the storage management processes that are manual today. It could provide insights into how well the storage is being utilized. It could flag emerging problems before they become incidents, whether they concern compliance, capacity, performance or SLAs.

Data-aware storage must offer most, if not all, of the following key attributes:

Increased awareness. The system must store and understand more about the content or attributes of the data stored on the device. Examples could be enhanced metadata about quality of service, file attributes, application-aware metrics, etc. Other examples could be actually scanning the data in real time, looking for contextual patterns or keywords for security and regulatory compliance.

Real-time analytics. It is not enough for these storage systems to gather enhanced metadata without making it useful in real time. The system must provide instantaneous updates of the enhanced analytics so that administrators or policy engines can react before issues arise. For example, a sudden spike in IOPS from Application X could be detected and mitigated immediately, or Application Y could be granted the highest QoS as needed.

Advanced data services. In addition to reporting the advanced analytics about what is being stored, the system must offer additional data services that enable better business outcomes, based on the increased awareness. Examples could be the availability of archiving functions for dormant data or balancing QoS across different application workloads.

Open and accessible APIs. The system must also offer open APIs for advanced data awareness and capabilities. Over time, natural de facto industry standard APIs will emerge for the most popular enhanced capabilities, similar to how the Amazon S3 data protocol became a standard.
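The first three attributes can be illustrated with a short sketch: a content scan for a sensitive pattern, a rolling check for IOPS spikes, and a policy that lists dormant data as archive candidates. All names, thresholds and the pattern itself are assumptions made for illustration; production compliance scanning would rely on vetted rule sets, and real systems would draw these signals from the array's own telemetry.

```python
import re
from collections import deque

# Hypothetical rule: text shaped like a US Social Security number.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def contains_sensitive_data(text):
    """Increased awareness: scan content for a regulated pattern."""
    return bool(SSN_PATTERN.search(text))


class IopsSpikeDetector:
    """Real-time analytics: flag a sample above `factor` x the rolling mean."""

    def __init__(self, window=5, factor=3.0):
        self.samples = deque(maxlen=window)
        self.factor = factor

    def observe(self, iops):
        # Compare against the history before adding the new sample,
        # and only once a full window of history is available.
        is_spike = False
        if len(self.samples) == self.samples.maxlen:
            baseline = sum(self.samples) / len(self.samples)
            is_spike = iops > self.factor * baseline
        self.samples.append(iops)
        return is_spike


def dormant_paths(last_access, now, max_idle_days=180):
    """Advanced data services: list paths idle long enough to archive.

    `last_access` maps path -> last access time in seconds since the epoch.
    """
    cutoff = now - max_idle_days * 86400
    return sorted(p for p, ts in last_access.items() if ts < cutoff)


detector = IopsSpikeDetector(window=3, factor=2.0)
for sample in [100, 100, 100, 500, 100]:
    if detector.observe(sample):
        print("spike detected at", sample, "IOPS")
```

A policy engine could subscribe to signals like these and act on them, for instance by throttling the offending application or moving dormant files to an archive tier.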

Three vendors, Data Gravity, Qumulo and Tarmin, sell data-aware storage systems today that meet many of these criteria.

Vendors selling data-aware storage systems

Each of these companies has taken a unique approach to applying data-aware methods to solve business issues while also creating business value through data analytics.

For instance, Data Gravity is more focused on the mid-market and perhaps on solving a broader set of problems for such customers. Qumulo, on the other hand, is focused on the largest media companies (its initial market), which hold petabytes of data, and it emphasizes scalability into many billions of files. Tarmin is focused on unique application-specific capabilities, for example, a data-aware storage system for Microsoft Exchange compliance and archiving.

I fully expect that each will add more data-aware capabilities as they evolve their products to meet unique customer demands.

Storage has been dumb long enough and technology is readily available to make storage smart. As exemplified by Data Gravity, Qumulo and Tarmin, data-aware storage is not only possible but already delivering benefits to customers -- reducing management costs, improving business insights and reducing risk. Data-aware storage is still in its earliest stage of development, but I believe the technology is important and worthy of consideration.

BIO: Jeff Kato is a senior storage analyst at Taneja Group with a focus on converged and hyper-converged infrastructure and primary storage.

Join the conversation


Where do you feel the data-aware storage market is headed?
Where do I sign on...? Or - face/palm - why didn't I think of that...?

I can certainly search for all the data I need, but finding associations between files is much harder. Finding similar information throughout the enterprise - though it's often stored in close proximity - is harder yet again. And then there are idea associations.

That's exactly the kind of information I often need/want.

In this day of smart cars that can share roadside information, even smart appliances that know when they're low on eggs and butter, why don't my files tell me what secrets they hold? Analysis is even easier and more valuable if I don't have to stand by and wait every time it's being gathered.

This is huge and I feel it will become the new standard.
This looks like a logical (and very welcome) advance. Now that cars can share information on the highway and refrigerators order fresh eggs when stock runs low, we need smarter storage for our most valuable documents.

The near future will store data that's aware of what and where it is and all the things that are around it. Smarts like those will change everything for us. It's another opportunity for everyone to know more, work smarter, and deliver better information with sharper insights.
I think the data-aware storage market is going to be used heavily in the IoT arena. I can see where data-aware storage will prove invaluable in routing the data being sent from large sensor arrays to one storage solution and the company’s financial data to another.
How do you envision data backup fitting into these attributes? Would it be handled much the same way it is now, or would it be included in the advanced data services category along with archiving of dormant data?
You are correct. In the case of this new category of storage, I would put backup into the advanced data services. One of the key tenets is that data-aware storage is first and foremost a primary storage device. However, with advanced data awareness, more intelligent backup techniques could certainly be built in.
How do you balance the need for data backup to be smarter without compromising privacy or security? Might it open new target vectors for exploits?
