
DataGravity CEO Paula Long discusses data-aware storage

Paula Long discussed her second storage startup, DataGravity, which sells data-aware storage arrays that collect data and provide real-time intelligence on anomalous writes to storage.

DataGravity Inc. emerged from stealth in August of 2014 with data-aware Discovery Series 2000 unified hybrid storage arrays that integrate search, discovery and data governance. The arrays use metadata analytics to determine who has access to data, which files are being changed, when such activity occurs, where sensitive data resides and why storage capacity is getting maxed out. 

DataGravity CEO Paula Long is a familiar name in the storage business. She launched iSCSI pioneer EqualLogic, which Dell acquired in 2008 for $1.4 billion. She started DataGravity along with John Joseph, a former EqualLogic and Dell marketing executive who serves as DataGravity president. DataGravity has about 130 employees and has attracted $92 million in venture funding.

A self-professed "geek by trade," Long said DataGravity's goal is to "take the complexity of data security and data insights, and make it consumable" for data classification and protection. SearchStorage caught up with Long for a look back at DataGravity's first full year in the market, a discussion about its approach to data-aware storage and a glimpse of what lies ahead.

DataGravity is your second storage startup, following your success with EqualLogic. What's the biggest difference between enterprise storage now and when you launched EqualLogic in 2001?

Long: When we started EqualLogic, storage was viewed as a big mythical thing that only uber geeks and nerds did. It was like having an erector set. Everything was do-it-yourself and required deep knowledge of storage. It also was very expensive.


At EqualLogic, we said: "Customers bought their storage, and should be able to use and manage it -- and [the vendor] ought to include new features as part of standard support." People thought that idea was heresy at the time, but look at how storage startups are innovating today. Most of them are doing self-managing and all-inclusive storage. EqualLogic was one of the first vendors to sell 100% in the channel, and most startups now are all in the channel. It's been fun to see how EqualLogic helped set the table stakes for the new generation of storage startups. 

How did DataGravity germinate from a brainstormed idea to a marketable product platform? 

Long: When we started thinking about what's next [after EqualLogic], we realized people were talking about big data and data security, but not in the context of primary storage. DataGravity is about making storage easy to manage. Storage needs to take ownership and responsibility -- not only to secure the data, but also give you intelligence about the data, so you can derive value from it and cap your downside.

We basically re-architected how storage works to gather this information. We run in an industry-standard storage array. We do have more memory and compute [than traditional arrays], and we use the second controller in the storage array to do the analytics. Our array is collecting stuff all the time. Our software does the interpretation and provides a real-time feed of write behaviors as they happen.
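To make the idea of a real-time feed of write behaviors more concrete, here is a minimal Python sketch; it is an illustration, not DataGravity's implementation. It assumes a hypothetical `WriteEvent` record (user, path, timestamp, bytes written) emitted for every write in the data path, and keeps a sliding time window per user so a consumer can see who is writing the most right now.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class WriteEvent:
    """One write observed in the data path (hypothetical record layout)."""
    user: str
    path: str
    timestamp: float  # seconds since epoch
    nbytes: int


class WriteFeed:
    """Sliding time window of writes per user. Assumes events arrive in time order."""

    def __init__(self, window_seconds: float = 60.0):
        self.window = window_seconds
        self._events = defaultdict(deque)  # user -> deque of WriteEvent, oldest first

    def observe(self, event: WriteEvent) -> int:
        """Record a write; return the user's bytes written within the window."""
        q = self._events[event.user]
        q.append(event)
        # Evict this user's events that fell out of the window ending at the newest event.
        while q and q[0].timestamp <= event.timestamp - self.window:
            q.popleft()
        return sum(e.nbytes for e in q)

    def top_writers(self, n: int = 3):
        """Heaviest writers, each measured over the window ending at their latest write."""
        totals = {user: sum(e.nbytes for e in q) for user, q in self._events.items()}
        return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]
```

A sudden jump in one user's windowed total is the kind of signal that could point to a write spike; the 60-second window is an arbitrary assumption.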

We think customers will gravitate to this technology to understand their content, as they need to make data-placement decisions. You need to understand something about your data before you place it somewhere. Sending data to the cloud without actually understanding it is like sending a misbehaving kid to boarding school and expecting him to behave better when he returns home. 

Data awareness is receiving quite a bit of attention. How does DataGravity define data-aware storage? 

Long: You really can't do streamlined data management without understanding the content and the people who have access to it. It's like trying to write a term paper without reading any of the books. We call it data security at the point of storage. You shouldn't have to hire experts to know what's in your data, or bring storage performance to zero to be able to interrogate it.

People don't need another storage company trying to make their flash run faster, or [a company] that deduplicates better than someone else. What they need is visibility into their data. We give you a 360-degree view that relates the people, their content and their activities over time.  Customers get to know the demographics of their data and have a starting point for an intelligent discussion on how to manage it.  

One of our solutions architects has a quote: "The best dedupe is delete." The problem is that people don't know what they should delete, because they lack visibility into their data. We help them create data classifications for intelligent, defensible delete. 
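Defensible delete depends on classifying content before anything is removed. The Python sketch below is a hypothetical rule set, not DataGravity's classifier: it assumes each file has extracted text and a days-since-last-access count, treats anything matching an illustrative US Social Security number pattern as sensitive (routed to review rather than deletion), and marks stale, non-sensitive files as delete candidates.

```python
import re
from dataclasses import dataclass

# Illustrative pattern for US Social Security numbers (an assumption, not a product rule).
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


@dataclass
class FileRecord:
    path: str
    days_since_access: int
    text: str  # extracted content, e.g., from a full-content index


def classify(record: FileRecord, stale_after_days: int = 365) -> str:
    """Assign a coarse class used to decide what is safe to delete."""
    if SSN_PATTERN.search(record.text):
        # Sensitive content goes to human or retention review, never automatic deletion.
        return "review-sensitive"
    if record.days_since_access >= stale_after_days:
        return "delete-candidate"
    return "active"
```

Checking sensitivity before staleness is deliberate in this sketch: an old file that holds regulated data should be reviewed, not silently removed.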

What are DataGravity's main use cases?

Long: We need to sit in the data path to understand what's going on with storage. 

We insert into an environment in three ways, depending on the customer. If a customer has a security project, we can be brought into [it] as a repository to watch and check the data.

Second, if a customer needs more high-end storage, they may decide to move their unstructured data to DataGravity, which is where we really shine. We're complementary to other storage. We have some customers running Nutanix and DataGravity, and [other] customers that run us alongside EMC and other vendors.  

The third insertion point is when somebody does a storage refresh, where we can replace their existing storage. We don't expect to replace all the storage out there. Our focus is on unstructured data that needs to be data-aware, because it's the fastest-growing and most problematic area. It's the unruly child.

Which type of companies use DataGravity storage? 

Long: Our channel focus is on midtier enterprises, but we're also in small departments and some larger companies as well. We recently did a study of our verticals and were surprised at how broad it was. We have customers in higher education, law firms, state and local government, small financials -- any industry that has a regulatory requirement or a rich content requirement. 

How many paying customers does DataGravity have?

Long: We don't talk about customer accounts or customer evaluations. Talking about growth is stupid. It's the rule of small numbers. How we measure ourselves is on the data stories we can tell. This is a cool product because we get to talk about different data crimes we help companies solve, which saves them money and can potentially save their reputation.  

Give us an interesting example of a customer using your data-aware storage. 

Long: We had a law firm customer that had an associate who was leaving. Before she left, she decided the content she created was hers. This lawyer started to copy all the caseload information, basically doing a data dump. With five clicks, our array is able to tell you when those events happen and which data is at risk.  

Storage should be like your immune system; it should notice when something is defective. DataGravity senses abnormal write activity [that is a sign of] ransomware, for example, and will take the equivalent of a safe backup. Because we understand the correlation between people, content and time, we can identify the exact files being written to, when things went from unencrypted to encrypted, who caused the write spike, and roll back only those files. We take a process that might normally take a week or two down to an hour or less.
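One widely used heuristic for spotting a switch from unencrypted to encrypted content is that ciphertext looks random, so its byte-level Shannon entropy approaches the maximum of 8 bits per byte while ordinary text sits well below that. The Python sketch below applies that heuristic; it illustrates the general technique and does not describe how DataGravity detects encryption. The 7.5-bit threshold and the `(user, path, old_bytes, new_bytes)` input format are assumptions. Suspect files are grouped by user, which is the set a targeted rollback would need.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Entropy in bits per byte, from 0.0 (constant) to 8.0 (uniformly random)."""
    if not data:
        return 0.0
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())


def flag_encrypted_writes(writes, threshold: float = 7.5):
    """Group suspect paths by user.

    `writes` is an iterable of (user, path, old_bytes, new_bytes) tuples. A write is
    suspect when it turns content below `threshold` into content at or above it.
    """
    suspects = {}
    for user, path, old, new in writes:
        if shannon_entropy(old) < threshold <= shannon_entropy(new):
            suspects.setdefault(user, []).append(path)
    return suspects
```

Already-compressed formats (ZIP-based Office files, JPEG images) start out with high entropy, so a practical detector would need more signals than this one.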

How does behavioral-based backup differ from traditional backup?

Long: Traditional backup has a [recovery point objective and a recovery time objective], since backup tends to copy data. DataGravity doesn't have to copy any data for a backup. Our backups are instantaneous and kept on separate spindles for fault isolation. We keep a complete catalog of snapshots within the array [to support] file-level, object-level restore and image restore.
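The claim that a backup need not copy data matches how pointer-based (redirect-on-write) snapshots generally work: a snapshot records which blocks each file points to, and later writes go to new blocks, leaving the old ones intact. The toy `Volume` class below illustrates that general technique in Python. It is a hypothetical sketch, not DataGravity's on-disk design, and it ignores fault isolation across spindles and garbage collection of unreferenced blocks.

```python
from itertools import count


class Volume:
    """Toy redirect-on-write volume: snapshots copy block pointers, never block data."""

    def __init__(self):
        self._blocks = {}         # block id -> bytes (written once, never modified)
        self._next_id = count()
        self._files = {}          # path -> tuple of block ids (live view)
        self._snapshots = {}      # snapshot name -> {path: tuple of block ids}

    def write(self, path: str, data: bytes, block_size: int = 4) -> None:
        # New data always lands in freshly allocated blocks; old blocks stay intact.
        ids = []
        for i in range(0, len(data), block_size):
            bid = next(self._next_id)
            self._blocks[bid] = data[i:i + block_size]
            ids.append(bid)
        self._files[path] = tuple(ids)

    def read(self, path: str) -> bytes:
        return b"".join(self._blocks[b] for b in self._files[path])

    def snapshot(self, name: str) -> None:
        # Only the pointer map is duplicated; data blocks are shared with the live view.
        self._snapshots[name] = dict(self._files)

    def restore_file(self, name: str, path: str) -> None:
        # File-level restore: point the live file back at the snapshot's blocks.
        self._files[path] = self._snapshots[name][path]

    def stored_blocks(self) -> int:
        return len(self._blocks)
```

In this model, restoring a file reassigns its block-pointer tuple and copies no data blocks, which is why a per-file rollback can be quick.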

How is DataGravity storage able to provide data awareness at a granular level?

Long: Storage arrays have always known a lot about the data, but they never exposed it to users. There is a weird taboo in the industry that storage should never look inside your data. It's like putting someone on top of a building with binoculars to look inside a house, [as] opposed to just walking in the door to look.

Every time someone accesses data, DataGravity authenticates that person against the ability to [read or write] that data. We take that data, along with metadata, and plumb it through the data path. We mirror fine-grained writes to storage, pipe metadata to an unused resource and do full-content indexing of more than 400 file types -- 600 file types, if you include versions. This gives you a rich searchable database that answers the who, what, when, where and why of your data. 
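The who, what, when and where described here amount to joining access events with a content index. The hypothetical `Catalog` below shows that join at toy scale in Python: a simple inverted index maps lowercase terms to file paths, and access events record user, action, path and timestamp. The class and field names are illustrative assumptions, and real full-content indexing of hundreds of file types would need format-specific text extraction first. The "why" still comes from a person interpreting the results.

```python
import re
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessEvent:
    user: str       # who
    action: str     # what: "read" or "write"
    path: str       # where
    timestamp: int  # when (epoch seconds)


class Catalog:
    """Toy metadata plus full-text index relating people, content and time."""

    def __init__(self):
        self._events = []
        self._postings = defaultdict(set)  # term -> set of paths containing it

    def index_content(self, path: str, text: str) -> None:
        for term in re.findall(r"[a-z0-9]+", text.lower()):
            self._postings[term].add(path)

    def record(self, event: AccessEvent) -> None:
        self._events.append(event)

    def who_touched(self, term: str, since: int = 0):
        """Sorted users who accessed any file containing `term` at or after `since`."""
        paths = self._postings.get(term.lower(), set())
        return sorted({e.user for e in self._events
                       if e.path in paths and e.timestamp >= since})
```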

What is on the DataGravity roadmap for 2016?

Long: We're great at defining, great at detecting and really good at defense. Our rich set of metadata means there is a lot more we can do in the areas of defense, prediction and preemption. When doing the defense, it's important to make sure we don't have any friendly fire. Maybe you haven't used the finance share at three in the morning, but that doesn't mean you shouldn't be allowed to. It may just mean we need to interrogate you a bit more about your activity.
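The friendly-fire concern, where unusual access should be questioned rather than blocked, can be sketched as a graduated response. In the hypothetical `AccessProfile` below, each user and share pair keeps a histogram of access hours; an hour the user has rarely or never used returns "challenge" (for example, step-up authentication) instead of a denial. The thresholds and the choice to allow users with too little history are assumptions for illustration, not DataGravity's policy.

```python
from collections import Counter, defaultdict


class AccessProfile:
    """Per-(user, share) histogram of access hours, 0 to 23 (hypothetical model)."""

    def __init__(self, min_history: int = 20, rare_fraction: float = 0.02):
        self.min_history = min_history      # accesses needed before judging
        self.rare_fraction = rare_fraction  # below this share of history, an hour is "rare"
        self._hours = defaultdict(Counter)

    def learn(self, user: str, share: str, hour: int) -> None:
        self._hours[(user, share)][hour] += 1

    def decide(self, user: str, share: str, hour: int) -> str:
        """Return "allow" or "challenge"; unusual access is questioned, never denied here."""
        hist = self._hours.get((user, share), Counter())
        total = sum(hist.values())
        if total < self.min_history:
            return "allow"  # too little history to judge; avoid friendly fire
        if hist[hour] / total < self.rare_fraction:
            return "challenge"  # e.g., require step-up authentication
        return "allow"
```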
