Data classification trends: Classifying native applications for enterprise data storage

The latest trend in data classification is classifying data in native applications. Learn about the data classification process and get advice before starting a data classification project.

One of the latest data classification trends for enterprise data storage purposes is the opportunity to classify the data in the components of native applications, such as an Oracle Corp. database or Microsoft Corp. Exchange Server.

IT shops also can adopt the more traditional data classification strategy: handling the process via their data protection, data archiving or file system management software. Those data classification products also continue to evolve to better meet the needs of end users, according to Brian Babineau, a senior consulting analyst at Enterprise Strategy Group in Milford, Mass.

In this podcast interview, Babineau discusses the latest developments in data classification and offers advice on how to classify data for enterprise data storage purposes.

You can read a transcript of the interview below or download the MP3.

What are the main data classification considerations for an IT shop?

Babineau: We break down data classification opportunities into three vectors. The first is performance and availability. The second is data protection and disaster recovery (DR), and the third is retention. For the first vector, the key question is information accessibility: how quickly do the applications need to respond? Oracle databases and OLTP applications are likely to have more demanding accessibility requirements than long-term archives.
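As a rough illustration, the three vectors Babineau describes can be captured as a per-data-set record. The field names and example values below are invented for this sketch, not drawn from any product:

```python
from dataclasses import dataclass

# Hypothetical sketch: one record per data set, with a field for each of the
# three classification vectors (accessibility, protection/DR, retention).
@dataclass
class DataClassification:
    name: str
    accessibility: str    # performance/availability: how fast must access be?
    rpo_minutes: int      # data protection/DR: max tolerable data loss
    rto_minutes: int      # data protection/DR: max tolerable downtime
    retention_years: int  # retention: how long the data must be kept

# An OLTP database demands fast access but short retention; an email
# archive tolerates slower access but must be kept for years.
oltp = DataClassification("oracle-oltp", "sub-second", 15, 60, 1)
archive = DataClassification("email-archive", "minutes", 1440, 2880, 7)
```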

With the second category, data protection and disaster recovery, you want to look at recovery point objectives (RPOs) and recovery time objectives (RTOs): how much data can you afford to lose, and how long can you be without an application?
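As a simple worked example of the RPO question: under a periodic backup schedule, the worst-case data loss is the interval between backups, so that interval must not exceed the RPO. The helper below is a hypothetical sketch, not part of any product:

```python
# Illustrative only: worst-case data loss under periodic backups is the
# interval between backups; compare it to the agreed RPO.
def meets_rpo(backup_interval_hours: float, rpo_hours: float) -> bool:
    """True if the worst-case data loss stays within the RPO."""
    return backup_interval_hours <= rpo_hours

# Nightly backups (24-hour interval) cannot satisfy a 4-hour RPO,
# but hourly backups can.
print(meets_rpo(24, 4))  # False
print(meets_rpo(1, 4))   # True
```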

And the third really comes down to what we're talking about today in terms of governance and compliance. How long do I need to retain information, and how accessible does it have to be over that time period?

How has the data classification process changed during the last few years? What are the latest trends?

Babineau: What we're looking at right now is the opportunity that customers have to do data classification in native applications, such as an Oracle database with its ASM [Automatic Storage Management] feature, where they can classify different partitions so each is stored on a type of device with different characteristics.

The second new classification opportunity in the native application market, which just came out, is Microsoft Exchange 2010 Service Pack 1, where organizations can now store the primary mailbox on one class of storage device and the personal archive component of the mailbox on another class.

What this really gives customers is the option to classify data in multiple places. They can still do it in data protection software, where they set policies. They can still do it in data archiving, where they set retention policies. They can still do it in file system management software, where a file system spans multiple classes of storage. But the new trend we're seeing is that the ability is being added into native components of applications, especially databases and Exchange Server 2010.

How much of data classification is a manual process, and what steps can be automated?

Babineau: We actually think it's a two-component process. On the classification side, the manual part is getting everybody on the same page to determine the rules of where data should be placed and for how long. IT can certainly address the performance and availability vector. Compliance, legal and records management need to weigh in on retention. And then there are different audit requirements for business risk, which aid the data protection and disaster recovery discussion. So those groups need to meet and figure out what rules will drive those classifications.

The automated part is the data placement and data movement. Once we take those rules and plug them into the software, whether it be a native application, a data protection solution, a DR solution or an archiving product, the rules can execute and start placing and classifying the information, either inside the applications themselves or during data movement, so that organizations get the optimal data classification and data placement on the storage side.
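One way to picture "plug the rules in and let the software execute" is a lookup from the agreed classifications to storage tiers. The rule keys and tier names below are invented for illustration and don't correspond to any real product's policy syntax:

```python
# Hedged sketch: rules agreed in the manual meetings become a lookup table
# that placement software applies automatically to each data set.
# Keys are (accessibility, retention); tier names are made up.
PLACEMENT_RULES = {
    ("high", "short"): "tier1-fast-disk",  # hot OLTP data
    ("high", "long"):  "tier2-midrange",
    ("low",  "short"): "tier2-midrange",
    ("low",  "long"):  "tier3-archive",    # long-retention archive data
}

def place(accessibility: str, retention: str) -> str:
    """Return the storage tier the rules assign to this classification."""
    return PLACEMENT_RULES[(accessibility, retention)]

print(place("high", "short"))  # tier1-fast-disk
print(place("low", "long"))    # tier3-archive
```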

So the meetings have to be manual. Then you plug those rules in and automate them through the technology, whichever you choose.

What are the major pieces of advice you offer to IT shops undertaking a data classification project?

Babineau: I think it's a three-step process. The first thing that needs to happen is that meeting. You have to get the rules that are going to drive the classification defined, and unfortunately, it is a manual process. There are multiple constituents that have to be involved.

Then there's the [issue of] what technologies [we are] going to use to implement this. If we are going to start to implement a higher level of DR across more data types -- i.e., this becomes a common classification -- we really may want to look at a technology that optimizes bandwidth when we do our data replication.

Then, the third part would be [the] storage systems or solutions [we can] invest in that have the technologies to optimize our classification. So, to continue our example, we may look at a storage technology that only sends changed bytes of data across the wire when we're doing disaster recovery. The same applies if we're retaining a lot of information for a long period during an archive process, and those retention requirements are five, six, seven years: we may want to look at a solution that has single-instance storage, which removes any duplicates so we're not saving multiple copies of data for that extended period. That single-instancing may be found in an archive application, or it may be found in a storage system. It's just a matter of [getting] the policies right, [choosing] the technologies you want and then doing your evaluation: where do you want to execute those technologies, and where do you want to buy them?
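The single-instancing idea mentioned above can be sketched as a content-addressed store that keeps one copy per unique blob, with duplicate saves resolving to the same key. This is a minimal illustration of the concept, not how any particular archive product implements it:

```python
import hashlib

# Minimal single-instance storage sketch: each unique content blob is stored
# once, keyed by its SHA-256 hash; saving a duplicate adds nothing.
class SingleInstanceStore:
    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}  # hash -> content

    def save(self, content: bytes) -> str:
        """Store content (if new) and return its content-derived key."""
        key = hashlib.sha256(content).hexdigest()
        self._blobs.setdefault(key, content)
        return key

    def unique_count(self) -> int:
        return len(self._blobs)

store = SingleInstanceStore()
k1 = store.save(b"quarterly report v1")
k2 = store.save(b"quarterly report v1")  # duplicate: same key, no new copy
assert k1 == k2
print(store.unique_count())  # 1
```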
