
Data classification best practices for tiered storage

Classifying data for tiered storage may sound simple, but it's not. Learn best practices for effective tiered storage, including capacity planning and data archiving technology.

The concept behind tiered storage is simple: Divide data according to its value, access needs and retention requirements, and then handle each class of data accordingly. In practice, it's harder to accomplish than it sounds.

According to the Storage Networking Industry Association (SNIA), the hardest part of implementing a tiered storage system is classifying the data.

"The biggest challenge we hear from customers as they begin implementing ILM-based tiered storage is in getting agreement on the information and data classification requirements," reads a quote on the SNIA home page. "This is the crux of establishing successful ILM practices."

Information lifecycle management (ILM) is basically the goal-directed adjunct to tiered storage. While tiered storage plans deal with how the data is to be divided and stored, ILM deals with the same issue from the user's perspective: why the data needs to be divided and stored in that manner.

A successful tiered storage implementation has to balance the life expectancy of data, how often data will be retrieved and how fast it will be needed against the cost of storage and the available technology.

There are only three basic tiers of storage -- archival, backup and immediate use -- but a typical business is likely to end up with more than a dozen categories in its tiered storage system. Because the categories have to be composed for each specific business, there can be hundreds of rules, often differing in only tiny details.

One variable, for example, is the amount of time a record is retained in each tier and whether, once that time is up, the record should be moved to the next tier or destroyed.
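A retention rule of this kind can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tier names follow the three basic tiers above, but the retention periods, the `TierRule` class and the `next_action` function are hypothetical and not taken from any particular product.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass(frozen=True)
class TierRule:
    """Hypothetical rule: how long a record stays in a tier and what happens next."""
    tier: str
    retain_days: int
    next_tier: Optional[str]  # None means the record is destroyed when retention expires


# Assumed example values for a single data category; real rules come from the business.
RULES = {
    "immediate": TierRule("immediate", 90, "backup"),
    "backup": TierRule("backup", 365, "archival"),
    "archival": TierRule("archival", 7 * 365, None),
}


def next_action(tier: str, entered_tier: date, today: date) -> str:
    """Return 'keep', 'move:<tier>' or 'destroy' for a record in the given tier."""
    rule = RULES[tier]
    if today - entered_tier < timedelta(days=rule.retain_days):
        return "keep"
    if rule.next_tier is None:
        return "destroy"
    return f"move:{rule.next_tier}"
```

A business with a dozen categories would carry one such rule table per category, which is where the large number of near-identical rules comes from.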

However, establishing categories for a tiered storage system isn't only a storage management problem. It's also a user problem, and each group of users in the business needs to help decide the classifications. Storage managers typically get the job of refining the categories and picking the technologies to support them.


User input on data is necessary

You'll need input from each group of users regarding the data they generate or handle. This will include how long to keep the data, how fast the data needs to be available when needed, how likely it is to be needed, when it can be moved between tiers, and when and if it can be destroyed.

Setting up a data classification committee will likely be necessary because data is typically used by more than one department. For example, data that's useless after three months to one department may be useful to another department for a few years.

Whenever possible, the classification itself should be done automatically. That is, the system should be able to determine where to pigeonhole each document without asking the user to classify it. This is usually done based on the type of document (spreadsheet or word processing), time of creation, who created it and what folder it's stored in. This means the classifications need to be easy enough for the system to handle.
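As a rough sketch of what rule-based automatic classification might look like, the Python below assigns a category from the attributes mentioned above: document type, creation time, creator and folder. The category names, departments, folder names and thresholds are all hypothetical, and a real deployment would read these attributes from the file system or storage management software rather than from a hand-built record.

```python
from dataclasses import dataclass
from datetime import date
from pathlib import PurePosixPath


@dataclass(frozen=True)
class FileInfo:
    """Hypothetical metadata record for one document."""
    path: str
    owner_dept: str
    created: date


def classify(info: FileInfo, today: date) -> str:
    """Pick a category from metadata alone; rules are checked in priority order."""
    p = PurePosixPath(info.path)
    suffix = p.suffix.lower()
    age_days = (today - info.created).days

    # Folder or creator: anything under a 'contracts' folder or created by legal.
    if "contracts" in p.parts or info.owner_dept == "legal":
        return "legal-long-term"
    # Document type plus creator: spreadsheets produced by finance.
    if suffix in {".xls", ".xlsx"} and info.owner_dept == "finance":
        return "financial-records"
    # Time of creation: anything older than a year is treated as inactive.
    if age_days > 365:
        return "inactive"
    return "working"
```

Because the rules are evaluated in a fixed order, every document lands in exactly one category without any input from the user, which is only practical if the categories themselves stay simple.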

The next step is to rationalize these categories and combine them when possible. This involves questioning users about their classification characteristics. For example, a user may only need a document for three months, but if retaining it for six months eliminates a category and doesn't cause problems, it might be worth doing.
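One way to explore this kind of consolidation is to round each requested retention period up to the nearest value from a short list of standard periods and see which categories collapse together. The sketch below assumes a hypothetical list of standard periods (`ALLOWED_MONTHS`) and hypothetical category names; it is an illustration of the idea, not a prescribed method.

```python
import bisect

# Assumed set of standard retention periods, in months, sorted ascending.
ALLOWED_MONTHS = [6, 12, 36, 84]


def round_up_retention(months: int) -> int:
    """Round a requested retention period up to the nearest standard period."""
    i = bisect.bisect_left(ALLOWED_MONTHS, months)
    if i == len(ALLOWED_MONTHS):
        raise ValueError(f"no standard period covers {months} months")
    return ALLOWED_MONTHS[i]


def consolidate(requests: dict[str, int]) -> dict[int, list[str]]:
    """Group requested categories by the standard period they round up to."""
    merged: dict[int, list[str]] = {}
    for name, months in requests.items():
        merged.setdefault(round_up_retention(months), []).append(name)
    return merged
```

Rounding up is only appropriate where keeping data longer causes no problems, so the merged groups are a starting point for the conversation with users, not a final answer.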

Now it's time to start thinking about technology. How much of the hardware and software that you have now can you use in your tiered storage system? What new technology will you need? Can you slipstream it in or will you have to install it in large chunks?


Capacity planning becomes more complicated with tiering

One result of tiering is that capacity planning becomes more complicated. Instead of just needing more hard drives, you need to decide which kinds of drives and RAID levels you need (e.g., fast SCSI, medium-speed SCSI or SATA drives; RAID 10 or RAID 5). Don't assume that the storage device categories will grow in lockstep. Tiered storage typically has some categories that grow rapidly, some that hardly grow at all and others that will actually shrink.

For example, archival needs often shrink as data that was once stored permanently is reclassified into categories that are kept for only a limited time.
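To make the point about uneven growth concrete, here is a minimal projection sketch in Python. It assumes simple compound annual growth per tier; the tier names, starting capacities and growth rates are invented for illustration, and a negative rate models a tier that shrinks.

```python
def project_capacity(
    current_tb: dict[str, float],
    annual_growth: dict[str, float],
    years: int,
) -> dict[str, float]:
    """Project capacity per tier, compounding each tier's own growth rate."""
    return {
        tier: round(size * (1 + annual_growth[tier]) ** years, 1)
        for tier, size in current_tb.items()
    }


# Assumed example figures: one fast-growing tier, one flat tier, one shrinking tier.
current = {"fast": 10.0, "sata": 40.0, "archive": 100.0}
growth = {"fast": 0.5, "sata": 0.0, "archive": -0.1}
projection = project_capacity(current, growth, years=2)
```

With these assumed rates, the fast tier more than doubles in two years while the archive tier shrinks, so a single across-the-board growth figure would overbuy for one tier and underbuy for another.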

Once the data is classified, the business rules are set and the technology is in place, dividing the data can be a straightforward process with storage management software and, if needed, data archiving programs.

Some types of data in tiered storage are best handled by specialized data archiving programs, if the amount of data is large enough. This is particularly true with email because of the nature of the backups created by email programs (archival rather than easily searchable) and the large number of small data files.

Once a tiered storage system is established, its classifications should be rigidly adhered to. Let's say records are subpoenaed -- it's crucial that you can deliver all of the records asked for and that all the records you believe to be destroyed have actually been destroyed.

A user who has kept a forgotten copy of a supposedly destroyed email on a personal hard drive can be a serious impediment in a court case. You'll need a company policy establishing what kinds of information can be stored where, and you'll need to provide employee education.
