When setting up a tiered storage strategy, the first step is not to buy a new class of storage devices. Before focusing on the technology, you need to classify the data sets based on the business requirements of the applications. Then define appropriate service level objectives (SLOs) and data management policies for each class. Finally, define a tiered storage architecture to support the needs of the data classes and work out a migration plan to place each data class on the appropriate storage tier.
Here are some of the process steps and metrics you should consider when setting up a tiered storage strategy:
Business requirements should cover SLOs ranging from primary storage capacity, performance and availability to site recovery time and recovery point objectives (RTO and RPO). The metrics should define retention periods for archived data, and any industry-specific compliance requirements for data integrity (such as tamper-proof storage media) or privacy protection (e.g., encryption). The data classification process generally requires collaboration with other groups such as application developers, business owners, legal and compliance staff. Corporate auditors and outside counsel may also play a role in many cases. And an independent data and storage consulting firm could help facilitate your organization's review of retention policies and compliance requirements.
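As a rough illustration, the SLO metrics above could be recorded per data class in a structured form that other teams can review and sign off on. The following Python sketch is hypothetical; the field names and example values are assumptions, not prescriptions:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ServiceLevelObjective:
    """Illustrative SLO record for one data class (field names are assumptions)."""
    availability_pct: float   # e.g., 99.99 for mission-critical data
    rto_hours: float          # recovery time objective after a site failure
    rpo_hours: float          # recovery point objective (max tolerable data loss)
    retention_years: int      # archival retention period
    encrypted: bool           # privacy-protection requirement
    tamper_proof: bool        # compliance requirement for data integrity

# Example: a hypothetical SLO for a critical OLTP production class
critical_oltp = ServiceLevelObjective(
    availability_pct=99.99, rto_hours=4, rpo_hours=0.25,
    retention_years=7, encrypted=True, tamper_proof=False)
```

Writing the SLOs down in this form makes the later class-to-tier mapping a mechanical comparison rather than a judgment call repeated for every application.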
Data classes should be designed to reflect the business requirements and operating environments of different applications and data sets. You may need to define a number of data classes, depending on the environment. A large enterprise might end up defining a dozen data classes, and mapping them to three storage tiers. One reason to make these distinctions, rather than simply assigning each application directly to a storage tier, is that you may decide to introduce additional storage tiers later. The data classes will help you evaluate the need for another tier, and determine the capacity requirements. Conversely, if you decide to consolidate storage tiers, the data classes provide the basis for re-assigning applications to the surviving tiers. Finally, the data class definitions will provide a logical framework for assigning each new application to a data class based on its characteristics, and mapping it to the storage tier that currently supports the data class.
Most organizations classify data first by application, and then by dataset function within the application. The art is in the grouping of the applications and data sets, to keep the number of classes reasonable. For example, the "critical OLTP production" data class might include new transaction data from several customer-facing applications. This is system-of-record data that must be protected from loss or damage due to hardware failures, operator errors, security breaches or site disasters. On the other hand, the development and test data sets for those applications might be grouped with all other development/test data in a single data class, and assigned to a less expensive storage tier. Here are a few aspects to consider during data classification:
- Identify the system-of-record applications, and target their production data sets for stronger protection than you provide for derivative data such as data warehouses.
- Establish different classes for online transaction processing applications and batch applications, since the production data sets typically have different storage performance requirements.
- Define a separate class for development and test datasets, since these generally don't need the same level of performance and DR protection that you provide for production data.
- Network file shares might be placed in a separate class, and so might e-mail databases, large collections of images, scanned documents, faxes, video streams and other datasets that have different access patterns or service level requirements.
- Consider separate classes for data accessed by incompatible operating systems or server platforms -- e.g., Unix/Windows, mainframe, Tandem and AS/400 datasets.
- Find out whether you need a separate class for records that must be saved on permanent media for regulatory compliance. Broker-dealer e-mail archives would be an obvious example.
- Backup data should be in a separate class, whether you are backing up to disk or tape or both.
- If you have archive data that's separate from the backup data, it may need different storage capabilities and service levels.
After considering these kinds of requirements, you should define a relatively small number of data classes that preserve the most important distinctions in your business and IT operations -- which will allow you to easily classify new applications and data sets when they enter the environment.
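A minimal sketch of how such a classification scheme might be encoded, assuming a simple lookup from application type and dataset function to data class (all class and application names here are hypothetical):

```python
# Hypothetical rules mapping (application type, dataset function) -> data class.
CLASS_RULES = {
    ("oltp", "production"):       "critical-oltp-production",
    ("batch", "production"):      "batch-production",
    ("oltp", "dev-test"):         "dev-test",
    ("batch", "dev-test"):        "dev-test",
    ("file-share", "production"): "network-file-shares",
    ("email", "archive"):         "compliance-archive",
}

def classify(app_type: str, dataset_function: str) -> str:
    """Assign a dataset to a data class; unknown combinations are flagged for review."""
    return CLASS_RULES.get((app_type, dataset_function), "unclassified-review")
```

The explicit fallback class matters: new applications that don't match a rule surface for human review instead of silently landing on the wrong tier.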
Data management policies should support the business requirements for each data class. One important ingredient is a written archival policy that documents an agreement among business owners, IT, and legal and compliance staff regarding what to save and for how long. This is an IT-specific and even storage-specific extension of the organization's general document retention schedule, which may have been framed to address paper and other hardcopy records. The data management policies must address the new issues that arise when creating and retaining electronic records and documents. For example, how does the organization ensure retention of required electronic documents, which may exist only on employee desktop or notebook computers? How does it retain and manage the unstructured data files that represent key business records, to meet compliance requirements and productivity needs?
Once the policies are updated, you should refresh your standard operating procedures to implement the policies.
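One way to make an archival policy operational is to record it as data that standard operating procedures and scripts can consult. A hypothetical sketch (the classes, retention periods and media notes are illustrative assumptions, not recommendations):

```python
# Hypothetical archival policy table: data class -> retention terms.
ARCHIVAL_POLICY = {
    "critical-oltp-production": {"retain_years": 7, "media": "WORM tape"},
    "compliance-archive":       {"retain_years": 6, "media": "tamper-proof disk"},
    "dev-test":                 {"retain_years": 0, "media": "none (not archived)"},
}

def must_archive(data_class: str) -> bool:
    """True if the written policy requires archiving this class at all."""
    policy = ARCHIVAL_POLICY.get(data_class)
    return bool(policy and policy["retain_years"] > 0)
```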
Storage tiers are different from data classes. A storage tier is characterized by the set of service level commitments its infrastructure can deliver, in support of the data classes assigned to it. For example, an organization might start with three storage tiers: enterprise RAID (mirrored and replicated) for mission-critical transaction data; midrange disk or NAS for other active data; and tape for backup. An archive tier might be introduced later, with different functions and service levels, to provide cost-effective single-instance storage for reference data. When the data classes are first mapped to the available storage tiers, some data classes might get more performance or protection than they really require. As data grows -- and as processes and technologies mature -- the organization can refine the alignment of the data classes with the storage tiers.
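The mapping from data classes to tiers can then be checked, or even computed, against each tier's commitments. A sketch under assumed figures (the tier names, RPO commitments and per-GB costs are all illustrative assumptions):

```python
# Tiers and the service levels they can deliver (illustrative numbers only).
TIER_CAPABILITIES = {
    "tier1-enterprise-raid": {"rpo_hours": 0.0,  "cost_per_gb": 10.0},
    "tier2-midrange-disk":   {"rpo_hours": 24.0, "cost_per_gb": 3.0},
    "tier3-tape-backup":     {"rpo_hours": 48.0, "cost_per_gb": 0.5},
}

def cheapest_adequate_tier(required_rpo_hours: float) -> str:
    """Pick the lowest-cost tier whose RPO commitment meets the class requirement."""
    adequate = [(caps["cost_per_gb"], name)
                for name, caps in TIER_CAPABILITIES.items()
                if caps["rpo_hours"] <= required_rpo_hours]
    if not adequate:
        raise ValueError("no tier meets the requirement")
    return min(adequate)[1]  # lowest cost among adequate tiers
```

A class with a 24-hour RPO lands on midrange disk rather than enterprise RAID, which is exactly the over-provisioning the article warns against when classes are mapped by habit instead of by requirement.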
The storage architecture for each tier may change over time, as technologies evolve. But the architectural design and change control processes must ensure that each tier continues to support the minimum requirements of the data classes it stores, at the lowest practical TCO. Going forward, we will see wider use of archiving software solutions that are tightly integrated with tiered storage capabilities. The benefits of an integrated solution stack may be significant, while we wait for mature technology and industry standards to evolve. However, as usual, you will need to decide whether those benefits are sufficient to offset the potential costs of vendor lock-in.
Data migration plans should start with a gap assessment, comparing current data placement with the optimized placement according to the defined data classes and the available storage tiers. Then, determine what can be done with existing infrastructure, and plan new storage tier deployments to support needed capacity growth and functionality. Many companies will find that they won't need to buy additional high-priced storage for quite some time, but they may need to add low-cost storage or upgrade existing storage to support features required for privacy protection or business continuance. Existing data can be moved from high-cost arrays to lower-cost storage tiers as needed, to free up space for new or growing applications that really need the capabilities of a high-end array.
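The gap assessment itself reduces to comparing two placements. A minimal sketch, assuming current and target placements are recorded as dataset-to-tier mappings (all dataset and tier names are hypothetical):

```python
def gap_assessment(current: dict, target: dict) -> list:
    """List datasets whose current tier differs from the target tier."""
    return [(ds, current.get(ds), target[ds])
            for ds in sorted(target)
            if current.get(ds) != target[ds]]

# Hypothetical example: one dataset is over-provisioned, one belongs on tape.
current_placement = {"orders-db": "tier1", "dev-sandbox": "tier1", "scans": "tier2"}
target_placement  = {"orders-db": "tier1", "dev-sandbox": "tier2", "scans": "tier3"}
moves = gap_assessment(current_placement, target_placement)
# moves -> [("dev-sandbox", "tier1", "tier2"), ("scans", "tier2", "tier3")]
```

Each entry in the result is a candidate migration; demoting the over-provisioned datasets frees high-end capacity for the applications that genuinely need it.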
Data migration plans should also provide for moving data to lower tiers as it ages, according to an information lifecycle management (ILM) strategy. But the first step is to get the initial data placement correct, before starting to fine-tune or automate that placement over time. Start with data classification, policies and processes -- along with cost-effective tiered storage -- and you will be well-prepared for an ILM future.
Related Q&A from Mike Casey