Data storage costs money, and this presents a serious dilemma for storage administrators who are turning to tiered storage in the enterprise. Keeping corporate data on Fibre Channel (FC) disks provides great performance for the network user, but the disks are small and expensive. Retaining data on SATA or serial attached SCSI (SAS) disk can be very economical, but the lower performance may not be adequate for every application. Today, IT departments are implementing tiered storage as a mix of storage technologies that meet the performance needs of databases, archives, backups and many other enterprise applications, yet keep spiraling storage costs under control.
Tiered storage approaches and strategies
Simply put, tiered storage is the implementation of two or more storage schemes with distinctive cost/performance characteristics, followed by the reorganization of corporate data onto the most appropriate platform. The trick is to match the data's relative value to the particular tier -- placing more recent or valuable data on faster, more reliable storage, while relegating older, less valuable or less frequently accessed data to slower, less expensive storage.
For example, a corporate Oracle database might be stored on a high-performance FC array, such as a Symmetrix DMX-3 from EMC Corp., while files like Microsoft Word documents and Excel spreadsheets might be offloaded to a secondary array with inexpensive SATA disks, such as a Hitachi Data Systems Inc. (HDS) TagmaStore WMS system. A long-term archiving platform, like an Axion array from Avamar Technologies Inc., might constitute a third tier; a DX100 virtual tape library (VTL) system from Quantum Corp. could serve a backup role; and a NEO 4200 tape system from Overland Storage Inc. might provide backup tapes for off-site storage. The number and type of tiers depend on the individual needs of each enterprise.
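The matching rule described above can be sketched in a few lines of Python. This is only an illustration: the tier labels, the 30-day and 365-day thresholds, and the `business_critical` flag are all assumptions made for the example, not values taken from any product or from a real classification policy.

```python
from dataclasses import dataclass


@dataclass
class DataSet:
    name: str
    days_since_access: int   # time since the data was last read or written
    business_critical: bool  # result of a (manual) classification decision


def assign_tier(ds: DataSet, hot_days: int = 30, archive_days: int = 365) -> str:
    """Pick a tier label for a data set using hypothetical thresholds."""
    if ds.business_critical or ds.days_since_access <= hot_days:
        return "tier1-fc"       # fast, expensive disk
    if ds.days_since_access <= archive_days:
        return "tier2-sata"     # slower, cheaper disk
    return "tier3-archive"      # archive disk or tape


# Made-up data sets, loosely modeled on the examples in the text.
datasets = [
    DataSet("oracle-orders", 0, True),
    DataSet("q3-spreadsheets", 90, False),
    DataSet("2001-project-files", 1200, False),
]

for ds in datasets:
    print(ds.name, "->", assign_tier(ds))
```

In practice the thresholds and the notion of "critical" would come out of the organization's own classification work rather than from fixed defaults like these.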
Tiered storage lowers the cost of disks. Shifting less valuable data to less expensive storage media can allow for very high storage capacities (usually measured in hundreds of gigabytes per drive) at a lower cost per gigabyte. Although tiering involves lower-performance drives, many companies actually see an improvement in storage performance because network traffic is spread among several platforms rather than competing for access on a single storage system -- this enhances the storage service experience for all users.
Costs and implementation issues
While tiered storage can save money in disk space, it may not be a cheaper solution overall. SATA or SAS drives offer huge storage capacities at a low cost per gigabyte, and this in turn reduces the need for expensive FC drives. However, it's important to weigh the lower disk costs against the price of additional storage platforms and management overhead needed to maintain those platforms. Analysts report that a single tier, if managed and utilized well, can actually be less expensive altogether than multiple tiers. Consider the total cost of ownership (TCO) in any tiered storage architecture before making a purchase commitment.
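The trade-off between cheaper disks and extra platforms can be made concrete with a rough TCO comparison. The sketch below is hypothetical throughout: the per-terabyte prices, platform costs, administration costs, the three-year horizon and the 30% "hot data" share are invented figures chosen only to show how the arithmetic works.

```python
def platform_tco(capacity_tb, cost_per_tb, platform_cost, admin_per_year, years=3):
    """TCO of one storage platform: disks + platform + administration."""
    return capacity_tb * cost_per_tb + platform_cost + admin_per_year * years


def single_tier_tco(total_tb):
    # Everything lives on one FC array (hypothetical prices).
    return platform_tco(total_tb, 10_000, 100_000, 40_000)


def two_tier_tco(total_tb, hot_fraction=0.3):
    # Hot data stays on FC; the rest moves to a separate SATA platform,
    # which brings its own platform and administration costs.
    hot_tb = total_tb * hot_fraction
    cold_tb = total_tb - hot_tb
    return (platform_tco(hot_tb, 10_000, 100_000, 40_000)
            + platform_tco(cold_tb, 3_000, 60_000, 30_000))


for total_tb in (10, 50):
    print(total_tb, "TB:", single_tier_tco(total_tb), "vs", two_tier_tco(total_tb))
```

With these made-up numbers, the single tier is cheaper at 10 TB because the second platform's fixed costs outweigh the disk savings, while two tiers come out ahead at 50 TB. Real figures will move the break-even point, which is why the calculation is worth doing before buying.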
It is not difficult to implement a tiered storage architecture, and there is no single "right" way to approach the endeavor. It's basically a matter of adding storage platforms and moving data to the appropriate platform. In smaller implementations, a storage array may accommodate two tiers by supporting both FC and SATA drives. But tiering takes management -- each new storage system demands time and attention from a storage administrator.
An enterprise can deploy any number of storage tiers across any number of physical platforms, but there is a point of diminishing returns where the benefits of additional tiers are outweighed by the added management overhead. Most organizations implement three to five tiers. A disk-to-disk-to-tape (D2D2T) infrastructure is one example of a three-tier approach, possibly using FC disk for high performance, SATA/SAS disk for high-volume or long-term storage, and tape for archiving and backup.
The role of data classification
The biggest issue for tiered storage is data management. Matching the data to each tier, migrating the initial data and then moving data between tiers over time can be a daunting challenge. Each organization must identify and organize its data, and use that analysis as the foundation for policies that will govern data movement, retention and eventual deletion.
Data storage classification is the decision-making process that identifies data and determines its value to the organization [see the Tech Closeup on data classification]. By comparison, tiered storage is the hardware, software and processes that actually implement those data classification plans. Data classification and tiered storage work together: Data classification is pointless unless you intend to tier the storage architecture, and it's impossible to place the right data in the appropriate storage tiers without having first accomplished a data classification initiative.
It's important to note that data storage classification is a manual process. Although there are software products that can help to identify and move data types, there is no automatic way to determine the value of specific files to an organization. Similarly, comprehensive data classification should involve input from key departments across the enterprise -- it should never be approached as an IT-only function.
Managing tiered storage
Tiered storage may sound great in principle, but data movement is an ongoing management challenge. It takes time to migrate data between tiers, and the process must be repeated regularly (e.g., daily or weekly) as more data is created, or as existing data ages or changes in relative value. More tiers result in more complexity. Software tools can help to automate data migration, but it is often left to storage administrators to initiate the movement based on established policies.
Tiering is also affected by interoperability issues between the software tools and storage platforms. Not all storage tiering tools are fully heterogeneous across the physical storage platforms in use, and this can cause serious headaches for organizations that must use multiple tools or forgo the use of certain storage systems.
Many tiered storage adopters seek to charge back storage users based on the service level of each tier. That is, top-level Tier-1 storage might command the highest price, while Tier-2 SATA storage might be significantly cheaper; archival disk or tape platforms would be even less expensive. However, few tiered storage tools accommodate a meaningful chargeback model, making it difficult to represent storage in terms of real costs to users. As a consequence, it's hard to get users to see the value in tiered storage -- often causing users to demand higher tiers than they might otherwise need.
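A chargeback model of the kind described here boils down to a rate per tier multiplied by the capacity each group consumes. The following is a minimal sketch; the rates (expressed in cents per gigabyte per month to keep the arithmetic exact), the tier names and the department's usage are all hypothetical.

```python
# Hypothetical monthly rates, in cents per gigabyte, for each tier.
RATE_CENTS_PER_GB = {"tier1": 100, "tier2": 30, "tier3": 5}


def monthly_charge(usage_gb):
    """Return the monthly charge in dollars.

    usage_gb maps a tier name to the gigabytes one department consumes there.
    """
    cents = sum(RATE_CENTS_PER_GB[tier] * gb for tier, gb in usage_gb.items())
    return cents / 100


# Made-up usage for one department, spread across three tiers.
finance_tiered = {"tier1": 500, "tier2": 2000, "tier3": 10000}
# The same 12,500 GB if the department insisted on Tier-1 for everything.
finance_all_tier1 = {"tier1": 12500}

print(monthly_charge(finance_tiered))     # 1600.0
print(monthly_charge(finance_all_tier1))  # 12500.0
```

Showing users both figures side by side is one way to make the cost of demanding a higher tier visible.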
Tiered storage technologies and vendors
The majority of tiered storage tools center on storage resource management (SRM) software products. SRM products can typically collect, store, back up, recover, provision, virtualize and forecast data storage. SRM software may be offered as a standalone product or as part of an integrated program suite. Some notable SRM products include CommandCentral from Veritas Software Corp. and ControlCenter from EMC. More recent SRM tools offer broad interoperability; CA Inc.'s BrightStor, for example, supports over 100 storage arrays, tape libraries, storage area network (SAN) switches, servers, operating systems, databases and applications.
However, not all tiering tools are generic, and key storage manufacturers are paying particular attention to tiering functions. Compellent Technologies Inc. is one example, incorporating block-level virtualization into its Storage Center SAN, which can automatically move data between tiers without manual classification and movement tasks. There are other products available, such as Tiered Storage Manager from HDS, which moves data at the volume level, and IBM's SAN File System, which handles data movement at the file level. 3PARdata Inc. plans to add a similar adaptive provisioning feature to its InServ storage system in 2007, and EMC also expects to add "access frequency" as a classification parameter of its DiskXtender product.