Every business has experienced tremendous data growth, and adequate storage and retrieval of that data is a serious concern. In years past, IT would simply "buy more storage," which meant adding high-end Fibre Channel (FC) drives to a storage array or SAN.
But companies are rethinking their basic approach to data storage. High-end storage is very expensive, and using top-shelf storage for all data assets is no longer a cost-efficient solution. And since companies are now obligated to meet regulatory and legal data storage requirements, they must account for what data they have, where it is located and who is accessing it -- details that cannot be determined by just adding hard drives. Tiered storage is emerging as one means to address these changes.
Tiered storage is a way of reorganizing corporate data onto a variety of storage media. Tiering involves the selection and implementation of storage systems, the software to manage and optimize that storage, as well as the policies and procedures needed to operate each tier.
"The concept of tiered storage is to better align data on storage devices with its value or importance to the organization," says Jim Damoulakis, chief technical officer at GlassHouse Technologies Inc. The idea is to retain the most timely or important data on fast, high-I/O storage (such as Fibre Channel or solid-state disk drives), and move less important and less frequently accessed data to less-expensive (lower-performance) drives, such as SAS, SATA or even tape drives.
Data can also be moved over time as its value changes. For example, a quarterly financial report may start on high-performance storage, then be migrated to secondary storage, and eventually be archived or offloaded to tape. This isn't simply an exercise in data migration. Data must be relocated as necessary based on its relative value, which makes tiering a business decision rather than an IT decision.
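The lifecycle described above can be sketched as a simple placement rule. This is only an illustration: the tier names, the 90-day and 365-day thresholds, and the `business_critical` override are assumptions made for the example, not values from any product or from the experts quoted here. In practice, the thresholds would come out of the data classification work.

```python
from datetime import date

# Hypothetical policy: (maximum age in days, tier name), checked in order.
# The thresholds are illustrative assumptions, not recommendations.
TIER_POLICY = [
    (90, "tier1-high-performance"),
    (365, "tier2-secondary"),
]
ARCHIVE_TIER = "tier3-archive-or-tape"


def choose_tier(last_accessed: date, today: date,
                business_critical: bool = False) -> str:
    """Pick a tier from the data's age, unless the business overrides it."""
    # A business decision (for example, a flag set by the data owner)
    # takes precedence over the purely age-based rule.
    if business_critical:
        return TIER_POLICY[0][1]
    age_days = (today - last_accessed).days
    for max_age_days, tier in TIER_POLICY:
        if age_days <= max_age_days:
            return tier
    return ARCHIVE_TIER


if __name__ == "__main__":
    today = date(2009, 2, 1)
    for accessed in (date(2009, 1, 1), date(2008, 6, 1), date(2007, 1, 1)):
        print(accessed, "->", choose_tier(accessed, today))
```

The override flag is the important part of the sketch: age alone is an IT measurement, while the flag stands in for the business judgment about what the data is worth.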
Tiered storage offers cost benefits. Shifting less valuable data to less-expensive storage media can allow for higher storage capacities (now exceeding 1 TB to 1.5 TB per drive) at a lower cost per TB. "If I treat all my online storage the same, if I only had one kind of storage, I'd probably be wasting some money," says Brian Garrett, analyst at Enterprise Strategy Group. "If I'm able to pull out lower-priority or old data, I can put it on something more cost-effective and save money."
However, shifting to larger, lower-cost storage entails a performance tradeoff. Businesses typically translate tiers of storage into "tiers of service," offering storage users a more practical measure of reliability and accessibility at each level.
Tiering can improve storage performance. "If you have it [data] placed on the right performance profile [tier], you may be able to get at it quicker to better service your customer's needs," Garrett says. When all corporate data is stored on a single tier (e.g., FC), all network users are competing for access. Separating storage into tiers spreads I/O operations across multiple tiers (usually across several storage subsystems). So even though SAS or SATA drives may offer lower I/O capability than FC drives, reduced competition for I/O time may actually allow for good performance at the SAS or SATA tier. And since this also reduces the number of I/O requests arriving at FC drives, top-tier performance may also improve -- enhancing the storage service experience for all users.
Even solid-state drives (SSDs) are changing the way that storage performance versus capacity considerations are handled. "Solid state for pure I/O and larger, lower-performance drives for pure capacity might be a good combination in many data centers that gradually reduces the need for faster FC drives," says Mark Peters, analyst at Enterprise Strategy Group.
Dispelling tiered storage myths
The most common error in tiered storage is the confusion between data classification, tiered storage and information lifecycle management (ILM). Although data classification is closely related to tiered storage, the two are different. Data classification is the process that identifies data and determines its value to the organization. Tiered storage is the hardware, software and processes that actually implement those data classification plans.
"Buying the storage is easy," says Damoulakis. "Doing the data classification work required to take advantage of that storage is the challenge." Data classification is pointless unless you intend to tier the storage architecture, and it's impossible to place the right data in the appropriate storage tiers without having first accomplished a data classification initiative. It might be said that tiered storage is the practical expression (or implementation) of storage decisions made during the data classification process. Data classification and storage tiering are a part of ILM, but ILM is an even broader umbrella of policies and procedures that define how data should be handled throughout its lifetime.
Tiering storage is supposed to save money, but a simple cost/gigabyte or cost/terabyte analysis is not a good yardstick. Cheaper disks may cost more in the long run if they're hard to manage. Even though Tier One storage is more expensive in terms of raw cost per GB/TB, if it's managed and utilized well, the total cost may actually be less than a high volume of lower-end disks, where management may be more difficult. "We've actually seen examples where the cost per utilized GB of the more expensive Tier One storage is actually lower than the highly unutilized cost per GB of the Tier Two storage," Damoulakis says.
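Damoulakis' point about utilization is easy to check with arithmetic: dividing the raw cost per GB by the fraction of capacity actually in use gives the cost per utilized GB. The prices and utilization figures below are made-up numbers chosen only to show how the comparison can flip. They are not taken from GlassHouse or any vendor.

```python
def cost_per_utilized_gb(raw_cost_per_gb: float, utilization: float) -> float:
    """Raw cost per GB divided by the fraction of capacity actually in use."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in the range (0, 1]")
    return raw_cost_per_gb / utilization


# Hypothetical figures: a well-managed Tier One array versus a
# poorly utilized Tier Two array.
tier1 = cost_per_utilized_gb(raw_cost_per_gb=10.0, utilization=0.80)
tier2 = cost_per_utilized_gb(raw_cost_per_gb=4.0, utilization=0.25)

if __name__ == "__main__":
    print(f"Tier One: ${tier1:.2f} per utilized GB")
    print(f"Tier Two: ${tier2:.2f} per utilized GB")
```

With these assumed numbers, the "cheap" tier ends up costing more per GB of data it actually holds.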
A tiered storage initiative doesn't guarantee cooperation within the organization. Proper tiered storage deployment requires a great deal of underlying classification work, and many managers are hard-pressed to justify the expense to department leaders or other decision-makers who are already used to data being highly accessible from fast storage. The easiest way to garner support is to associate cost with each storage tier, and let each department decide how much storage service they want to pay for.
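A chargeback model of this kind can be as simple as a rate card multiplied by usage. The sketch below assumes three hypothetical tiers ("gold," "silver," "bronze") with invented monthly rates per GB. Real rates would have to reflect an organization's own hardware, management and facilities costs.

```python
# Hypothetical monthly rates in dollars per GB; invented for illustration.
RATES = {"gold": 3.00, "silver": 1.20, "bronze": 0.40}


def monthly_chargeback(usage_gb: dict) -> float:
    """Total monthly charge for a department, given its GB used per tier."""
    unknown = set(usage_gb) - set(RATES)
    if unknown:
        raise KeyError(f"no rate defined for tier(s): {sorted(unknown)}")
    return sum(RATES[tier] * gb for tier, gb in usage_gb.items())


if __name__ == "__main__":
    # The same 1,100 GB, placed two different ways.
    all_gold = monthly_chargeback({"gold": 1100})
    tiered = monthly_chargeback({"gold": 100, "bronze": 1000})
    print(f"All on gold: ${all_gold:,.2f} per month")
    print(f"Tiered:      ${tiered:,.2f} per month")
```

Showing a department the two totals side by side is one way to let it decide how much storage service it wants to pay for.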
But in companies that don't have a chargeback model in place, business leaders must find other arguments for tiering, such as performance streamlining. "Your less frequently accessed data will be OK, and you'll get good -- maybe even better -- response times if we take it out of a high-performance pool where it's competing with other data," says Greg Schulz, founder and senior analyst at StorageIO. Garrett suggests a different argument for the near term. Rather than making the case to move data from FC to lower tiers, try making the "bottom-up" case to move data from tape backup to SAS or SATA to improve recovery performance. Such a tactic gets users comfortable with middle tiers and demonstrates their benefits, making it easier to justify moving data from the top tier down the road.
Other technologies have also evolved that affect the importance and use of tiered storage today. For example, data deduplication eases storage needs by removing redundant files, blocks and bytes from stored data, effectively reducing a corporate data set by 50-80%. Similarly, thin provisioning supports the creation of LUNs that are logically bigger than the actual storage allocated to them, allowing users to "grow into" LUNs while only purchasing and deploying the minimum amount of storage up front. These and other technologies do not reduce the importance of tiering, but are critical to rein in the proliferation of physical disks while helping to mitigate storage cost, power, cooling, backup/DR planning and other data center issues.
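The effect of both technologies on physical capacity comes down to simple arithmetic. The sketch below applies the 50% to 80% reduction range cited above to a hypothetical 100 TB data set, and computes how far a thin-provisioned pool is overcommitted. The 100 TB, 40 TB and 10 TB figures are assumptions chosen for the example.

```python
def physical_after_dedup(logical_tb: float, reduction: float) -> float:
    """Physical capacity needed once deduplication removes `reduction` of the data."""
    if not 0 <= reduction < 1:
        raise ValueError("reduction must be in the range [0, 1)")
    return logical_tb * (1 - reduction)


def overcommit_ratio(provisioned_tb: float, physical_tb: float) -> float:
    """How much logical LUN capacity has been promised per TB actually installed."""
    if physical_tb <= 0:
        raise ValueError("physical_tb must be positive")
    return provisioned_tb / physical_tb


if __name__ == "__main__":
    # Hypothetical 100 TB data set at both ends of the 50-80% range.
    for reduction in (0.50, 0.80):
        needed = physical_after_dedup(100, reduction)
        print(f"{reduction:.0%} reduction: about {needed:.0f} TB on disk")

    # Hypothetical thin-provisioned pool: 40 TB of LUNs on 10 TB of disk.
    print(f"Overcommit ratio: {overcommit_ratio(40, 10):.1f}x")
```

An overcommitted pool only works if someone watches actual consumption and adds disks before users "grow into" more than is installed.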
Getting the most from tiered storage
There's no one "right" way to handle tiered storage. Implementations vary based on corporate business needs, data classification granularity and IT budgets. An enterprise may use a different storage array for each tier, while smaller organizations may mix drive types to establish multiple tiers within a single array. Some users may opt for FC and SAS drives, while other businesses may go with FC, SATA and tape. In the end, tiered storage requires an understanding of business needs. "Learn more about the business and learn more about the applications that support the business," Schulz says, noting that application performance can sometimes be optimized by matching each application's data needs to its appropriate tier in the IT architecture.
Avoid the tendency to express tiers as specific technologies (e.g., Tier One FC versus Tier Two SAS or SATA). Instead, present tiers as "service levels" that express real business needs.
This is particularly relevant given the importance of modern networking technologies that tie storage to the business. Faster networks allow faster access and further improve responsiveness. Traditional Fibre Channel storage networks have accelerated to 8 Gbps, while Gigabit Ethernet and 10 Gigabit Ethernet now support popular iSCSI storage, and Fibre Channel over Ethernet is poised to carry FC traffic over everyday networks in late 2009 and beyond.
Damoulakis suggests creating a limited number of service classes and identifying the corresponding business costs of each class. "Work with the business units and relevant IT areas to determine the appropriate classes of service," he says. "What does each 'basket' of features look like?" When supporting service classes through service level agreements, it's important to see that the back-end process (your infrastructure) is capable of delivering the promised level of service across a range of network conditions.
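One way to picture such a "basket" of features is as a small catalog of service classes, each with a few promised attributes and a price. The sketch below is hypothetical throughout: the class names, the latency and recovery-time attributes and all of the numbers are assumptions for illustration, and a real catalog would be negotiated with the business units.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ServiceClass:
    name: str
    max_latency_ms: float       # promised worst-case response time
    recovery_time_hours: float  # promised time to restore after a failure
    cost_per_gb_month: float    # what the business unit pays


# Hypothetical catalog; every value here is an illustrative assumption.
CATALOG = [
    ServiceClass("gold", max_latency_ms=5, recovery_time_hours=1, cost_per_gb_month=3.00),
    ServiceClass("silver", max_latency_ms=20, recovery_time_hours=8, cost_per_gb_month=1.20),
    ServiceClass("bronze", max_latency_ms=100, recovery_time_hours=48, cost_per_gb_month=0.40),
]


def cheapest_class(latency_ms_needed: float,
                   recovery_hours_needed: float) -> Optional[ServiceClass]:
    """Return the lowest-cost class that meets both requirements, if any."""
    candidates = [
        c for c in CATALOG
        if c.max_latency_ms <= latency_ms_needed
        and c.recovery_time_hours <= recovery_hours_needed
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda c: c.cost_per_gb_month)


if __name__ == "__main__":
    choice = cheapest_class(latency_ms_needed=25, recovery_hours_needed=12)
    print(choice.name if choice else "no class meets these requirements")
```

Note that nothing in the catalog names a drive type. The classes describe what the business receives, which leaves IT free to change the technology behind each one.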
Tools are appearing to automate tiering, migrating data transparently based on a range of criteria. Although the technology is promising and algorithms are improving, experts like Schulz note that automation carries significant overhead -- particularly under heavy workloads -- even as performance and response-time requirements keep rising. "Vendors are waking up and talking performance, including IOPS and bandwidth, instead of a focus just on capacity," Schulz says. "The next step is to talk about latency … and with fast technology like SSD."
Storage virtualization has indirectly influenced storage tiering. Virtualization allows storage pooling. This improves utilization, but can abstract storage tiers and make it difficult or confusing to maintain physical separation between tiers. The primary benefit of storage virtualization is the management agility and ability to migrate storage content between systems for maintenance or technology refresh. From that perspective, virtualization can enable and support tiering quite well, but it still does not handle any of the underlying classification work or gauge the relative business importance of the data being migrated -- that remains a largely "human" decision.