Tiered storage is a way to assign different categories of data to various types of storage media with the objective of reducing the total cost of storage. A tiered storage architecture places data in a hierarchy according to its business value. Tiers are determined by performance and cost of the media, and data is ranked by how often users access it. Generally, the most important data is served from the fastest storage media, which typically is the most expensive.
In a basic configuration, a fast tier of flash storage serves performance-sensitive data, while other data is written to secondary storage on disk, tape or the cloud. Data that must be retained indefinitely moves to an archive tier.
Tiering is one link in a chain of activities governed by information lifecycle management (ILM).
IBM pioneered a multi-tiered storage architecture for use on its mainframe computers. When it was first conceived, tiered storage involved placing primary production data on varying configurations of Serial-Attached SCSI (SAS) and Serial Advanced Technology Attachment (SATA) hard drives. Data was written to blocks on disks using techniques such as short stroking and striping across a redundant array of independent disks (RAID).
This resulted in tiers of storage with varying capacity, cost and performance characteristics. An additional tier of tape libraries sat behind the disk to provide a deep archive for cold or warm data.
The rise of hierarchical storage management (HSM) helped reduce the manual process of storage tiering. Software automation shuttles the data dynamically between different storage systems, drive types or RAID groups in real time, in ways that are largely transparent to the user.
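The access-age-driven movement an HSM engine automates can be sketched in a few lines. This is a simplified, hypothetical model -- the tier names, thresholds and `FileRecord` structure are illustrative assumptions, not any product's actual policy schema:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical tiers, ordered fastest (and most expensive) to slowest.
TIERS = ["flash", "disk", "tape"]

@dataclass
class FileRecord:
    name: str
    tier: str
    last_access: datetime

def target_tier(record: FileRecord, now: datetime) -> str:
    """Pick a tier from last-access age -- the kind of rule an HSM
    policy engine evaluates automatically. Thresholds are illustrative."""
    age = now - record.last_access
    if age < timedelta(days=7):
        return "flash"   # hot data stays on the fastest media
    if age < timedelta(days=90):
        return "disk"    # warm data drops to capacity disk
    return "tape"        # cold data migrates to the archive tier

def migrate(records: list, now: datetime) -> list:
    """Move each file to its policy-selected tier, transparently to users."""
    for rec in records:
        dest = target_tier(rec, now)
        if dest != rec.tier:
            rec.tier = dest  # a real HSM copies the data and leaves a stub
    return records
```

In a production HSM, the migration step also handles recall: when a user opens a file that was demoted, the stub triggers a transparent fetch from the lower tier.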
Tier 0 and the rise of flash
The rise of solid-state storage and flash storage has brought about what many refer to as Tier 0 storage. Tier 0 is faster than traditional Tier 1 storage, and much of the data formerly considered Tier 1 is now stored on Tier 0. Tier 0 storage is the fastest and most expensive layer in the hierarchy and is suited for applications with little tolerance for downtime or latency. Data placed in a "zero tier" often includes scale-up transactional databases for analytics, financials, healthcare and security.
When disk was dominant, storage administrators placed Tier 0 data on faster, more expensive hard disk drives (HDDs), using slower, less expensive disk for less important data. The disk topology remains in use, although solid-state storage has supplanted disk as the main media for Tier 0 data.
Using disk as a Tier 0 target required dedicating a portion of server random access memory (RAM) to function as a virtual disk drive. This left less memory available for compute, and RAM requires a constant power supply to retain data. Solid-state drives (SSDs), which are based on nonvolatile memory (NVM), eliminate both issues.
The advent of hybrid storage arrays mixing flash and HDDs brought about a need for automated storage tiering software to ensure that only the most important data stayed on expensive SSDs inside an array. Storage tiering started as a manual process, but automation has taken a greater role in the analysis of data placement.
On all-flash arrays, primary storage systems often have just one tier. However, there can be different tiers of flash, such as RAM, low-latency Peripheral Component Interconnect Express (PCIe) flash and SSDs.
Compared to traditional spinning disk, flash storage expands the data management features delivered through software. This includes data reduction techniques, such as inline compression and deduplication. Hybrid storage systems cache hot data in flash for quick retrieval and write the remaining data to a tier of back-end disk.
Ideally, all data would reside on the fastest tier possible, but that isn't practical in most cases. On a traditional disk array, moving data between tiers consumes input/output (I/O) cycles that would otherwise serve host requests, creating bottlenecks while the system waits for inter-tier transfers to complete. With flash, only a small portion of capacity needs to be dedicated as cache to accelerate high-performance workloads.
The tangible benefits of enhanced performance, such as faster time to market or increased sales, need to be weighed against the cost of storage.
Tier 1 storage
Tier 1 data includes mission-critical applications, recently accessed data or top-secret files. This data might be stored on expensive, high-quality media, such as double-parity RAID. Increasingly, flash and in-memory storage are options used to boost performance on select workloads.
Tier 1 storage is reserved for data that depends on fast reads and writes, such as any application linked to revenue or business operations. An example is an online transactional database serving high-speed applications in real time. Fast storage delivers the required latency or throughput.
Even though some Tier 1 applications will stay on spinning disk, enterprises will run select workloads on all-flash storage or on hybrid flash. In some cases, IT shops take advantage of idle computing capacity to run transactional databases in fast in-memory storage. These devices include nonvolatile dual in-line memory modules (NVDIMMs) that slide into a standard server slot.
Tier 2 and Tier 3 storage
Even as primary operations are being completed, Tier 1 data is usually written simultaneously to a secondary tier of disk-based backup appliances or to magnetic tape. Data centers deploy a backup tier to aid business continuity and disaster recovery (BC/DR) through fast restores of key files and storage hardware.
Many organizations direct backups to disk for a set period of time, after which the data is then moved to a tape library for long-term retention.
Data on Tier 2 storage usually contains historical financial information, cold data and classified files. This data is preserved on lower-cost media in a conventional storage area network (SAN).
Tier 2 backups also may entail enterprise resource planning (ERP) systems, corporate email and back-office applications. In general, Tier 2 storage protects application data that requires high reliability and security but doesn't need submillisecond latency.
In a three-tiered storage system, an archive tier sits behind the backup tier. Data in the archive may contain event-driven, rarely used or unclassified files on slow-spinning HDDs, recordable compact discs or tapes. An archive keeps fixed copies of any content deemed to have a strategic value, however slight it may be. The content within an archive can be retained indefinitely or set to expire by a certain date.
Companies in regulated industries use archives to migrate aging or inactive data off more expensive storage. Archival storage supports compliance, historical analysis or other business needs that may arise periodically.
In addition to physical storage, the evolution of the hybrid cloud presents another storage tier. Service providers and internet companies increasingly integrate a scalable tier of object storage to manage unstructured data. DevOps teams often use object-based clouds to collaborate on testing before launching new production-level applications.
The public cloud can replace lower tiers for rarely accessed data. Many storage experts predict the use of fewer storage tiers -- possibly only two -- with primary data going on a flash tier and backup and archive data placed in the cloud.
The rise of array-based tiering
Storage array vendors have now embedded automated tiering into the software management stack. Automated policies move data to the appropriate tier, typically in real time, based on company-defined rules. Decisions to move data are driven by predefined metadata attributes.
A number of third-party software vendors also offer management software that includes tiered storage. These products include software-defined cloud gateways, copy data management and enterprise file sync-and-share suites.
Storage experts say a well-developed data taxonomy is the linchpin to an optimized tiered storage architecture. A taxonomy classifies all data and balances the type of storage performance required against the cost of such a plan. Availability, performance and service attributes of each tier should be clearly defined. The goal is to allow an application to choose the storage that aligns with the business tasks it carries out.
If a business depends on continuous uptime for its transaction processing applications, the revenue generated probably will more than cover the cost of high-performance storage. Not only does this enhance application performance, but it also frees up primary storage by automating backups to a tier of lower-cost capacity disk.
It is generally accepted that only 10% to 20% of data is considered "hot" at any given time. This means the fastest, most expensive storage should be dedicated solely to this frequently accessed data, with the remaining 80% to 90% stored on a cheaper tier of storage.
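That rule of thumb translates directly into capacity planning. A minimal sketch, assuming a 15% hot fraction as an illustrative midpoint of the 10% to 20% range:

```python
def tier_capacity(total_tb: float, hot_percent: int = 15) -> dict:
    """Split total capacity between a fast tier and a cheap capacity
    tier, using the rule of thumb that 10-20% of data is hot at any
    given time (15% here, an illustrative midpoint)."""
    hot = total_tb * hot_percent / 100
    return {"fast_tier_tb": hot, "capacity_tier_tb": total_tb - hot}
```

For a 100 TB estate, this sizes the flash tier at 15 TB and leaves 85 TB on cheaper media; real deployments would also factor in growth and data reduction ratios.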
Tiering vs. caching
The terms storage tiering and caching are often used interchangeably -- especially when dealing with flash media -- but they describe different processes. Tiered data resides on one media type at any given time and moves between media as data access patterns change.
Caching temporarily places a copy of the data on a high-performance medium, such as dynamic RAM (DRAM) or solid-state memory, to improve performance. But the cached data also resides on a lower storage tier, usually an HDD.
The host software or storage controller places a copy of the data in the SSD cache, sitting between the application and back-end storage. The original copy of the data remains in its initial location.
Conversely, tiered storage moves data to a different storage medium, selecting the location that balances availability, performance and the cost of the storage media.
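The distinction comes down to copy versus move. A toy model, with dicts standing in for hypothetical storage tiers, makes the contrast concrete:

```python
def cache_promote(hdd: dict, ssd_cache: dict, key: str) -> None:
    """Caching: copy hot data into the SSD cache for fast reads.
    The original remains on the lower tier (the HDD)."""
    ssd_cache[key] = hdd[key]

def tier_promote(lower: dict, upper: dict, key: str) -> None:
    """Tiering: move the data to the faster tier. Afterward it
    resides on exactly one media type, not two."""
    upper[key] = lower.pop(key)
```

After `cache_promote`, evicting the cached copy loses nothing, because the HDD still holds the data; after `tier_promote`, the upper tier holds the only copy, so the tiering engine must move it back down before reclaiming the space.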