What you will learn in this tip: Storage expert Jon Toigo discusses the challenges associated with implementing a tiered storage model, and how it can help manage storage capacity demand and lead to a more efficient data center.
Storage tiering is not a new idea. It refers to configuring data storage infrastructure as a set of "tiers," where each tier comprises a collection of media (memory, disk or tape) having distinctive performance, capacity and cost characteristics.
Once these tiers are established, tiered storage expands to include processes for migrating data over time to ever slower, more capacious and less expensive storage tiers. This movement can be driven by simplistic criteria based on file metadata attributes such as "date last accessed" or "date last modified" (identifying files that are rarely accessed and that can be safely moved to a lower performance tier), or by a more granular analysis of the business context of data and the application of predefined policies for information lifecycle management.
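The simplest of these criteria, selecting files by "date last accessed" and "date last modified," can be sketched in a few lines of Python. This is an illustration only, not a description of any HSM product: the function name find_demotion_candidates and its parameters are hypothetical, and it assumes the file system actually records access times (volumes mounted with options such as noatime do not update them reliably).

```python
import os
import time
from pathlib import Path

SECONDS_PER_DAY = 86400


def find_demotion_candidates(root, min_idle_days, now=None):
    """Return (path, idle_days) pairs for files under `root` that have been
    neither accessed nor modified for at least `min_idle_days` days.

    Hypothetical helper for illustration; a real HSM policy engine would
    also apply exclusions and business rules before moving anything.
    """
    now = time.time() if now is None else now
    cutoff = now - min_idle_days * SECONDS_PER_DAY
    candidates = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                st = path.stat()
            except OSError:
                continue  # file vanished or is unreadable; skip it
            # Use the more recent of "last accessed" and "last modified".
            last_touched = max(st.st_atime, st.st_mtime)
            if last_touched < cutoff:
                idle_days = (now - last_touched) / SECONDS_PER_DAY
                candidates.append((str(path), idle_days))
    # Longest-idle files first: the safest candidates for a lower tier.
    return sorted(candidates, key=lambda item: item[1], reverse=True)
```

Calling find_demotion_candidates("/data/projects", 90) on a hypothetical share would list files untouched for 90 days or more. The result is only a candidate list; it says nothing about the business context of the data.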
This interpretation of storage tiering traces its origins to the earliest days of mainframe computing. Early IBM mainframe operating systems provided direct support for tiering via tools such as systems managed storage and hierarchical storage management (HSM). With the onset of distributed computing architectures, this kind of storage tiering model fell out of use. The practical limitations imposed by early network interconnects and server backplanes impaired tiering by constraining data movements. Moreover, distributed storage tended to lack a vendor-agnostic storage service or storage resource management (SRM) paradigm that enabled well-managed data movements between storage products from different vendors.
While the bandwidth of interconnects has improved over the years, the challenge of migrating data between different storage kits (especially those from different vendors) remains an impediment to a traditional tiered storage model and the efficient capacity utilization (the right data on the right type of storage) it presumably enables. Despite this challenge, recent developments in storage, including rates of growth in storage capacity demand and the increasing costs of storage at the level of the finished array, are stimulating renewed interest in tiered storage.
This interest is only one of the motives of vendors now offering storage tiering products. Some vendors are keen to promote tier-zero storage: arrays built entirely from flash memory-based solid-state drives (SSDs). Tier-zero arrays are seen as an initial write target for I/O-intensive apps, used to boost I/O performance, especially behind virtual server workloads.
Other vendors are presenting multi-tiered storage arrays -- arrays with shelves containing SSDs, other shelves of low-capacity/high-speed disk and still others containing high-capacity/low-speed disk, all within the same box -- as a one-stop shop for their customers. Not surprisingly, the storage media (fast hard disk drives, capacity hard disk drives and SSDs) in each shelf are significantly more expensive than the same media provided in a traditional array, in large part because of the premium charged by vendors for the tiering software included on the array controller.
Tiering has also acquired a new meaning in some vendor literature, describing the use of a read cache comprising either DRAM or flash SSD to temporarily store data written to disk that is now receiving multiple concurrent access requests. This "hot data" is temporarily copied to tier zero (memory), where it can serve multiple user requests at higher I/O rates than is possible with magnetic disk alone. When the requests drop off, the data is determined to be "cold," and access requests are repointed to the original disk media. With this hybrid technology for augmenting disk performance with memory, it is possible to obtain industry-leading read/write performance without deploying an excessive number of disk drives striped together for parallel access.
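The hot/cold logic described here can be sketched as a small access tracker. This is a simplified model under stated assumptions, not any vendor's caching algorithm: the class name HotDataTracker, the sliding-window approach, and the hot_threshold and window_seconds parameters are all hypothetical.

```python
from collections import defaultdict, deque


class HotDataTracker:
    """Toy model of a read cache in front of disk.

    A block becomes "hot" (served from cache) once it has received at
    least `hot_threshold` reads within the last `window_seconds`. A
    periodic sweep marks it "cold" again and repoints reads to disk.
    """

    def __init__(self, hot_threshold, window_seconds):
        self.hot_threshold = hot_threshold
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # block id -> recent read timestamps
        self.cached = set()              # block ids currently held in cache

    def _expire(self, block_id, now):
        # Drop read timestamps that have aged out of the window.
        hits = self._hits[block_id]
        while hits and hits[0] <= now - self.window_seconds:
            hits.popleft()

    def read(self, block_id, now):
        """Record a read and return the tier that served it."""
        self._expire(block_id, now)
        served_from = "cache" if block_id in self.cached else "disk"
        self._hits[block_id].append(now)
        if len(self._hits[block_id]) >= self.hot_threshold:
            self.cached.add(block_id)  # promote for subsequent reads
        return served_from

    def sweep(self, now):
        """Evict blocks whose recent read count fell below the threshold."""
        for block_id in list(self.cached):
            self._expire(block_id, now)
            if len(self._hits[block_id]) < self.hot_threshold:
                self.cached.discard(block_id)
```

In this model the original copy never leaves disk; the cache holds a temporary copy, which is why the technique differs from HSM-style migration between tiers.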
Getting started with a tiered storage model
If you want to implement some sort of HSM-style tiering in your storage infrastructure, there are a few things to consider.
1. You need to understand which data needs to be moved. Moving all data based on simplistic "date last accessed" criteria may not be a good idea, as application software and other files may need to be excluded. Run reports from any decent SRM software package to identify candidate data for inclusion in an HSM scheme. Then check with the owner or owner/manager if necessary to ensure that the data is safe to move. Some objections may be registered even for files that haven't been touched in 90 days.
2. You need to know your storage and where you are moving the data. Some arrays will allow data exchanges to occur effortlessly, while others require storage targets to be from the same vendor as the arrays on which data was first stored. With some arrays, vendors use "open APIs" to allow any application to write data to the array, but implement "closed APIs" to restrict your ability to move data off the spindles to another array. Make sure you understand the capabilities and restrictions of each target array in your HSM scheme.
3. Model, simulate and experiment. There are HSM software packages that can be used for a period of time without charge and that provide the means to test your HSM scheme. Use these to set up a test HSM environment before you go live. Absent this advance work, you run the risk of moving files to a lower tier, filling up the space they once occupied on an upper tier, and then discovering that to run a critical year-end report, you need to promote the data back to the tier you demoted it from but lack the space. To prevent tiers of storage from becoming "tears of storage," be patient and test everything.
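The year-end report risk described in step 3 can be modeled before any data is moved. The sketch below is a toy simulation under stated assumptions: the Tier class, the move function, the file names and all capacities are hypothetical, and a real test should use the trial HSM software against representative data.

```python
from dataclasses import dataclass, field


@dataclass
class Tier:
    name: str
    capacity_gb: float
    files: dict = field(default_factory=dict)  # file name -> size in GB

    @property
    def free_gb(self):
        return self.capacity_gb - sum(self.files.values())


def move(file_name, source, target):
    """Move a file between tiers; refuse if the target lacks free space."""
    size = source.files[file_name]
    if size > target.free_gb:
        return False
    target.files[file_name] = source.files.pop(file_name)
    return True


# Hypothetical scenario: demote an idle ledger, then let new data arrive.
fast = Tier("fast disk", 100, {"ledger.db": 60, "logs": 30})
slow = Tier("capacity disk", 1000)

demoted = move("ledger.db", fast, slow)   # frees 60 GB on the fast tier
fast.files["new_orders.db"] = 65          # new data fills the freed space

# Year end: the report needs the ledger back on the fast tier.
promoted = move("ledger.db", slow, fast)  # fails: only 5 GB is free
```

Here demoted is True and promoted is False, because the space the ledger once occupied has been consumed. One possible safeguard is a policy that reserves promotion headroom on the upper tier for data with known periodic access.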
The real challenge of tiered storage: Getting the terms straight
In the final analysis, and hybrid technologies aside, storage tiering is a widely used term with little probative value. In some cases, it is described as a strategy for cutting down on capacity demand or for delivering capacity allocation efficiency. Technically speaking, a tiered storage model does neither. While moving data to lower tiers might free up space for new data on the tiers above, this is not the central purpose of storage tiering. Rather, tiering strives to place data on the tier that makes the most sense from the standpoint of a business-savvy mixture of data access frequency and media cost. As such, its intention is to deliver utilization efficiency, rather than allocation efficiency.
Other vendors, by contrast, describe storage tiering as archiving, which is misleading. Data moves in an HSM scheme from faster tiers to slower ones based on access frequency. Archives typically comprise data sets that are grouped together based on business criteria rather than simple access frequency. Using information lifecycle management policies, which consider the business context of data, may produce a proper archive; HSM does not.
It is worth mentioning that the arrival of IBM's Linear Tape File System (LTFS) is increasing interest in the tiered storage model. For files, which comprise more than half of the new data being created and stored today, a tape-based filestore leveraging a tape library front-ended by an LTFS-enabled server can provide an ultra-high-capacity storage platform with a significantly lower cost of ownership than a disk array. Considering that the re-reference rates of user files tend to drop precipitously within 10 to 30 days of creation, moving older files to a tape repository that acts like a NAS platform may make considerable sense.
While advocates of LTFS include a group with the name Active Archive Alliance, the truth is that LTFS tape is about storage tiering and capacity utilization efficiency measured by access frequency. With the arrival of LTFS file storage, storage tiering may be ready to begin delivering on its long-promised business value case.