Editor's Note: This tip was updated in November 2016.
The terms tiering and caching are often used interchangeably, but actually refer to two different storage acceleration techniques. Both techniques involve placing frequently accessed -- hot -- data onto a high-speed medium such as flash, but the similarities between the two mostly end there.
We'll take a closer look at SSD caching, how caching and tiering compare, and some of the challenges they present.
There is one main reason to use caching or tiering: to increase data access performance. With caching, the boost comes from copying heavily accessed data onto high-performance SSDs. Reads of that data can then be served from flash instead of slower primary storage and, depending on how the cache handles writes, write operations can be completed before the data reaches primary storage.
Tiering improves data access performance by placing the most accessed data on the fastest storage, less-accessed data on standard storage and the coldest (least accessed) data on the slowest storage type in the system.
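To make the three-way placement concrete, here is a minimal sketch that ranks items by access count and assigns them to fast, standard or slow storage. The function name `assign_tiers` and the `hot_share`/`warm_share` fractions are hypothetical; real tiering engines typically measure heat over time windows and move data in chunks rather than whole items.

```python
def assign_tiers(access_counts, hot_share=0.2, warm_share=0.3):
    """Place items on fast, standard or slow storage by how often they are accessed.

    access_counts maps an item name to its access count over some period.
    hot_share and warm_share are hypothetical fractions of all items that go to
    the fast and standard tiers; everything else lands on the slow tier.
    """
    # Most-accessed first; ties are broken by name so the result is deterministic.
    ranked = sorted(access_counts, key=lambda item: (-access_counts[item], item))
    n_hot = round(len(ranked) * hot_share)
    n_warm = round(len(ranked) * warm_share)

    tiers = {}
    for position, item in enumerate(ranked):
        if position < n_hot:
            tiers[item] = "fast"          # hottest data
        elif position < n_hot + n_warm:
            tiers[item] = "standard"      # less-accessed data
        else:
            tiers[item] = "slow"          # coldest data
    return tiers


counts = {"a": 50, "b": 40, "c": 10, "d": 5, "e": 3,
          "f": 1, "g": 0, "h": 0, "i": 0, "j": 0}
placement = assign_tiers(counts)
```

With ten items and the default fractions, the two most-accessed items land on the fast tier, the next three on standard storage and the remaining five on the slowest tier.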
Most vendors of complete storage systems now offer both caching and tiering to boost performance. However, as explained below, if these storage acceleration techniques are not configured correctly, they can actually undermine each other. Implementing both technologies yourself in an enterprise storage infrastructure requires an experienced storage administrator.
Three forms of SSD caching
Caching, in its simplest form, is nothing more than the copying of frequently or recently accessed data from its usual location to high-speed media. Suppose, for instance, that a particular file receives a lot of read requests. If caching is used, the cache would recognize the file as hot data and copy that file to high-speed media.
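The behavior described above, counting reads and copying an item to fast media once it is recognized as hot, can be sketched as follows. The class name, the dict-based "devices" and the threshold of three reads are illustrative assumptions, and the sketch omits eviction, which any real cache of limited size needs.

```python
class HotDataCache:
    """Copy an item to fast media once its read count reaches a threshold.

    `primary` stands in for the item's usual location; the default threshold
    of 3 reads is an arbitrary assumption for illustration. No eviction.
    """

    def __init__(self, primary, hot_threshold=3):
        self.primary = primary            # name -> data on slow media
        self.cache = {}                   # name -> data on fast media
        self.reads = {}                   # per-item read counters
        self.hot_threshold = hot_threshold

    def read(self, name):
        """Return (data, source) where source is "cache" or "primary"."""
        if name in self.cache:
            return self.cache[name], "cache"
        data = self.primary[name]
        self.reads[name] = self.reads.get(name, 0) + 1
        if self.reads[name] >= self.hot_threshold:
            # Copy, not move: the original stays on primary storage.
            self.cache[name] = data
        return data, "primary"


store = HotDataCache({"report.docx": b"quarterly numbers"})
sources = [store.read("report.docx")[1] for _ in range(4)]
```

The first three reads come from primary storage; the third one marks the file as hot and copies it, so the fourth read is served from the cache while the original copy remains in place.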
There are a number of variations to caching, but here are three of the most common forms:
- Write-around SSD caching best fits the description above. Data is written directly to primary storage, initially bypassing the cache, and copied to the cache only when it has been identified as hot. This method has the advantage of caching only the data that is likely to receive the greatest benefit from the cache. The main disadvantage is that there is no caching for write operations.
- Write-through SSD caching suits applications that write data and then immediately reread it, because this type of caching writes data to the SSD and then immediately to the primary storage device. The advantage of this technique is that all newly written data is cached. However, writes are not accelerated and can incur additional latency, because data must be written to two different locations before the write is considered complete.
- Write-back SSD caching is similar to write-through caching in that all write operations are cached. However, write-back caching acknowledges a write as soon as the data reaches the SSD, even though it has not yet been committed to primary storage. This reduces latency, but adds the potential for data loss if the cache fails before the data is committed. Vendors that use write-back caching usually implement safeguards such as redundant SSDs or battery-backed RAM.
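The three policies differ mainly in where a write must land before it is acknowledged. The toy model below is a hypothetical illustration, not any vendor's implementation: both "devices" are plain dicts, and `write()` returns the devices written before the acknowledgment as a stand-in for write latency. Invalidating a stale cached copy on write-around is a design assumption of this sketch.

```python
class SSDCache:
    """Toy model of write-around, write-through and write-back caching."""

    POLICIES = ("write-around", "write-through", "write-back")

    def __init__(self, policy):
        if policy not in self.POLICIES:
            raise ValueError(f"unknown policy: {policy}")
        self.policy = policy
        self.ssd = {}
        self.primary = {}
        self.dirty = set()  # write-back only: on the SSD but not yet on primary

    def write(self, key, value):
        """Return the devices written before the write is acknowledged."""
        if self.policy == "write-around":
            self.primary[key] = value     # bypass the cache entirely
            self.ssd.pop(key, None)       # assumption: drop any stale cached copy
            return ["primary"]
        if self.policy == "write-through":
            self.ssd[key] = value
            self.primary[key] = value     # acknowledged only after both writes
            return ["ssd", "primary"]
        self.ssd[key] = value             # write-back: acknowledged once on SSD
        self.dirty.add(key)
        return ["ssd"]

    def flush(self):
        """Commit dirty write-back data to primary storage."""
        for key in sorted(self.dirty):
            self.primary[key] = self.ssd[key]
        self.dirty.clear()

    def fail_cache(self):
        """Simulate losing the SSD; return keys whose only copy was lost."""
        lost = set(self.dirty)
        self.ssd.clear()
        self.dirty.clear()
        return lost


around = SSDCache("write-around")
through = SSDCache("write-through")
back = SSDCache("write-back")
acks = {c.policy: c.write("blk1", b"data") for c in (around, through, back)}
```

Calling `back.fail_cache()` before `flush()` reports `blk1` as lost, which is the data-loss risk that redundant SSDs or battery-backed RAM are meant to cover.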
Compare and contrast: Storage tiering and SSD caching
Although storage tiering and caching have similarities, there is a major difference between them. While caching copies data to a high-speed medium, storage tiering physically moves data between storage devices. Suppose, once again, that a particular file is receiving a lot of read requests, but this time tiered storage is used. As with caching, the system identifies the file as hot data. Rather than making a copy and placing it on the high-speed tier, however, the system physically moves the data, so the standard tier no longer holds a copy. When the data begins to cool, the system moves the file off the high-speed tier and back to the standard tier. A high-speed tier usually improves performance, but under certain circumstances performance can degrade because of the extra IOPS generated by moving data between tiers.
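The sketch below contrasts tiering with caching: each item lives on exactly one tier, and a periodic rebalance moves it up when hot and back down when it cools. The class, its `promote_at`/`demote_below` thresholds and the `moves` counter (a rough proxy for the extra I/O that data movement generates) are hypothetical choices for illustration.

```python
class TieredStore:
    """Keep each item on exactly one tier and move it as it heats up or cools."""

    def __init__(self, promote_at=3, demote_below=1):
        self.tiers = {"fast": {}, "standard": {}}
        self.hits = {}
        self.promote_at = promote_at      # hypothetical promotion threshold
        self.demote_below = demote_below  # hypothetical demotion threshold
        self.moves = 0                    # each move costs extra I/O on both tiers

    def put(self, name, data):
        self.tiers["standard"][name] = data
        self.hits[name] = 0

    def locate(self, name):
        return next(t for t, items in self.tiers.items() if name in items)

    def read(self, name):
        self.hits[name] += 1
        return self.tiers[self.locate(name)][name]

    def _move(self, name, src, dst):
        # pop() removes the source copy: a move, unlike a cache's copy.
        self.tiers[dst][name] = self.tiers[src].pop(name)
        self.moves += 1

    def rebalance(self):
        """Run periodically: promote hot items, demote cooled ones, reset counters."""
        for name, count in self.hits.items():
            tier = self.locate(name)
            if tier == "standard" and count >= self.promote_at:
                self._move(name, "standard", "fast")
            elif tier == "fast" and count < self.demote_below:
                self._move(name, "fast", "standard")
            self.hits[name] = 0


store = TieredStore()
store.put("hot.db", b"rows")
for _ in range(3):
    store.read("hot.db")
store.rebalance()
```

After the rebalance, `hot.db` exists only on the fast tier; if it receives no reads before the next rebalance, it is moved back to the standard tier, and each of those moves adds I/O of its own.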
So should you use caching, tiering or some combination of the two? You will need to consider the type of workload and the amount of space available for a cache or a high-speed tier. An organization might use tiering at the storage level but implement a small, high-speed cache at the server level. Depending on the server's workload, this combination can degrade performance as a result of double caching, in which the same hot data is held in both the server-side cache and the storage system's fast tier.
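Double caching here means the same blocks sitting in both a server-side cache and the storage system's fast tier. One simplified way to see why that wastes resources is to count how much fast media across both layers holds duplicates. The function and block names below are hypothetical, and the calculation ignores timing and hit rates; it only measures overlap.

```python
def effective_fast_capacity(server_cache, storage_fast_tier):
    """Count distinct blocks held on fast media across both layers.

    Blocks present in both layers are counted once, so every duplicated
    block is fast capacity that stores a second copy instead of more hot data.
    """
    server = set(server_cache)
    storage = set(storage_fast_tier)
    return {
        "raw_slots": len(server) + len(storage),
        "distinct_blocks": len(server | storage),
        "duplicated": len(server & storage),
    }


report = effective_fast_capacity(
    ["b1", "b2", "b3"],
    ["b1", "b2", "b3", "b4", "b5"],
)
```

In this example, eight slots of fast media hold only five distinct blocks, because the server cache duplicates three blocks that the storage tier already holds.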
SSD caching and tiering challenges
Perhaps the most pervasive problem with caching and tiering is data being cached or tiered in ways that bring no benefit and may even be counterproductive -- such as double caching, which can degrade performance.
Backup operations are a common example. If not properly configured, the backup target may attempt to write all or part of the backup repository to a cache or high-speed tier. This isn't necessarily problematic in and of itself, but it is a waste of resources since the backup data is unlikely to be read in the near future. The limited cache resources would be better used for data that is accessed more frequently.
One way to solve this problem is to create policies controlling which applications are allowed to use caching or tiering. This saves SSD resources for the workloads for which they will be most beneficial.
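A caching policy of this kind can be as simple as an admission check keyed on the application that issued the I/O. In the sketch below, the `IORequest` type, the application names and the default-deny behavior for unlisted applications are all assumptions for illustration; real products expose such policies through their own management interfaces.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IORequest:
    app: str      # application that issued the request
    block: int    # logical block address


# Hypothetical policy: application names and settings are examples only.
CACHE_POLICY = {
    "database": True,
    "vdi": True,
    "backup": False,   # backup data is rarely reread soon, so keep it off SSD
}


def admit_to_cache(request, policy=CACHE_POLICY, default=False):
    """Decide whether a request may consume SSD cache or fast-tier space.

    Applications missing from the policy fall back to `default`, which is
    set to deny here so that unknown workloads cannot fill the cache.
    """
    return policy.get(request.app, default)


decisions = [admit_to_cache(IORequest(app, 42)) for app in ("database", "backup", "unknown")]
```

Denying unlisted applications by default keeps scarce SSD capacity reserved for workloads someone has deliberately chosen; allowing by default would be simpler to operate but risks letting a new bulk workload, such as a backup job, crowd out hot data.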
Storage tiering and SSD caching can improve performance, as long as they are allocated to the appropriate workloads. In the not-too-distant future, NVDIMM seems likely to begin replacing SSD as the medium for caches and high-speed tiers. While not as fast as DRAM, NVDIMM provides an impressive level of performance and very low latency.