In IT, it's all about predictable performance. Organizations want to ensure their critical business applications can always access the fastest available storage. At the same time, they don't want to overspend on storage resources such as flash. In this expert tip, we'll discuss how to choose between two methodologies for efficient business data acceleration: caching and tiering.
Storage tiering, also known as hierarchical storage management, has been around since the early days of mainframe computing. Today, some hybrid flash storage systems use storage tiering technology to segregate active data from inactive data. On these systems, all data typically starts out on the hard disk tier; over time, as the system analyzes I/O patterns, the most active data sets are migrated to the available flash tier.
Caching, on the other hand, can be implemented in three ways:
1. Write-around caching, otherwise known as "read-only" caching, copies data into the cache from disk after a certain number of read requests occur; writes themselves bypass the cache and go directly to hard disk.
2. Write-through caching writes data to the flash tier and the hard disk tier at the same time. This is done for data redundancy purposes, but also ensures recently written data will be quickly accessible from the fastest tier of storage.
The downside to these two forms of caching is that write I/O still occurs at the speed of hard disk storage, since the application can't issue its next transaction until the write to disk completes.
3. In write-back caching, write I/Os occur at the speed of flash because the cache sends an acknowledgement to the application as soon as the write takes place on the flash tier. The cache will eventually copy the data to hard disk once enough writes have coalesced or queued up. This type of caching allows for rapid application I/O in heavy write environments. However, data loss could occur if the cache resource fails before the data is copied to hard disk storage.
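The three write policies above can be sketched as a toy Python class. The `Cache`, `flash` and `disk` names are illustrative only, standing in for the fast and slow tiers; no vendor's actual API is implied, and real caches also handle eviction, sizing and failure recovery.

```python
class Cache:
    """Toy model of the three caching write policies: write-around,
    write-through and write-back. `flash` and `disk` are plain dicts
    standing in for the fast and slow storage tiers."""

    def __init__(self, policy, read_threshold=2):
        self.policy = policy              # 'write-around', 'write-through' or 'write-back'
        self.read_threshold = read_threshold
        self.flash, self.disk = {}, {}
        self.dirty = set()                # write-back: blocks not yet de-staged to disk
        self.reads = {}                   # write-around: read count per block

    def write(self, key, value):
        if self.policy == 'write-around':
            self.disk[key] = value        # writes bypass flash entirely
        elif self.policy == 'write-through':
            self.flash[key] = value       # written to both tiers; ack waits for disk
            self.disk[key] = value
        else:                             # write-back
            self.flash[key] = value       # ack as soon as flash has the data...
            self.dirty.add(key)           # ...the disk copy is deferred

    def read(self, key):
        if key in self.flash:
            return self.flash[key]        # cache hit at flash speed
        value = self.disk[key]            # cache miss: served at disk speed
        if self.policy == 'write-around':
            self.reads[key] = self.reads.get(key, 0) + 1
            if self.reads[key] >= self.read_threshold:
                self.flash[key] = value   # promote after repeated reads
        return value

    def flush(self):
        """Write-back de-stage: copy coalesced writes down to disk."""
        for key in self.dirty:
            self.disk[key] = self.flash[key]
        self.dirty.clear()
```

Note how the write-back risk described above shows up in the model: between `write()` and `flush()`, the only copy of the data sits in `self.flash`, so losing the cache resource in that window loses the data.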
Tiering and caching compared
The main difference between tiering and caching is that in tiering implementations, hot data is actually moved from hard disk storage to flash. Conversely, as data cools, it's physically de-staged from flash back to hard disk storage. In other words, the data lives in only one storage tier at a time. As a result, tiering architectures need redundancy features such as RAID built into their designs, which can significantly increase costs, particularly in flash-heavy configurations.
In a caching environment, data is copied or mirrored from one tier to the next, so active data is present in multiple tiers simultaneously (flash and hard disk storage) for redundancy purposes. When data cools in a caching environment, the cache simply makes flash space available for other data sets to use.
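The move-versus-copy distinction can be reduced to four small functions over hypothetical tier dicts (the names are illustrative, not any product's API):

```python
def tier_promote(disk, flash, key):
    """Tiering: hot data is *moved* -- it lives in exactly one tier."""
    flash[key] = disk.pop(key)

def tier_demote(disk, flash, key):
    """Tiering de-stage: cooled data is moved back to hard disk."""
    disk[key] = flash.pop(key)

def cache_promote(disk, flash, key):
    """Caching: active data is *copied* -- the disk copy remains for redundancy."""
    flash[key] = disk[key]

def cache_evict(flash, key):
    """Cooling in a cache just frees flash space; the disk copy is untouched."""
    del flash[key]
```

After `tier_promote`, flash holds the only copy, which is why a tiering design must protect the flash tier itself; after `cache_promote`, evicting the block costs nothing because disk still has it.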
Data acceleration and the flash factor
While there are differences in how caching and tiering environments promote data, it's more important to look at how a vendor implements these technologies. For example, in a typical tiering architecture, there's usually a waiting period before active data is moved to flash. To sidestep this issue, some storage tiering vendors have implemented a "flash first" approach in which all data starts on flash and is then pushed to the hard disk tier as it becomes inactive -- so all data acceleration is immediate. The obvious challenge with this approach is that the flash tier has to be sized to accommodate the initial data set; for example, if you have 10 TB of data, you'll need at least 10 TB of flash to start with. So, it's possible to overprovision flash in these environments.
Caching, by contrast, can be deployed more granularly to address specific applications that require high performance, such as databases and online transaction processing systems. For example, server-side flash can be paired with caching software to accelerate applications at the physical host or virtual machine level. If the read/write I/O attributes of an application are known, a specific caching technology can be chosen to further customize how data is accelerated into cache. The challenge is that you need some knowledge of the application's I/O patterns first. In addition, if dozens of applications need access to flash, managing them all through this approach can become unwieldy.
Caching and tiering provide different methods of application data acceleration. Knowing which approach to choose depends on a variety of factors such as the number of applications that require access to high-performance storage resources, the I/O patterns of these applications and the budgetary objectives for the project deployment.