Performance is the driving factor for the vast majority of companies considering a solid-state storage array. While there are some other benefits, most companies just need faster storage.
Some common use cases for flash storage include high-density virtual server infrastructures, virtual desktop infrastructure, high-transaction databases and Web-facing applications. When shared storage is required (as opposed to putting flash directly into the application server), there are two options: all-flash arrays (AFAs) and hybrid flash arrays.
This article looks at both types of systems and discusses how to determine which solid-state storage array is best for your environment.
All-flash solid-state storage arrays
As the name implies, all-flash arrays are 100% flash solid-state storage systems. Some use drive form-factor solid-state drives (SSDs), while others use custom flash modules to populate the array chassis. They are available in scale-up and scale-out architectures using both proprietary and commodity hardware nodes. AFAs support file, block or object storage protocols, and some provide a unified approach that supports multiple protocols.
Many early all-flash arrays lacked data services, but the majority of AFAs now offer fairly complete services to support data protection, data handling, efficiency and so on. Similarly, storage management feature sets have evolved, with most AFAs providing an administrative experience similar to traditional storage systems. All-flash array capacity currently ranges from a few tens of terabytes to multiple petabytes, and data reduction technologies like deduplication can be especially effective in AFAs.
Flash as a storage medium costs less to operate than hard disk-based systems. It consumes less power, creates less heat (requiring less cooling) and takes up less physical space in the data center. Because their storage media are homogeneous, AFAs don't have to decide where data should live or move it between tiers. This gives them more consistent performance and improved scalability -- factors that can make them a better fit for large, multi-tenant environments.
Hybrid solid-state storage arrays
A hybrid flash array combines flash, usually in 2.5-inch form-factor drives, with hard disk drives (HDDs) to lower the effective cost and increase effective capacity. This type of array also comes in scale-up and scale-out architectures, using purpose-built or commodity hardware to provide combined raw capacities (disk and flash) larger than most all-flash solid-state storage arrays.
Current hybrid offerings support block, file and object-based protocols, and unified systems are also available. Storage services and management features are similar to traditional arrays, which makes the switch to a hybrid flash solid-state storage array easy from an operational perspective.
Since most performance requirements are temporary, the storage system can move particular data objects to flash when the compute process needs them and back to the HDDs when it doesn't. This creates a multiplier effect that enables a smaller amount of flash to accelerate a much larger total data set. Hybrid arrays today use caching or tiering to accomplish this data movement.
Flash caching and tiering
Read caching involves keeping a copy of the most frequently accessed data objects in flash so that read requests can be fulfilled without incurring hard disk latencies.
Since cache capacity is at a premium, the better a hybrid solid-state storage system is at keeping the right data in cache, the more overall performance will improve and application data will be accelerated. While supporting read transactions is the most common use of flash in hybrid arrays, most of them also use a cache to accelerate write operations.
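The read-caching behavior described above can be illustrated with a minimal sketch. It uses least-recently-used (LRU) eviction as a simple stand-in for whatever policy a real array applies; the class name, the block-level granularity and the `read_from_disk` callback are all assumptions made for illustration, not any vendor's implementation.

```python
from collections import OrderedDict


class LRUReadCache:
    """Toy read cache: keeps the most recently read blocks in 'flash'."""

    def __init__(self, capacity_blocks):
        self.capacity_blocks = capacity_blocks
        self._blocks = OrderedDict()  # block_id -> data, oldest entry first
        self.hits = 0
        self.misses = 0

    def read(self, block_id, read_from_disk):
        if block_id in self._blocks:
            # Cache hit: served from flash, no hard disk latency.
            self.hits += 1
            self._blocks.move_to_end(block_id)  # mark as most recently used
            return self._blocks[block_id]
        # Cache miss: fetch from the HDD tier (hypothetical callback),
        # then keep a copy in flash for later reads.
        self.misses += 1
        data = read_from_disk(block_id)
        self._blocks[block_id] = data
        if len(self._blocks) > self.capacity_blocks:
            self._blocks.popitem(last=False)  # evict least recently used
        return data

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

Replaying a trace of block reads through `read()` and then checking `hit_rate()` gives a rough feel for how much flash a workload needs before most reads stop touching disk.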
Write caching involves storing write data in flash first, acknowledging the write transaction to the host, and then copying that data to HDDs. All data must eventually be copied to hard disk storage, so the solid-state storage array must have either a large enough write cache (which leaves less flash capacity available for reads) or enough non-write time to allow the cache to empty. Otherwise, write performance suffers.
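The write-cache sizing trade-off comes down to simple arithmetic, sketched below. The function names and the idea of modeling ingest and drain as constant rates are simplifying assumptions; real arrays see bursty traffic and variable destage speeds.

```python
def write_cache_fill_seconds(cache_gb, ingest_gb_per_s, drain_gb_per_s):
    """Seconds until the write cache fills during a sustained write burst.

    Returns None when the HDD tier drains at least as fast as writes
    arrive, in which case the cache never fills.
    """
    net_fill_rate = ingest_gb_per_s - drain_gb_per_s
    if net_fill_rate <= 0:
        return None
    return cache_gb / net_fill_rate


def idle_seconds_to_empty(cache_gb, drain_gb_per_s):
    """Quiet (non-write) time needed to flush a full cache to hard disk."""
    return cache_gb / drain_gb_per_s
```

With illustrative numbers, a 200 GB write cache absorbing 1.5 GB/s while the disks drain 0.5 GB/s fills in 200 seconds and then needs 400 seconds of quiet time to empty. Bursts longer than that fall back to hard disk write speeds.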
Instead of creating a second copy of data in a cache, flash tiering moves hot data objects out of the hard disk area and into flash to support periods of maximum activity. Ideally, all read and write activity is performed in flash. Eventually, data is moved back to the HDD tier. As with caching, this movement can be performed manually or driven by policies.
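A policy-driven tiering decision might look something like the following sketch. It assumes the array tracks an access count and a size for each data object and greedily places the hottest objects in flash; the function name, its inputs and the greedy rule are hypothetical, chosen only to show the idea.

```python
def plan_tiers(access_counts, sizes_gb, flash_capacity_gb):
    """Place the hottest objects in flash until it is full.

    access_counts and sizes_gb map object names to recent access counts
    and sizes. Returns (flash_objects, hdd_objects).
    """
    flash, hdd = [], []
    free_gb = flash_capacity_gb
    # Hottest first; ties broken by name so the plan is deterministic.
    for name in sorted(access_counts, key=lambda n: (-access_counts[n], n)):
        if sizes_gb[name] <= free_gb:
            flash.append(name)
            free_gb -= sizes_gb[name]
        else:
            hdd.append(name)  # does not fit in flash; stays on hard disk
    return flash, hdd
```

Unlike a cache, each object lives in exactly one tier here, so rerunning the plan as access counts change corresponds to promoting and demoting data rather than refreshing copies.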
Speed is a critical factor for the applications that typically drive flash usage, and those applications (and users) often get accustomed to flash performance. When unexpected demands cause a cache or tier miss, applications must read data from disk drives. However, the drives typically used in a hybrid solid-state storage array are high-capacity models that are often very slow. As a result, latency can cause unacceptable wait times for users, slow online transactions and bottlenecks in other production applications. For this reason, workload predictability is key to the effective use of flash in a hybrid array.
Other factors to consider
Certain use cases don't work well with all-flash arrays or hybrid arrays, so the first step is to identify if any conditions in your environment or workloads make one of these options a bad fit. For AFAs, the most obvious factor is capacity required and its effective cost. If the application's current or expected data set is too large for available flash or the budget is too small to buy more, then an AFA is not an option. Look at the effective capacity of the flash system after data reduction, as well as the raw capacity, when making your decision.
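Comparing raw and effective capacity is a short calculation. The sketch below assumes data reduction can be summarized as a single ratio (for example, 4.0 for 4:1); that ratio varies widely by workload, so treat any figure plugged in here as an estimate rather than a guarantee.

```python
def effective_capacity_tb(raw_tb, reduction_ratio):
    """Usable capacity after deduplication and compression."""
    return raw_tb * reduction_ratio


def cost_per_effective_tb(price, raw_tb, reduction_ratio):
    """Purchase price divided by effective (post-reduction) capacity."""
    return price / effective_capacity_tb(raw_tb, reduction_ratio)
```

As an illustration with made-up numbers, a 50 TB raw array that achieves 4:1 reduction holds 200 TB of data, so a 400,000 purchase price works out to 2,000 per effective terabyte instead of 8,000 per raw terabyte.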
For environments that need 100% consistency and no chance of a cache or tier miss, an AFA is probably the better solid-state storage array option. These include the use cases that all-flash storage was first developed for in the financial, Internet-based and high-performance computing industries. AFA efficiency advantages also make these systems effective for multi-tenant cloud environments that need low overhead and predictable performance as they scale.
AFAs are more attractive when IT can't make the assumptions required to support data movement in a hybrid array. But even when this isn't the case, the simplicity of an all-flash array still wins out for many companies. If they can afford enough flash capacity to support the applications that need performance, they buy an all-flash array. If not, they buy a hybrid.
If workloads can handle the occasional cache or tier miss, a hybrid solid-state storage array offers the best economics as it allows more workloads to be accelerated for a given investment. It's also reasonable to assume that performance will improve as users become more familiar with their applications' storage demands and as caching parameters are fine-tuned. Hybrids include high-capacity disk drives, so they also provide better and more cost-effective scalability, although at HDD performance levels.
In reality, most people under-buy flash for a hybrid array because they simply don't know how much flash they need or they're more focused on saving money than improving performance. Hybrid vendors are also guilty of underselling flash because it makes their cost advantages over AFAs more dramatic. By and large, statistics show that a flash-to-hard-disk capacity ratio of 5% is typical for a new hybrid configuration, but so is a cache hit rate of two-thirds, meaning one out of three transactions isn't served from cache. In most environments, moving the flash investment to 10% of total capacity can practically eliminate cache misses.
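To see why a two-thirds hit rate hurts, consider a rough estimate of average latency as a weighted mix of flash and hard disk service times. The latency figures used in the note below (0.2 ms for flash, 10 ms for disk) are illustrative assumptions, not measurements from any particular array.

```python
def average_latency_ms(hit_rate, flash_ms, hdd_ms):
    """Expected latency per I/O when hit_rate of requests are served from flash."""
    return hit_rate * flash_ms + (1.0 - hit_rate) * hdd_ms
```

Under those assumptions, a two-thirds hit rate yields an average of about 3.5 ms, most of it contributed by the one-in-three requests that go to disk. Raising the hit rate to 95% brings the average down to about 0.7 ms.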
About the author
Eric Slack is an analyst at Storage Switzerland, an IT analyst firm focused on storage and virtualization.