Rarely does an opportunity come along to simultaneously increase performance and reduce costs. However, solid-state drive (SSD) technology can do just that. Nearly every major storage vendor and plenty of startups offer a wide variety of SSD products. Solid-state drives can be deployed in one of three places: array-based SSD is considered to be behind the storage-area network (SAN); server-based SSD is in front of the SAN; and SSD appliances can be deployed on either side of the SAN. Selecting the best deployment method depends on the nature of the problem to be solved and the use case it's intended to address. Understanding the deployment nuances can help you avoid over-engineering and therefore over-spending.
Because the options can be confusing, or perhaps due to a lack of exposure to other solutions, IT managers may opt for the easiest implementation, which is array-based SSD. In many cases array-based SSDs are the best option, but if you don't investigate server-based SSD and standalone appliances, you might overlook the best solution for your environment.
The key element that determines the best SSD architecture to deploy is latency and where it does the most to inhibit application performance. Regardless of the SSD technology selected, actual retrieval of the data is at memory speeds. I/O throughput will vary by device, but that's based on device design, not the device’s position relative to the SAN. SSD latency is measured in nanoseconds, while network and hard disk drive (HDD) latency is measured in milliseconds. Network and/or HDD latency will be present somewhere in any SAN architecture, so where you place the SSD relative to that latency becomes crucial to optimization.
Here's an overview of the three SSD deployment methods:
Array-based SSD is implemented as a separate logical tier in the array, referred to as tier 0. Because it's inside the array, it's connected directly to the storage backplane. The speed of data movement between tiers is determined by the HDD latency, drive throughput and backplane latency. Of these, the most significant will be HDD I/O throughput. Several factors contribute to total I/O throughput, but for the purposes of this exercise we’ll play fast and loose with terminology and refer to the aggregate as latency. The backplane itself is unlikely to be the limiting factor in total data access latency for most enterprise arrays, as vendors go to great lengths to keep it from becoming a bottleneck and to match it to the HDD capabilities.
Automated storage tiering (AST) software uses sophisticated algorithms to determine when data is active and migrates it from a lower tier to the SSD. This data transfer incurs all the HDD latency, but it's a one-time cost. Thereafter, the repeatedly requested data is read from SSD with nanosecond latency.
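The promote-on-activity idea can be sketched in a few lines of Python. This is a toy illustration only, not how any vendor's AST software works: real products use far more sophisticated algorithms, and the class name, the `promote_after` threshold and the block ID below are all hypothetical.

```python
from collections import Counter


class TieringSketch:
    """Toy access-count tiering: promote a block to SSD once it is read often enough."""

    def __init__(self, promote_after=3):
        self.promote_after = promote_after  # hypothetical activity threshold
        self.reads = Counter()              # HDD read count per block ID
        self.ssd_tier = set()               # blocks currently held on tier 0

    def read(self, block_id):
        """Return the tier that served this read: 'ssd' or 'hdd'."""
        if block_id in self.ssd_tier:
            return "ssd"
        self.reads[block_id] += 1
        if self.reads[block_id] >= self.promote_after:
            # One-time migration cost: this read is still served from HDD.
            self.ssd_tier.add(block_id)
        return "hdd"


tiers = TieringSketch(promote_after=3)
served = [tiers.read("index-page-7") for _ in range(5)]
print(served)  # ['hdd', 'hdd', 'hdd', 'ssd', 'ssd']
```

The first three reads pay the HDD penalty, including the one that triggers the migration; every later read of the same block is served from the SSD tier.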
Even though the media read latency is reduced to nanoseconds in a behind-the-SAN architecture, there’s no way around the milliseconds of latency across the SAN or wide-area network (WAN). This latency will vary considerably due to numerous factors, but it's now the major inhibitor to total read-request throughput. Roughly speaking, only half of the millisecond latency problem is eliminated.
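The arithmetic behind that "roughly half" can be sketched as follows. All of the numbers are placeholders chosen for illustration, not measurements: the SAN and HDD figures are set equal only to mirror the rough split described above, and the SSD figure simply reflects the nanosecond scale discussed in this article.

```python
# Illustrative placeholder values in milliseconds, not measurements.
SAN_MS = 5.0     # assumed latency across the SAN
HDD_MS = 5.0     # assumed HDD media latency (set equal to SAN_MS on purpose)
SSD_MS = 0.0001  # assumed SSD media latency (100 nanoseconds)


def read_latency_ms(media_ms, san_ms):
    """Latency of one read served from media that sits behind the SAN."""
    return media_ms + san_ms


hdd_read = read_latency_ms(HDD_MS, SAN_MS)
ssd_read = read_latency_ms(SSD_MS, SAN_MS)
removed = (hdd_read - ssd_read) / hdd_read  # fraction of latency eliminated

print(f"HDD tier: {hdd_read:.2f} ms")
print(f"Tier 0:   {ssd_read:.2f} ms")
print(f"Removed:  {removed:.0%}")
```

Under these assumptions the tier 0 read can never be faster than the SAN itself, no matter how fast the media is. Plugging in figures from your own environment shows how much of the total a behind-the-SAN SSD can actually remove.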
Perhaps the best use case for array-based SSD is general-purpose performance enhancement. Because AST software bases its data movement decisions largely on I/O activity, it's not application-specific in its basic form. Thus, it will deliver significant overall data access improvement in a package that's simple to implement and manage.
Server-based SSD is becoming increasingly popular. These implementations are primarily PCI Express (PCIe) cards deployed along with the servers. Both server vendors and some storage vendors offer server-based SSD. Fundamentally, it's a large amount of cache immediately accessible by the CPU, yet provisioned and managed like storage.
The methodology for moving data into server-based SSD isn't much different from other SSD deployments. Data may be elevated to the SSD based on access patterns or it may be positioned there. If the data is coming from SAN devices, then the initial read time is limited by both the SAN and the HDD latency. Once again, this is a one-time cost. Thereafter, data is read directly by the server with no SAN or network latency. So the millisecond problem is eliminated entirely.
The best use case for SSD deployment in front of the SAN is for highly static data that will be frequently accessed over the long haul. Examples of this type of data are database indexes or whole databases. This type of deployment can reduce data access latency by up to 90%. Although some AST software can move data from arrays to PCIe SSDs, frequent swapping of data between tiers would incur significant millisecond latency penalties. In these cases, an array or appliance solution might be better.
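Why static data suits server-based SSD can be seen with a simple weighted average. The sketch below assumes every read either hits the server-side SSD or misses and travels across the SAN to an HDD; the function name and all latency figures are hypothetical placeholders, not measurements.

```python
def average_read_ms(hit_ratio, ssd_ms, san_ms, hdd_ms):
    """Average read latency when a fraction hit_ratio of reads is served from server-side SSD."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    miss_ms = san_ms + hdd_ms  # a miss crosses the SAN and reads from HDD
    return hit_ratio * ssd_ms + (1.0 - hit_ratio) * miss_ms


# Placeholder latencies in milliseconds.
baseline = average_read_ms(0.0, ssd_ms=0.0001, san_ms=5.0, hdd_ms=5.0)
for hit_ratio in (0.5, 0.9, 0.99):
    avg = average_read_ms(hit_ratio, ssd_ms=0.0001, san_ms=5.0, hdd_ms=5.0)
    saved = 1 - avg / baseline
    print(f"hit ratio {hit_ratio:.0%}: {avg:.2f} ms average, {saved:.0%} below all-HDD")
```

Because the miss penalty is orders of magnitude larger than the hit cost, the average is dominated by the misses. Data that stays put keeps the hit ratio high; data that is constantly swapped between tiers drags it down and gives back most of the benefit.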
SSD appliances are SSD arrays in their own enclosures. The chief advantage of SSD appliances is the ability to locate them anywhere between the host and the array, depending upon where latency is the greatest problem. The first use case is deployment near the servers, where appliances can be used for network boot devices, which largely solves the “boot storm” problem. SSD appliances may also be ideal for file serving in a clustered or virtualized environment, especially for media files. Placing the appliance near the servers eliminates most of the network latency; some will remain, but proximity is likely to keep it small. Nevertheless, when data must be retrieved from a conventional array, it will incur both the SAN and HDD millisecond latency penalty.
The second use case for SSD appliances is on the other side of the SAN, near the conventional arrays. Appliances in this type of deployment could be used as an aggregate SSD tier for virtualized storage. Rather than placing an SSD in each array, the SSD appliance would be available as tier 0 for all arrays. This may improve performance for virtualized storage, where logical unit numbers (LUNs) span disparate physical arrays and data is moved dynamically between systems. Thus, back-end storage management activities wouldn't impact the data access operations occurring at tier 0.
A third use case for appliances would be on the data center side of a hybrid cloud deployment. The latency in accessing data from a cloud data center could be considerable, based on WAN distance and HDD characteristics. Often, cloud deployments involve high-capacity/high-latency HDDs to minimize costs. By using the SSD appliance in the data center, frequently accessed data will be closer to the consumer, with far less latency than going all the way to the cloud provider. The full WAN and HDD latency will still be incurred every time it's necessary to access data from the cloud array, but the overall improvement should still be substantial.
A fourth use case for appliances is to improve the aggregate throughput of older arrays, where it may not be cost-effective to add a new SSD to an aging array. By placing the SSD appliance in front of older arrays, organizations may be able to significantly improve data access speeds while extending the useful life of existing assets. The cost of the approach may be substantially less than a forklift upgrade.
Overall, the wide variety of SSD options in the marketplace makes it possible for storage architects to fine-tune I/O performance to match the application requirements without over-engineering the solution. Specific deployment characteristics will vary by product, but most vendors have best practice guidelines to help. Starting with a latency impact analysis will help storage architects apply SSD precisely where it's needed.
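A first pass at a latency impact analysis can be as simple as breaking read latency into its components and finding the largest one. The sketch below is a deliberately coarse illustration: the component names, the sample figures and the mapping from a dominant component to a deployment option are assumptions that loosely follow the use cases in this article, and they're no substitute for vendor guidelines or real measurements.

```python
# Hypothetical mapping from the dominant latency source to an option discussed above.
SUGGESTIONS = {
    "hdd": "array-based SSD (tier 0) or an appliance near the arrays",
    "san": "server-based SSD or an appliance near the servers",
    "wan": "an SSD appliance on the data center side of the cloud link",
}


def dominant_latency(components_ms):
    """Return (name, share) for the largest latency component."""
    total = sum(components_ms.values())
    if total <= 0:
        raise ValueError("latency components must sum to a positive value")
    name = max(components_ms, key=components_ms.get)
    return name, components_ms[name] / total


measured = {"hdd": 6.0, "san": 1.5, "wan": 0.0}  # placeholder figures in ms
name, share = dominant_latency(measured)
print(f"{name} accounts for {share:.0%} of read latency; consider {SUGGESTIONS[name]}")
```

In practice the breakdown would come from monitoring tools and would be done per application, because the dominant component for a database index may differ from the one for a file share.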
BIO: Phil Goodwin is a storage consultant and freelance writer.
This was first published in October 2011