The performance numbers of all-flash arrays are soaring into the stratosphere, but it's often difficult to interpret vendors' claims of millions of IOPS, tens of gigabytes per second and hundreds of microseconds of latency. Not only do manufacturers test flash performance under differing conditions, but their test workloads also rarely, if ever, mimic the real-world environments of their customers. As a result, it can be difficult to compare array performance accurately.
Dennis Martin, president and founder of Demartek LLC, an analyst organization that operates its own on-site test lab in Golden, Colo., has done extensive flash performance testing. In this podcast interview with SearchSolidStateStorage, Martin explains the importance of I/O per second (IOPS), throughput and latency for varying types of workloads; the significance of block size and of random versus sequential workloads; and the chances that a million IOPS or microsecond latency will matter to the average enterprise data center.
Of IOPS, throughput and latency, which figure do you think is the most important one for an enterprise end user trying to sort out the performance claims of all-flash array vendors?
Dennis Martin: We look at all three of those -- IOPS, throughput, sometimes called megabytes per second, and latency, sometimes called response time. All three of them are equally important, just generally speaking, because they all give you a different dimension of the flash performance. Some applications tend to be more sensitive to, or tend to drive, one of those metrics, either IOPS or throughput or latency, more than others. And some applications might be interested in two out of three. For example, they might be interested in IOPS and latency, or they might be interested in throughput and latency. Generally you don't see a single application that wants to get both high IOPS and high throughput. And then latency is a second figure that is frequently needed, typically more so for IOPS-oriented workloads than for throughput-oriented ones.
Can you give me examples of real-world applications that would need high IOPS versus high throughput versus low latency?
Martin: A high IOPS workload would be any kind of transaction workload, like a transactional database where you have lots of customers buying things and accessing the database at the same time. You might also think of it as not just one application but multiple applications. Before, you might not have put too many applications on one storage system because of the contention for the hard drives. In an all-flash system, some of those limitations are lifted, so you can put multiple applications on there. You might have a transaction database application that's very critical. You might put an email system on there. You might put some sort of a collection of virtual servers, or you might even do virtual desktops. There are all kinds of different applications that tend to be more IOPS-oriented.
If you're looking for throughput, then those are typically video streaming or backups or data warehousing or something where you're scanning over or looking at lots of data at once.
And then for latencies, sometimes it's a combination of the ones I just mentioned. You'll see some of the IOPS workloads also are interested in very low latencies. Sometimes they want to push latencies down a little bit more because they're just looking for the fastest turnaround.
The performance claims of the major vendors of all-flash arrays are getting higher and higher, and cracking the million-read IOPS threshold is becoming commonplace. How important is a million IOPS for the average enterprise data center?
Martin: Generally, you're talking about a transaction-oriented workload where every individual I/O counts as far as how fast it happens. So, a million read IOPS is becoming commonplace. That's an important number because that just tells you where the high end is of this storage system. It also might tell you that you can run multiple applications against this same storage array because now it has the headroom to handle it.
Vendors often use a block size of 4K to get their maximum IOPS, and they use larger block sizes of 64, 128 or even 256K to get their throughput figures. Can you explain how the block size affects the performance figures?
Martin: One of the reasons you see 4K for all-flash arrays is because the minimum page size on flash devices is typically 4K. So, that's the smallest you would go because that's the smallest the actual flash media itself would accept as an I/O. Some of them are starting to get a little larger now, like 8K, but 4K is a good place to start.
If you're looking at IOPS numbers, the smaller block sizes are going to have larger IOPS numbers. So, IOPS at a 4K block size is going to be higher than IOPS at an 8K block size and so on. As the block sizes get larger in an IOPS-oriented workload, the number of IOPS will decrease. What that means is, even though it's fewer IOPS, you're getting more data because the block sizes are bigger. You're getting bigger chunks of data.
If the application is an IOPS-oriented workload -- that means you're interested in how quickly individual transactions can occur -- the workloads that generally do that tend to have the smaller block sizes. And so you're going to see 4K or 8K, something in that range, often for database workloads but not limited to database workloads.
When you get to the larger block sizes, like 64K, 128K, 256K or even higher, then you're typically talking about a throughput-oriented workload, which is not so much transaction sensitive, but it's more how much bandwidth can you get out of it or how big of a boat can you push through this. So, with the smaller block sizes, the bandwidths will be smaller. And as you push the block size up, then the bandwidth will go up and the throughput will go up.
So, IOPS will be high for small block, and throughput is going to see better numbers at the large block.
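The relationship Martin describes comes down to simple arithmetic: throughput is IOPS multiplied by block size. The sketch below is only an illustration of that trade-off. The function name and the IOPS figures are hypothetical, not measurements from any array, and "4K" is assumed to mean 4 KiB.

```python
def throughput_mib_per_s(iops: float, block_size_kib: int) -> float:
    """Throughput in MiB/s implied by an IOPS figure at a given block size."""
    return iops * block_size_kib / 1024


# Hypothetical (IOPS, block size in KiB) pairs, chosen only to show the trend:
# IOPS falls as the block size grows, while throughput rises.
examples = [
    (1_000_000, 4),
    (600_000, 8),
    (120_000, 64),
    (40_000, 256),
]

for iops, block_kib in examples:
    mib_s = throughput_mib_per_s(iops, block_kib)
    print(f"{iops:>9,} IOPS x {block_kib:>3} KiB = {mib_s:>8,.1f} MiB/s")
```

With these made-up numbers, a million IOPS at 4 KiB moves about 3,906 MiB per second, while 40,000 IOPS at 256 KiB moves 10,000 MiB per second, which is why a headline IOPS figure and a headline throughput figure usually come from different tests.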
How does the random or sequential nature of the workload used in the test environment affect the performance numbers?
Martin: If you're doing sequential workloads, that means you're doing a backup job or you're streaming a video or something where you start at one point in the data and you just keep accessing contiguous blocks in sequential fashion. Sequential workloads tend to be faster generally because they are doing things contiguously, and typically the sequential workloads are also using larger blocks.
If you're doing random, you're doing something like a transactional workload or a file directory workload where you've got lots of different users accessing different things all at the same time. There's no real way to predict what the next block will be that is requested because you've got different users doing different things. The more users you have, the more applications you have that are accessing the same storage, the more and more it's going to shift over toward a random workload by the time it hits the storage system. Random workloads tend to be harder on storage systems, and this is true of both flash arrays and hard-drive arrays just because of the jumbled nature of the way things are coming in.
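The point about many users turning sequential access into random access can be shown with a toy model. In the sketch below, each client reads its own region strictly in order, and the storage system sees the requests interleaved. The round-robin interleaving, the helper names and the block addresses are all assumptions made for illustration; real arrival patterns are messier.

```python
from itertools import chain


def sequential_stream(start: int, length: int) -> list[int]:
    """Block addresses for one client reading contiguous blocks."""
    return list(range(start, start + length))


def interleave(streams: list[list[int]]) -> list[int]:
    """Round-robin merge of equal-length per-client address streams."""
    return list(chain.from_iterable(zip(*streams)))


def contiguous_fraction(addresses: list[int]) -> float:
    """Fraction of requests that hit the block right after the previous one."""
    if len(addresses) < 2:
        return 0.0
    pairs = zip(addresses, addresses[1:])
    hits = sum(1 for prev, cur in pairs if cur == prev + 1)
    return hits / (len(addresses) - 1)


one_client = interleave([sequential_stream(0, 8)])
four_clients = interleave(
    [sequential_stream(start, 8) for start in (0, 1000, 2000, 3000)]
)

print(contiguous_fraction(one_client))    # 1.0: fully sequential
print(contiguous_fraction(four_clients))  # 0.0: no request follows its predecessor
```

Every client in this model is perfectly sequential, yet with four of them sharing the array none of the requests it receives is contiguous with the one before.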
We're increasingly seeing latencies for all-flash arrays drop from milliseconds to microseconds. How important is the distinction between microseconds and milliseconds for the average enterprise data center?
Martin: For some workloads, latencies are actually more important than either IOPS or throughput. So where the round trip time or the latency is extremely important, you want it to of course be as low as possible. And microseconds are the next unit down from milliseconds, three orders of magnitude smaller. Just for definition purposes, 1,000 microseconds is equal to one millisecond.
Typically, with hard-drive arrays, you're going to have millisecond-range latencies [of] 2 or 3 or 4 or 5. If it's not so good, it'll be of course higher than that -- into double digits.
For some applications, that's just not good enough, and that's why people are looking at all-flash arrays because now you can get latencies down into the very low milliseconds or down into the microseconds, which just means you've got very fast turnaround time.
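To put the millisecond-versus-microsecond distinction in concrete terms, the sketch below converts between the two units and works out how many I/Os per second a single requester could complete if it waited for each I/O to finish before issuing the next. That strictly serial model is a simplifying assumption, and the 200-microsecond figure is hypothetical; only the 5-millisecond figure comes from the hard-drive range mentioned above.

```python
US_PER_MS = 1_000      # 1,000 microseconds in one millisecond
US_PER_S = 1_000_000   # 1,000,000 microseconds in one second


def ms_to_us(latency_ms: float) -> float:
    """Convert a latency in milliseconds to microseconds."""
    return latency_ms * US_PER_MS


def serial_iops(latency_us: float) -> float:
    """I/Os per second when only one I/O is ever in flight at a time."""
    return US_PER_S / latency_us


hard_drive_us = ms_to_us(5)  # 5 ms, from the hard-drive range in the text
flash_us = 200               # hypothetical flash latency in microseconds

print(serial_iops(hard_drive_us))  # 200.0
print(serial_iops(flash_us))       # 5000.0
```

Under that assumption, dropping from 5 milliseconds to 200 microseconds lets the same single-threaded requester finish 25 times as many operations per second, which is the "fast turnaround" a latency-sensitive application is after.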
In the final analysis, what recommendations do you have for IT pros trying to sift through the performance claims of all-flash array vendors?
Martin: Look for which of these tests are similar to the workloads that you have. If they're running, for example, a 100% random read workload test that has 4K block size, then you have to ask yourself, 'Is my workload like this?' The tests you want to pay attention to are the ones that are closest to your workloads in your environment. So, that presupposes that you know what those workloads are and that you've been measuring them and you at least have some ideas.
I would go through the tests and say, 'All right, is it read versus write? What's the percentage? What's the read-write ratio? Is it 100% read? Or is it 50% read?' That kind of thing. Again, look at the random versus sequential. Is it 100% random? Is it 50% random? I would look at the block size. Make sure the block sizes are the same. I would also look at protocol to see if this is a file workload versus a block workload. If they go into it, then I would look for a fifth item in my list of I/O characteristics, and that would be queue depth. That's the number of I/Os that you have outstanding at the same time. So, you really have all of those five I/O characteristics or parameters that you want to look at and compare.
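One way to apply that checklist is to write down the five characteristics for a published test and for your own workload, then list where they differ. The sketch below is a minimal illustration of that idea. The class name, field names and all of the values are hypothetical; a real comparison would use numbers measured in your own environment.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class IOProfile:
    """The five I/O characteristics named in the interview."""
    read_pct: int        # percentage of I/Os that are reads (rest are writes)
    random_pct: int      # percentage of I/Os that are random (rest sequential)
    block_size_kib: int  # block size in KiB
    protocol: str        # "block" or "file"
    queue_depth: int     # I/Os outstanding at the same time


def differences(test: IOProfile, workload: IOProfile) -> dict:
    """Map each mismatched characteristic to (test value, workload value)."""
    return {
        f.name: (getattr(test, f.name), getattr(workload, f.name))
        for f in fields(test)
        if getattr(test, f.name) != getattr(workload, f.name)
    }


# Hypothetical vendor test: 100% random read, 4K blocks.
vendor_test = IOProfile(100, 100, 4, "block", 32)
# Hypothetical measured workload.
my_workload = IOProfile(70, 100, 8, "block", 16)

for name, (test_value, my_value) in differences(vendor_test, my_workload).items():
    print(f"{name}: test={test_value}, mine={my_value}")
```

The more characteristics that show up in the output, the less a vendor's headline number says about how the array would behave under your workload.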