A Deep Dive into Storage Performance Benchmarks

By Larry Freeman, NetApp

Storage system benchmarking is part science and part art. The latter is especially true with benchmarks performed by vendors who are seeking to show favorable results for their own solutions. In this article, I review important information to look for and red flags to avoid.

There are three common benchmarks for enterprise storage systems: SPC-1, SPC-2, and SPECsfs. There are also popular application-specific tests that are often used as benchmarks, such as ESRP (for Microsoft® Exchange Server). Let’s take a deeper look into each one.

SPC-1 and SPC-2
First released in 2002 (for SPC-1) and 2006 (for SPC-2) by the Storage Performance Council (SPC), SPC-1 and SPC-2 are used by storage vendors to test the performance of their block-based (for example, Fibre Channel) storage subsystems. Vendors can choose to test and publish their system performance against SPC-1, SPC-2, or both.

  • Typical workload: SPC-1 simulates random block-access workloads, and SPC-2 simulates sequential block-access workloads.
  • Typical applications: Random block-access workloads are common to OLTP systems, database systems, and mail server applications. Sequential block-access workloads include scientific computing, large-scale financial processing, large database queries, and video-streaming environments.
  • Performance metric used: SPC-1’s most common metric is IOPS, the maximum sustained number of input/output operations per second. In contrast, SPC-2’s most common metric is MBPS, the maximum sustained throughput in megabytes per second.

When interpreting SPC-1 or SPC-2 performance results, here are a few tips to keep in mind:

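  • High IOPS results can make great headlines, but they do not provide a complete picture. When comparing SPC data, also look closely at latency (response time), which is equally important: a high IOPS figure means little if it is reached only at response times your applications cannot tolerate.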
  • Some vendors pump up their SPC results by using lots of parallel disk drives and mirroring them. This also pumps up the price you will pay to achieve similar results.
  • SPC includes a tested configuration price, but some of the prices are heavily discounted, and others represent the vendor’s list price. When comparing costs, make sure to factor in any expected discounts.
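
The pricing tip above comes down to simple arithmetic. Here is a minimal sketch in Python, assuming you know (or can estimate) the discount you would actually receive. The function name price_per_iops and all of the figures are hypothetical and are not taken from any published SPC result.

```python
def price_per_iops(tested_price, iops, discount=0.0):
    """Return the cost per IOPS after applying a fractional discount.

    tested_price: price of the tested configuration, in dollars
    iops:         reported maximum sustained IOPS
    discount:     expected discount as a fraction (0.0 means none)
    """
    if iops <= 0:
        raise ValueError("iops must be positive")
    if not 0.0 <= discount < 1.0:
        raise ValueError("discount must be at least 0.0 and less than 1.0")
    return tested_price * (1.0 - discount) / iops


# Hypothetical figures for illustration only.
# System A: the published price is assumed to be already discounted.
system_a = price_per_iops(tested_price=500_000, iops=100_000)
# System B: the published price is assumed to be list price, with a 40% expected discount.
system_b = price_per_iops(tested_price=800_000, iops=120_000, discount=0.40)

print(f"System A: ${system_a:.2f} per IOPS")
print(f"System B: ${system_b:.2f} per IOPS")
```

In this made-up case, System B looks more expensive per IOPS at its published price, yet comes out cheaper once the expected discount is applied. That is why both prices need to be put on the same basis before they are compared.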

SPECsfs
First released as SPECsfs in 1997 by the Standard Performance Evaluation Corporation (SPEC), the latest version of this benchmark is known as SPECsfs2008. This benchmark uses load generators to evaluate the speed and request-handling capabilities of file-based storage subsystems, including network-attached storage (NAS) systems.

  • Typical workload: SPECsfs simulates file-based workloads that use the NFS or CIFS protocol.
  • Typical applications: Common file-based workloads include file sharing and Web server applications. NAS systems are also increasingly used for database and virtual server workloads.
  • Performance metric used: The most commonly referenced SPEC metrics are NFS and CIFS throughput, measured in ops/sec (operations per second).

When interpreting SPECsfs performance results, keep the following tips in mind:

  • As with SPC, don’t focus just on throughput. Also examine latencies.
  • As with SPC, some vendors may use lots of short-stroked and mirrored disks to boost their numbers. Always look at the drive count when comparing vendors.
  • Because system prices are not included, apples-to-apples comparisons can be difficult to make with SPEC. Ask your vendors to provide system pricing for their published test results.
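
One way to apply the drive-count tip is to normalize each published result by the number of drives used to achieve it. The sketch below assumes you have copied the throughput and drive count from each vendor's disclosure. The vendor names, the figures, and the helper ops_per_drive are all hypothetical.

```python
def ops_per_drive(ops_per_sec, drive_count):
    """Return throughput per drive, a rough measure of how efficiently
    a configuration uses its disks."""
    if drive_count <= 0:
        raise ValueError("drive_count must be positive")
    return ops_per_sec / drive_count


# Hypothetical results for illustration only.
results = [
    {"vendor": "Vendor X", "ops_per_sec": 120_000, "drives": 480},
    {"vendor": "Vendor Y", "ops_per_sec": 90_000, "drives": 180},
]

# Rank by per-drive throughput, highest first.
ranked = sorted(
    results,
    key=lambda r: ops_per_drive(r["ops_per_sec"], r["drives"]),
    reverse=True,
)

for r in ranked:
    per_drive = ops_per_drive(r["ops_per_sec"], r["drives"])
    print(f'{r["vendor"]}: {per_drive:.0f} ops/sec per drive')
```

Here Vendor X posts the higher headline number but needs far more drives to reach it, so Vendor Y ranks first on a per-drive basis. Per-drive throughput is only a rough guide, because drive types and sizes differ between configurations.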

ESRP-Storage for Microsoft Exchange Server
The Exchange Solution Reviewed Program—Storage (ESRP) is a simulated Microsoft Exchange Server environment with a common framework that is defined and administered by Microsoft. Similar to SPC and SPEC, it allows vendors to configure storage systems and publish peer-reviewed results. Unlike SPC and SPEC results, however, ESRP results are not considered true benchmarks; rather, they are a vendor’s documented best practices for Microsoft Exchange. In spite of this, interesting comparisons can be made.

  • Typical workload: ESRP uses Jetstress, a synthetic workload generator for Microsoft Exchange Server 2013 environments.
  • Typical application: Jetstress simulates common Microsoft Exchange database and log file workloads.
  • Performance metric used: The common ESRP metrics used are server database writes and reads per second.

When interpreting ESRP performance results, here are a few tips:

  • Because ESRP does not specify a predefined number of mailboxes, mailbox sizes, or mailbox servers, database reads and writes vary greatly between vendor results and do not provide a particularly useful comparison.
  • The most telling statistic in ESRP is the amount of time it takes to recover mailboxes (listed as the transaction log recovery time). This recovery occurs while the storage system is busy processing all other mailboxes. By looking at the log recovery times, you can get a good comparison of the overall I/O processing ability of the storage system.
  • Another good comparison is database backup throughput (listed as database read-only performance). This metric shows how fast the storage system can perform a backup while processing mailboxes. Take a look at the backup throughput reported by each vendor, and you will get a good understanding of I/O processing ability under load.

Real-World Testing Versus Synthetic IT Benchmarking
When real-world testing in your own environment is not practical, synthetic performance benchmarks can be used as a substitute to simulate real-world conditions. All of the tests discussed in this article use synthetic workload generators. When examining benchmark test results, keep in mind that the best-performing storage systems will generally have the best balance of high throughput, low latency, and low cost per unit of performance. Although some may question the use of synthetic data to measure storage system performance, when interpreted properly, these tests can offer valuable insights.
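
One possible way to weigh throughput, latency, and cost together is to set a latency ceiling first and then rank the remaining systems by cost per operation. The sketch below is an illustrative assumption, not a method prescribed by SPC, SPEC, or Microsoft. The Result class, the shortlist function, the 5 ms ceiling, and every figure are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Result:
    name: str
    throughput: float  # IOPS or ops/sec, as published
    latency_ms: float  # average response time at that load
    price: float       # price on a comparable (discount-adjusted) basis


def shortlist(results, max_latency_ms):
    """Keep systems at or under the latency ceiling, cheapest per operation first."""
    eligible = [r for r in results if r.latency_ms <= max_latency_ms]
    return sorted(eligible, key=lambda r: r.price / r.throughput)


# Hypothetical results for illustration only.
candidates = [
    Result("System A", throughput=200_000, latency_ms=12.0, price=900_000),
    Result("System B", throughput=100_000, latency_ms=3.0, price=500_000),
    Result("System C", throughput=80_000, latency_ms=2.0, price=320_000),
]

picks = shortlist(candidates, max_latency_ms=5.0)
for r in picks:
    print(f"{r.name}: ${r.price / r.throughput:.2f} per op at {r.latency_ms} ms")
```

With these made-up numbers, System A has the highest throughput but is dropped for exceeding the latency ceiling, and System C ranks ahead of System B on cost per operation. The right ceiling depends on what your own applications can tolerate.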

Larry Freeman, Senior Technologist at NetApp
A frequent speaker and author, Larry’s current role at NetApp is educating IT professionals on the latest trends, techniques, and best practices in data storage technology. He authored the book Evolution of the Storage Brain and hosts the popular blog About Data Storage.