This article can also be found in the Premium Editorial Download "Storage magazine: Continuous data protection (CDP) and the future of backup."
|Do-it-yourself storage benchmarks|
The primary benchmarking tools in this category are Iometer, IOzone and NetBench. Unlike industry-standard benchmarks, these tools aren't governed by a standards body, and no rules dictate how tests are conducted and published. "With do-it-yourself tools, you can compare two configurations without having to trust the published results from vendors' tests," says Brian Garrett, technical director of the ESG Lab at Enterprise Strategy Group (ESG) in Milford, MA.
Do-it-yourself benchmarking tools are among the primary tools storage vendors and end users rely on to gauge performance. Unlike industry-standard benchmarks, their workloads are usually highly configurable, which makes it possible to measure a wide range of very specific IO patterns. "Tests can be conducted at the engineering level to specifically characterize and compare two storage subsystems or configurations, or to approximate applications at a rudimentary level," explains Garrett.
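To give a rough sense of what such a configurable workload looks like, here is a minimal Python sketch (not one of the tools named above, and far simpler than any of them) that times sequential versus random reads against a small file; block size, block count and access pattern are the configurable knobs:

```python
import os
import random
import tempfile
import time

def run_io_test(path, block_size=4096, blocks=256, pattern="sequential"):
    """Issue `blocks` reads of `block_size` bytes against `path` and
    return throughput in MB/s. A toy stand-in for configurable-workload
    tools such as Iometer."""
    offsets = [i * block_size for i in range(blocks)]
    if pattern == "random":
        random.shuffle(offsets)          # randomize the access order
    start = time.perf_counter()
    with open(path, "rb") as f:
        for off in offsets:
            f.seek(off)
            f.read(block_size)
    elapsed = time.perf_counter() - start
    return (block_size * blocks) / (1024 * 1024) / elapsed

# Create a test file, then compare the two access patterns against it.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(os.urandom(4096 * 256))

seq = run_io_test(tmp.name, pattern="sequential")
rnd = run_io_test(tmp.name, pattern="random")
print(f"sequential: {seq:.1f} MB/s, random: {rnd:.1f} MB/s")
os.unlink(tmp.name)
```

Real tools add queue depth, read/write mixes, multiple workers and much more, but the principle is the same: the tester, not a standards body, decides exactly what IO pattern to measure.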
|How cache affects benchmarking|
Storage system performance depends on several key components: the number and type of storage controllers, the number of disk drives, the RAID level and how drives are striped, the number of front-end and back-end ports, available bandwidth, and the size of the available cache and cache options.
While most of these dependencies are hardwired, cache is the big variable, and the nature of the workload has a great impact on benchmark results. Generally, accessing data in cache is significantly faster than accessing data on a mechanical device, so the larger the cache, the better the performance. Furthermore, the performance impact of cache varies with IO type. Sequential reads are very cache friendly, especially if the storage system supports caching algorithms with read-ahead or prefetch capabilities that stage data into memory from disk before an application requests it.
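The read-ahead idea can be sketched in a few lines of Python. This is a toy model, not any vendor's actual caching algorithm: on a miss it stages the requested block plus a fixed number of following blocks, betting that the access pattern is sequential, and evicts the least recently used entry when full:

```python
from collections import OrderedDict

class PrefetchCache:
    """Toy read cache with fixed-depth read-ahead (a hypothetical
    simplification of real storage-array caching algorithms)."""
    def __init__(self, backend, capacity=8, read_ahead=2):
        self.backend = backend          # dict standing in for disk
        self.capacity = capacity
        self.read_ahead = read_ahead
        self.cache = OrderedDict()      # LRU order: oldest entry first
        self.hits = self.misses = 0

    def read(self, block):
        if block in self.cache:
            self.hits += 1
            self.cache.move_to_end(block)
            return self.cache[block]
        self.misses += 1
        # On a miss, stage the block plus the next `read_ahead` blocks,
        # betting that the access pattern is sequential.
        for b in range(block, block + self.read_ahead + 1):
            if b in self.backend:
                self.cache[b] = self.backend[b]
                if len(self.cache) > self.capacity:
                    self.cache.popitem(last=False)   # evict oldest
        return self.cache[block]

disk = {b: f"data-{b}" for b in range(32)}
cache = PrefetchCache(disk)
for b in range(8):                       # a purely sequential scan
    cache.read(b)
print(f"hits={cache.hits} misses={cache.misses}")   # → hits=5 misses=3
```

With read-ahead depth 2, a sequential scan of eight blocks misses only on every third block; the other five reads are served from memory, which is why sequential workloads flatter a cached system in a benchmark.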
Random data access is less cache friendly because it's impossible to predict what data will be accessed next. But even for random access, cache yields a performance benefit when recently accessed data is read more than once: the first access must fetch the data from disk into cache, but subsequent accesses of the same data can be served directly from cache. The longer data stays in cache, the higher the probability of a cache hit, which in turn depends largely on the size of the cache. Some caching algorithms also track how frequently data is accessed and keep the hottest data in cache longer, greatly improving the impact of cache for random data access.
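The relationship between cache size and hit rate under random access is easy to demonstrate with a small simulation. The sketch below (illustrative numbers only, using a plain LRU policy rather than any frequency-aware algorithm) replays a uniformly random trace of 100 distinct blocks against caches of different sizes:

```python
import random
from collections import OrderedDict

def hit_rate(capacity, accesses, blocks=100, seed=7):
    """Replay a uniformly random access trace against an LRU cache of
    `capacity` entries and return the fraction of cache hits."""
    rng = random.Random(seed)            # fixed seed for repeatability
    cache, hits = OrderedDict(), 0
    for _ in range(accesses):
        block = rng.randrange(blocks)    # pick a random block
        if block in cache:
            hits += 1
            cache.move_to_end(block)     # mark as most recently used
        else:
            cache[block] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict least recently used
    return hits / accesses

for cap in (10, 25, 50):
    print(f"cache={cap:3d} blocks -> hit rate {hit_rate(cap, 10_000):.0%}")
```

With a uniform trace the steady-state hit rate tracks the fraction of the working set that fits in cache, which is exactly why the same benchmark run against systems with different cache sizes produces very different numbers.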
Cache also improves the performance of write operations: data is written directly to cache, and the caching algorithm determines the best time to write that data to disk (destaging). The benefit diminishes when the cache fills up, so the size of the available cache greatly affects how often cache overflows occur.
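A minimal write-back sketch, assuming a deliberately crude destaging policy (flush everything once the cache is full; real arrays destage far more intelligently), shows the mechanics:

```python
class WriteBackCache:
    """Toy write-back cache: writes are acknowledged from memory and
    destaged to the backing store when the cache fills up."""
    def __init__(self, backend, capacity=4):
        self.backend = backend     # dict standing in for disk
        self.capacity = capacity
        self.dirty = {}            # block -> data awaiting destage
        self.destages = 0

    def write(self, block, data):
        self.dirty[block] = data            # fast: lands in cache only
        if len(self.dirty) >= self.capacity:
            self.flush()                    # cache full: forced destage

    def flush(self):
        self.backend.update(self.dirty)     # write dirty blocks to "disk"
        self.destages += len(self.dirty)
        self.dirty.clear()

disk = {}
cache = WriteBackCache(disk)
for b in range(10):
    cache.write(b, f"v{b}")
cache.flush()                               # drain whatever is left
print(f"blocks on disk: {len(disk)}, destaged writes: {cache.destages}")
```

While the cache has headroom, every write is acknowledged at memory speed; once it fills, the application effectively waits on disk, which is why sustained write benchmarks that overflow the cache look so different from short bursts that fit inside it.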
From a benchmarking perspective, it's crucial to understand to what extent cache was utilized. A system with an abundance of cache will yield completely different results depending on the cache-friendliness of the workload. A benchmarking workload should have a healthy mixture of reads and writes that's somewhat representative of real-world workloads. "Our experience is that a balance of 70% reads and 30% writes is fairly common," says Tony Asaro, senior analyst at Enterprise Strategy Group, Milford, MA.
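Generating that kind of mixed trace is straightforward. The sketch below builds a workload approximating the 70/30 read/write balance ESG cites; the function name and parameters are illustrative, not taken from any of the tools discussed:

```python
import random

def mixed_workload(n_ops, read_fraction=0.70, seed=42):
    """Generate a read/write op trace with roughly the given read
    fraction (defaulting to the 70/30 mix cited by ESG)."""
    rng = random.Random(seed)            # fixed seed for repeatability
    return ["read" if rng.random() < read_fraction else "write"
            for _ in range(n_ops)]

ops = mixed_workload(10_000)
print(f"reads: {ops.count('read') / len(ops):.1%}")
```

Feeding such a trace to a benchmark harness, rather than a pure-read or pure-write stream, keeps the cache exercised in both directions and makes the results somewhat closer to what production applications would see.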
This was first published in October 2007