Pump up array performance


A few subsystems let you specify how much cache to devote to write activity vs. read activity. These settings aren't necessarily intuitive; a reasonable starting point is the read-to-write ratio of the subsystem's overall I/O workload, with the cache divided in the same proportion. A better approach for most workloads may be to let the subsystem handle this dynamically, as write activity can be highly time dependent (normal work vs. backup activity).
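As a rough illustration of that starting point, the sketch below divides a cache in proportion to observed read and write counts. The function name, the I/O counts and the 16 GB cache size are all hypothetical; real subsystems expose this setting in their own ways, often as a percentage.

```python
def cache_split(read_ios: int, write_ios: int, cache_gb: float) -> tuple[float, float]:
    """Split cache between reads and writes in proportion to observed I/O counts.

    All inputs are illustrative; take the counts from your own monitoring tool.
    """
    total = read_ios + write_ios
    if total == 0:
        raise ValueError("no I/O observed")
    read_share = read_ios / total
    return cache_gb * read_share, cache_gb * (1 - read_share)


# Hypothetical workload: 70% reads, 30% writes, 16 GB of cache.
read_gb, write_gb = cache_split(read_ios=700_000, write_ios=300_000, cache_gb=16)
print(f"read cache: {read_gb:.1f} GB, write cache: {write_gb:.1f} GB")
```

Because the ratio shifts during backup windows, a split computed from one time period is only a first guess.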

As discussed previously, LUNs may be mapped to controllers and as you monitor your I/O workload, there may be some controller bottlenecks. Once you discover which LUNs are causing the problems, move them to the alternate controller to balance the workload. Some subsystems do this automatically.
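One simple way to decide which LUNs belong on which controller is a greedy assignment: take the busiest LUNs first and place each on the controller with the least load so far. The sketch below assumes two controllers named "A" and "B" and made-up IOPS figures; it is a planning aid, not how any particular subsystem rebalances.

```python
def balance_luns(lun_iops: dict[str, int],
                 controllers: tuple[str, ...] = ("A", "B")) -> dict[str, list[str]]:
    """Greedily assign LUNs (busiest first) to the least-loaded controller."""
    load = {c: 0 for c in controllers}
    placement: dict[str, list[str]] = {c: [] for c in controllers}
    for lun, iops in sorted(lun_iops.items(), key=lambda kv: kv[1], reverse=True):
        target = min(load, key=load.get)  # controller with the lowest load so far
        placement[target].append(lun)
        load[target] += iops
    return placement


# Hypothetical per-LUN IOPS taken from a monitoring interval.
observed = {"lun0": 5000, "lun1": 3000, "lun2": 2500, "lun3": 500}
print(balance_luns(observed))
```

With these figures each controller ends up carrying 5,500 IOPS. In practice you would move only the LUNs whose placement differs from the current mapping.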

In addition to write mirroring, there are other data availability, recovery and subsystem feature issues that impact performance. Many subsystems offer some form of copy-on-write technology that copies the original contents of a block just before the block is overwritten, so that a point-in-time copy is preserved. This may cause a single write to invoke multiple I/Os, so using this feature will impact write performance.
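To see why copy-on-write multiplies I/O, consider a simplified model in which the first write to a block after a point-in-time copy costs three back-end I/Os (read the original, save it, write the new data) and later writes to the same block cost one. That cost model is an assumption for illustration; actual implementations vary.

```python
def backend_ios(write_blocks: list[int]) -> int:
    """Count back-end I/Os for a series of host writes under a simple
    copy-on-write model (assumed: 3 I/Os on first touch, 1 afterwards)."""
    copied: set[int] = set()
    ios = 0
    for block in write_blocks:
        if block not in copied:
            ios += 2  # read the original block, then write it to the copy area
            copied.add(block)
        ios += 1      # the host write itself
    return ios


# Four host writes, one of which rewrites a block that was already copied.
print(backend_ios([10, 11, 10, 12]))  # 3 + 3 + 1 + 3 = 10
```

The penalty is heaviest right after a copy is taken, when nearly every write touches a block for the first time.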

Asynchronous remote replication may consume cache on high-end subsystems and back-end bandwidth on midrange subsystems. With high-end subsystems, write data is typically retained in the cache until it's copied to the remote location. For midrange systems, data is quickly flushed from the cache and then read back only as it's being copied to remote subsystems. So for midrange systems, remote replication doesn't consume lots of cache, but it does produce additional back-end I/O activity. In either case, anything that consumes cache or bandwidth may impact overall subsystem performance.

The choice between synchronous and asynchronous remote replication also affects performance. For synchronous mirroring, the original write I/O operation is held up until the data is copied to the remote location. Depending on distance and other network factors, this can take a considerable amount of time for each write operation. For asynchronous mirrors, the original I/O isn't held up.
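A back-of-the-envelope estimate shows how distance alone adds to synchronous write latency. The sketch assumes roughly 5 microseconds of one-way propagation delay per kilometer of optical fiber and one round trip per write; it ignores switch, protocol and remote subsystem service time, so real numbers will be higher.

```python
FIBER_US_PER_KM = 5.0  # assumed one-way propagation delay in optical fiber


def sync_write_latency_ms(local_ms: float, distance_km: float,
                          round_trips: int = 1) -> float:
    """Estimate synchronous write latency: local service time plus
    propagation delay for the given number of round trips (an assumption)."""
    propagation_ms = 2 * distance_km * FIBER_US_PER_KM / 1000 * round_trips
    return local_ms + propagation_ms


# Hypothetical 0.5 ms local write mirrored to a site 100 km away.
print(sync_write_latency_ms(local_ms=0.5, distance_km=100))  # 1.5
```

In this example the remote hop triples the latency of each write, which is why synchronous mirroring is usually limited to shorter distances.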

As discussed, some subsystems mitigate the problem of "hot" LUNs automatically or via x+0 RAID group levels. If your subsystem doesn't do this or you haven't used x+0 striping, you may wish to monitor LUN activity to see if there's a particular subset driving a majority of your I/O. If that's the case, splitting the "hot" subset of LUNs across controllers and multiple RAID groups often yields better performance.
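Finding the "hot" subset can be as simple as ranking LUNs by activity and accumulating until a chosen share of total I/O is covered. The sketch below uses hypothetical IOPS figures and an 80% threshold; both are assumptions to adjust for your own environment.

```python
def hot_luns(lun_iops: dict[str, int], share: float = 0.5) -> list[str]:
    """Return the busiest LUNs that together account for at least
    `share` of total I/O (smallest such set, busiest first)."""
    total = sum(lun_iops.values())
    hot: list[str] = []
    running = 0
    for lun, iops in sorted(lun_iops.items(), key=lambda kv: kv[1], reverse=True):
        if total == 0 or running / total >= share:
            break
        hot.append(lun)
        running += iops
    return hot


# Hypothetical monitoring data: two LUNs drive 85% of the I/O.
observed = {"lun0": 6000, "lun1": 2500, "lun2": 1000, "lun3": 500}
print(hot_luns(observed, share=0.8))  # ['lun0', 'lun1']
```

The LUNs this returns are the candidates for spreading across controllers and RAID groups.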

Some applications are cache friendly (highly sequential) and some aren't (highly random). When run together on the same subsystem, these applications may have an adverse effect on one another and slow performance. Some subsystems can be partitioned to isolate counterproductive workload combinations. The partitioning splits one physical subsystem into two or more logical subsystems that can be dedicated to support a specific workload. Some subsystems do this by dedicating a portion of cache to a set of LUNs. Others partition by splitting up the entire subsystem--cache, processors and data paths. Yet another method is to split the workload across multiple subsystems.

In some cases, the fabric in front of your subsystem may be a performance bottleneck. For heavy throughput workloads, make sure there are enough FC pipes between the host(s) and the subsystem to support the workload. And remember that HBA parameters need to be in sync with your workload. If you have a high-throughput workload, its HBA transfer size should match or exceed the LUN's segment size. If the HBA transfer size is below the LUN's segment size, the workload won't perform as well.
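Both checks can be expressed as simple arithmetic. The sketch assumes a nominal 200 MB/sec per FC link (roughly a 2 Gb/sec link) and plans to use only 70% of it; the link speed, the headroom factor and the sizes in the example are illustrative values, not recommendations.

```python
import math


def fc_links_needed(workload_mb_s: float, link_mb_s: float = 200.0,
                    headroom: float = 0.7) -> int:
    """Number of FC links needed to carry a throughput workload, assuming
    each link is planned to run at `headroom` of its nominal rate."""
    return math.ceil(workload_mb_s / (link_mb_s * headroom))


def transfer_size_ok(hba_transfer_kb: int, lun_segment_kb: int) -> bool:
    """True when the HBA transfer size matches or exceeds the LUN segment size."""
    return hba_transfer_kb >= lun_segment_kb


print(fc_links_needed(500))        # 500 MB/sec over 140 MB/sec usable per link
print(transfer_size_ok(128, 256))  # HBA transfer size below the segment size
```

Here a 500 MB/sec workload calls for four links, and the second check flags an HBA setting that should be raised.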

There are many ways to improve the performance of storage subsystems. Of course, some subsystems may offer more alternatives than presented here, and some array controllers automatically optimize some aspects of subsystem performance such as how cache is configured. Sophisticated caching algorithms have existed in the mainframe arena for many years and for at least a decade for open systems. The bottom line for performance tuning is knowing your application requirements.

This was first published in January 2006
