This article can also be found in the Premium Editorial Download "Storage magazine: Is it time for SAN/NAS convergence?"
All performance isn't the same
Performance always shows up as a check mark when buying a high-end storage array. Typically, high performance is the result of internal architecture and optimal disk drive sizes and speeds. And while you should certainly use these as guides in defining what constitutes a high-end storage array in your environment, your mileage will depend on the application for which the storage array is being used.
For instance, applications that are highly sequential in nature--such as mainframe batch jobs--better exploit storage arrays that contain large amounts of cache with caching algorithms that optimize prefetching data. Storage arrays with large cache configurations can partially overcome drives with slower speeds and larger sizes because much of the data can be loaded into cache prior to the application even requesting it.
High-end storage arrays such as EMC's DMX 3000, IBM's 2105 Model 800 (Shark) and HDS' 9980 V (Lightning) that follow the traditional monolithic model make the most sense for these applications. The large amounts of cache they support and the caching algorithms they use minimize the number of I/O requests the storage array has to make to disk. Each of these storage arrays supports at least 64GB of cache; the DMX 2000 and 3000 models each support up to 128GB of cache.
Conversely, many of today's random-access, read-intensive relational database applications negate some of the benefits of a large amount of cache in a high-end storage array. Because the queries are random and the caching algorithms can't easily predict which data to load into memory, most I/O requests miss the cache and must be read directly from disk.
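The contrast between sequential and random workloads can be illustrated with a toy model. The sketch below is not any vendor's caching algorithm; it assumes a simple LRU cache that reads ahead a fixed number of blocks on every miss. The function name, cache size, read-ahead depth and block counts are all hypothetical values chosen for illustration.

```python
import random
from collections import OrderedDict

def hit_rate(requests, cache_blocks, prefetch):
    """Fraction of reads served from a toy LRU cache with read-ahead on miss."""
    cache = OrderedDict()  # block number -> True, oldest entry first
    hits = 0
    for block in requests:
        if block in cache:
            hits += 1
            cache.move_to_end(block)  # mark as most recently used
            continue
        # Miss: fetch the requested block plus the next `prefetch` blocks.
        for b in range(block, block + prefetch + 1):
            cache[b] = True
            cache.move_to_end(b)
            if len(cache) > cache_blocks:
                cache.popitem(last=False)  # evict the least recently used block
    return hits / len(requests)

# Hypothetical workloads: 9,000 reads against a cache of 1,000 blocks.
sequential = list(range(9000))                 # batch-style scan
rng = random.Random(0)                         # fixed seed for repeatability
scattered = [rng.randrange(1_000_000) for _ in range(9000)]  # random lookups

seq_rate = hit_rate(sequential, cache_blocks=1000, prefetch=8)
rand_rate = hit_rate(scattered, cache_blocks=1000, prefetch=8)
print(f"sequential: {seq_rate:.1%}  random: {rand_rate:.1%}")
```

In this model the sequential scan finds nearly all of its blocks already in cache, while the random workload almost always goes to disk, which is why drive speed and internal bandwidth matter more for the latter.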
Here's where a midrange storage array's internal architecture and disk drives may equal or even outperform a traditional architecture. Fremont, CA-based 3PAR's InServ S800 storage array uses a backplane with mesh architecture that it says supersedes both internal bus and switch architectures with up to 28GB/s of internal bandwidth. The InServ S800's controller nodes also separate the processing of controller commands from the data movement, thereby removing another possible performance bottleneck that may exist in today's systems. These two features combine with FC disk drives with faster rotational speeds to potentially offer equal or better performance for today's applications than their monolithic counterparts.
Arrays following the traditional monolithic models shouldn't be disregarded for these sorts of applications, but they should no longer be thought of as the only option.
Availability and reliability
Availability and reliability are not just must-haves on high-end storage arrays--they are assumed to be there. Chuck Hollis, EMC's vice president of storage platforms marketing, points out that today's mission-critical environments are "always on," so everything from routine configuration changes to maintenance code upgrades must be nondisruptive. With the high cost of downtime in these environments, the cost of high-end storage arrays is more than justified by the savings that they generate by avoiding the possibility of any outages, planned or unplanned.
3PAR's president and CEO, David Scott, contends that there's currently very little difference between storage arrays classified as midrange and high-end in the areas of reliability and availability. Both of these classes of storage arrays generally use the same highly available and reliable components purchased from essentially the same set of underlying hardware suppliers. The differences that do exist in availability and reliability on the different storage arrays frequently depend on how each storage array vendor puts that hardware together in its array, how well it tests the hardware in its labs and how its proprietary software works with it.
One factor that does influence the availability and reliability of storage arrays is the RAID configuration of the disk drives within the storage array. The two most common deployments that preserve information if one of the disk drives fails are RAID 1 (mirroring the data on two disk drives) and RAID 5 (striping the data and parity information across three or more disk drives).
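How a parity-protected group survives the loss of one drive can be shown in a few lines. This is a minimal sketch that assumes single XOR parity over one stripe, which is the general principle behind RAID 5; real arrays rotate parity across drives and work on much larger blocks. The helper name and the sample data are made up for illustration.

```python
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# One stripe spread over three data drives (illustrative contents).
data = [b"AAAA", b"BBBB", b"CCCC"]

# The parity block is the XOR of all data blocks in the stripe.
parity = xor_blocks(data)

# Simulate losing the second drive: XOR the survivors with the parity block.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)

print(rebuilt == data[1])
```

The trade-off against RAID 1 is capacity versus rebuild work: mirroring keeps a full second copy of every block, while parity costs only one drive's worth of space per group but must read every surviving drive to rebuild a failed one.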
All high-end storage vendors offer at least one--if not both--of these configurations. Some arrays, such as HDS' 9980 V and IBM's 2105 Model 800, now offer users the ability to mix and match various RAID configurations within their storage arrays to meet the needs of specific applications. In addition, some arrays offer advanced RAID functions such as RAID 10 (striping across mirrored pairs), as well as spare drives that will immediately replace any disk drive that fails in the primary RAID configuration.
But one factor in evaluating availability that rarely gets the attention it deserves is how code updates are applied on the storage arrays themselves, a task you will probably confront once or twice a year. Code upgrades can be needed for any number of reasons, from routine maintenance to fixing a known issue to gaining some additional functionality. Keep in mind that not all code upgrades are created equal.
In a poorly laid out or misunderstood environment, they can be disruptive and will likely require outages lasting for several hours or longer on hosts connected to a single port on a storage array. This seemingly minor task can wreak havoc, especially where multiple hosts with different service level agreements and maintenance windows connect into the same high-end storage array. Extensive forethought and planning are required to assess existing storage network connections and ascertain if specific system outages will occur when the storage array code upgrade takes place.
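Part of that assessment amounts to a simple question: which hosts lose every path to the array when a given port goes offline for the upgrade? The sketch below assumes you have already gathered a host-to-port inventory by some means; the function, host names and port labels are hypothetical and do not reflect any vendor's tooling.

```python
def hosts_at_risk(host_ports, ports_going_offline):
    """Return hosts whose array ports are all in the offline set."""
    offline = set(ports_going_offline)
    return sorted(
        host for host, ports in host_ports.items()
        if set(ports) <= offline  # no path survives the upgrade step
    )

# Hypothetical inventory: host name -> array ports it is connected to.
inventory = {
    "erp-db": ["1A"],         # single path
    "mail": ["1A", "2B"],     # dual path
    "web": ["2B"],            # single path, different port
}

# Which hosts need a maintenance window while port 1A is upgraded?
affected = hosts_at_risk(inventory, ["1A"])
print(affected)
```

Running the check once per upgrade step gives a list of hosts whose service level agreements and maintenance windows have to be reconciled before the work is scheduled.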
This was first published in September 2003