The case for high-end arrays

The gap between midrange and high-end storage arrays has narrowed, enough so that the decision of which storage array to buy is less of a technical decision and more of a business one.

The time to buy another high-end storage array has arrived. A cursory review of arrays reveals that each one offers all of the features your organization wants or needs.

They offer Fibre Channel (FC) with an iSCSI road map, compatibility with the server operating systems in your environment, 24x7 technical support, three-year warranties, excellent performance, five-nines of uptime and redundant internal hardware. Each is certified to work with all major FC switch vendors, is supported by the service and support arms of major storage vendors and provides relatively easy-to-use management software.

But there's one tiny problem--the price tag. After all, why should you pay extra for a monolithic high-end storage array when an array nominally classified as midrange meets your requirements and costs less?

Every day, the line between high-end and midrange storage arrays gets progressively blurrier. Just a few years ago, upscale features such as high performance, availability, reliability and capacity were generally the domain of the monolithic or high-end storage arrays. Now they are appearing on a rising number of modular or midrange arrays.

What's still generally true is that high-end arrays combine more of these features and necessitate fewer trade-offs than midrange ones. So there are still solid reasons why you should pay a premium for high-end storage arrays. The trick is to be precise about exactly what your needs are and exactly which products have the features to meet those needs.


Defining high-end
Even a leading analyst firm like Gartner Inc. dances around the topic of what defines a high-end storage array. Gartner's Magic Quadrant of players for the first half of 2003 renames the monolithic and modular classes of storage as high-end and midrange storage to reflect the shift that's occurring between these classes of arrays. Even though Gartner's reports continue to break out players in the high-end and midrange storage array market, they now include the word "enterprise" in their description for the storage arrays in both of these spaces.

Looking to the vendors for clarification on what constitutes a high-end array doesn't help much either. Some, like Claus Mikkelsen, Hitachi Data Systems' (HDS) senior director of storage applications, take a limited view, asserting that the enterprise space is defined by EMC Corp.'s Symmetrix, HDS' Lightning and IBM Corp.'s Shark. Others, like Chris Bennett, Network Appliance Inc.'s director of platforms and storage, take a broader view, saying that a company's choice of a high-end storage array sets a tone for the level of features and functionality expected in future purchases and reflects the type of storage infrastructure they want to build their organization around.

This obfuscation of what constitutes a high-end storage array creates a new challenge for users. With the growing number of specialized storage arrays on the market, organizations can no longer look to a default industry definition or point to some vendor's storage array and expect that definition or array to be the standard by which they measure products. You now need to define for yourself which high-end storage arrays will meet the various requirements of your business.

It's crucial to get your company's definition of high-end storage arrays aligned with your business environment. An incorrect or incomplete alignment can mean the wrong selection of an array and in a worst-case scenario, result in unexpected downtime and loss of data.

Coming up with a checklist to align your environment with today's high-end storage arrays may mean incorporating features traditionally found in monolithic storage arrays, such as performance, availability, reliability, connectivity and capacity. You will likely need to expand this list to include price, along with newer considerations such as serviceability and manageability.

All performance isn't the same
Even though performance may be thought of as a commodity in storage arrays, Hitachi Data Systems' (HDS) Claus Mikkelsen, senior director of storage applications, sees a point of differentiation. He says that while pure raw performance is about equivalent in midrange and high-end storage arrays, providers of the traditional high-end monolithic storage arrays distinguish themselves in how they handle and balance the same workload.

HDS achieves this handling and balancing of workloads on its 9900 series Lightning boxes through its CruiseControl software, which continuously monitors the 9900 series arrays and can dynamically move data between disks while the data is being accessed. As it monitors and analyzes performance, it can either automatically make tuning adjustments or generate reports with tuning recommendations for administrators to review. Furthermore, it achieves this without deploying agents on the servers using the 9900 series storage on the back end.

EMC offers a similar tool, Workload Analyzer, which collects, graphs, analyzes and archives performance data on EMC-supplied storage. It sets itself apart from the competition by working not only with all of EMC's high-end Symmetrix storage arrays, but also with the midtier Clariion line. However, getting this functionality requires deploying agents on servers attached to the storage arrays.

IBM offers its ESS Expert, which also enables administrators to monitor and report on the performance statistics of their Enterprise Storage Server (ESS) Sharks anywhere in the enterprise, storing the collected information in a relational database. However, this tool currently lacks the ability to make changes dynamically, and requires administrators to make decisions about volume placement and data movement based on the information ESS Expert gathers.

Performance always shows up as a check mark when buying a high-end storage array. Typically, high performance is the result of internal architecture and optimal disk drive sizes and speeds. And while you should certainly use these as guides in defining what comprises a high-end storage array in your environment, your mileage will depend on the application for which the storage array is being used.

For instance, applications that are highly sequential in nature--such as mainframe batch jobs--better exploit storage arrays that contain large amounts of cache with caching algorithms that optimize prefetching data. Storage arrays with large cache configurations can partially overcome drives with slower speeds and larger sizes because much of the data can be loaded into cache prior to the application even requesting it.

High-end storage arrays such as EMC's DMX 3000, IBM's 2105 Model 800 (Shark) and HDS' 9980 V (Lightning) that follow the traditional monolithic model make the most sense for these applications. They minimize the I/O requests to disk that the storage array has to make, thanks to the large amounts of cache they support and the caching algorithms they use. Each of these storage arrays supports at least 64GB of cache; the DMX 2000 and 3000 models each support up to 128GB.

Conversely, many of today's random access, read-intensive relational database applications negate some of the benefits of a large amount of cache in a high-end storage array. Because of the random nature of the queries and the fact that the caching algorithms can't easily predict the appropriate data to load into memory, I/O requests must bypass memory and read data directly from the disk.
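The contrast between these two access patterns can be sketched with a toy cache model. This is a deliberately simplified illustration (hypothetical cache size and read-ahead depth; real array caching algorithms are far more sophisticated), but it shows why prefetching pays off for sequential streams and not for random ones:

```python
import random
from collections import OrderedDict

def hit_rate(accesses, cache_blocks, prefetch):
    """Toy LRU cache with simple read-ahead: on a miss at block b,
    the next `prefetch` blocks are loaded into cache alongside it."""
    cache = OrderedDict()
    hits = 0
    for b in accesses:
        if b in cache:
            hits += 1
            cache.move_to_end(b)        # mark as most recently used
        else:
            for blk in range(b, b + prefetch + 1):
                cache[blk] = True       # load the block plus read-ahead
                cache.move_to_end(blk)
            while len(cache) > cache_blocks:
                cache.popitem(last=False)   # evict least recently used
    return hits / len(accesses)

sequential = list(range(100_000))
scattered = [random.randrange(10_000_000) for _ in range(100_000)]

print(f"sequential: {hit_rate(sequential, 1024, 16):.0%}")  # ~94% hits
print(f"random:     {hit_rate(scattered, 1024, 16):.0%}")   # near 0% hits
```

With a read-ahead of 16 blocks, a sequential scan hits in cache on roughly 16 of every 17 reads, while random queries over a large address space almost always go to disk regardless of cache size.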

Here's where a midrange storage array's internal architecture and disk drives may equal or even outperform a traditional architecture. Fremont, CA-based 3PAR's InServ S800 storage array uses a backplane with a mesh architecture that the company says supersedes both internal bus and switch architectures, with up to 28GB/s of internal bandwidth. The InServ S800's controller nodes also separate the processing of controller commands from data movement, removing another possible performance bottleneck in today's systems. These two features combine with faster-spinning FC disk drives to potentially offer equal or better performance for today's applications than their monolithic counterparts.

Arrays following the traditional monolithic models shouldn't be disregarded for these sorts of applications, but they should no longer be thought of as the only option.

Availability and reliability
Availability and reliability are not just must-haves on high-end storage arrays--they are assumed to be there. Chuck Hollis, EMC's vice president of storage platforms marketing, points out that today's mission-critical environments are "always on," so everything from routine configuration changes to maintenance code upgrades must be nondisruptive. With the high cost of downtime in these environments, the cost of high-end storage arrays is more than justified by the savings that they generate by avoiding the possibility of any outages, planned or unplanned.

3PAR's president and CEO, David Scott, contends that there's currently very little difference between storage arrays classified as midrange and high-end in the areas of reliability and availability. Both classes of storage arrays generally use the same highly available and reliable components purchased from essentially the same set of underlying hardware suppliers. The differences that do exist frequently come down to how each vendor puts that hardware together in its array, how well the vendor tests it in the lab and how its proprietary software works with it.

One factor that does influence the availability and reliability of storage arrays is the RAID configuration of the disk drives within the storage array. The two most common deployments that preserve information if one of the disk drives fails are RAID 1 (mirroring the data on two disk drives) and RAID 5 (striping data and parity across a set of drives so the array can survive a single drive failure).

All high-end storage vendors offer at least one--if not both--of these configurations. Some, such as HDS' 9980 V and IBM's 2105 Model 800, now let users mix and match various RAID configurations within their storage arrays to meet the needs of specific applications. In addition, some arrays offer advanced configurations such as RAID 10 (mirrored stripes), as well as hot spare drives that immediately replace any disk drive in the primary RAID configuration, should it fail.
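The single-failure protection RAID 5 provides rests on simple XOR parity. A minimal sketch (a toy four-drive stripe for illustration, not any vendor's implementation) shows how a failed drive's contents can be rebuilt from the survivors plus the parity block:

```python
from functools import reduce

def parity(blocks):
    """RAID 5-style parity: byte-wise XOR across the data blocks in a stripe."""
    return bytes(reduce(lambda a, b: a ^ b, chunk) for chunk in zip(*blocks))

def reconstruct(surviving_blocks, parity_block):
    """Rebuild a single failed block: XOR of the survivors and the parity
    cancels out everything except the missing block's data."""
    return parity(surviving_blocks + [parity_block])

stripe = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]   # four data drives
p = parity(stripe)                               # a fifth drive holds parity

lost = stripe.pop(2)                             # simulate one drive failing
assert reconstruct(stripe, p) == lost            # data fully recovered
```

RAID 1 simply keeps a second full copy instead, trading capacity for a faster, simpler rebuild.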

But one factor in evaluating availability that rarely gets the attention it deserves is how code updates are applied on the storage arrays themselves, a task you will probably confront once or twice a year. Code upgrades can be needed for any number of reasons, from routine maintenance to fixing a known issue to gaining some additional functionality. Keep in mind that all code upgrades aren't created equal.

In a poorly laid out or misunderstood environment, they can be disruptive, requiring outages of several hours or longer on hosts connected to a single port on a storage array. This seemingly minor task can wreak havoc, especially where multiple hosts with different service level agreements and maintenance windows connect into the same high-end storage array. Extensive forethought and planning are required to assess existing storage network connections and ascertain whether specific system outages will occur when the storage array code upgrade takes place.

The software gap is narrowing, too
One of the best features of high-end arrays is software. Features such as the ability to synchronously or asynchronously mirror storage between arrays, create point-in-time copies and do remote copies between large distances were what separated monolithic arrays from their competitors.

Yet a number of competitors in the midrange market now offer similar functionality, which has eroded the original value proposition of these offerings. 3PAR offers its InForm Operating System, which includes point-in-time copies with configurable parameters as well as remote copy functionality. Network Appliance's Data ONTAP operating system offers similar snapshot and mirroring functionality, as well as a snap restore capability that lets a system revert to a specified point in time.

In response to this, the traditional providers of monolithic arrays have found new ways to add value in their native software. IBM now offers advanced functions for use in conjunction with their zSeries servers such as priority I/O queuing and multiple allegiance on its ESS Model 800. Priority I/O queuing ensures important applications have priority access to storage resources, while multiple allegiance enables different operating systems to perform multiple, concurrent I/Os to the same logical volume.

EMC offers its Double Checksum product that verifies the integrity of Oracle data before writing it to its Symmetrix storage array and provides automatic notification of corrupt data. It also offers its Database Tuner product that identifies and models solutions to performance problems for Oracle and IBM DB2 UDB databases in Symmetrix environments.

These software extras are worth considering if your organization is buying storage and looking to address other enterprise issues at the same time. In cases where two or more vendors come in at about the same price point and measure up about the same in other desired areas, negotiating for features like these as part of the total storage package can help you meet your immediate storage needs while economically solving some other internal business problems as well.

Connectivity
Connectivity represents one of the primary benchmarks that some organizations still use to differentiate between high-end and midrange storage arrays. Simply put, many storage managers' rule of thumb is "if it supports FICON and/or ESCON connectivity, it's a high-end storage array, and if it only supports FC, it's a midrange storage array."

For shops that require either FICON or ESCON connectivity to the mainframe, six companies--EMC, Hewlett-Packard Co., HDS, IBM, StorageTek and Sun Microsystems Inc.--currently offer that functionality on their traditional monolithic boxes. EMC's new DMX800 breaks some new ground on this front in that they now offer FICON connectivity on a modular array.

Storage arrays also must support FC in some capacity to be considered high-end. Some newer companies such as EqualLogic, in Nashua, NH, hope to gain market share exclusively using an iSCSI interface in its PeerStorage Array 100E, but it seems unlikely that tactic will work at this early stage of the IP storage game. More likely, companies such as Network Appliance Inc.--which recently added 2Gb FC connectivity on their FAS900c series arrays--will have more success in gaining the status of high-end in the minds of users.

With vendors such as Microsoft Corp. and Novell Inc. already offering a standards-compliant iSCSI driver for their respective operating systems, and EMC joining Network Appliance in the fray, it seems that iSCSI will become a requirement sometime in the near future for some high-end array implementations. The two types of applications most likely to need iSCSI initially are low-cost servers that organizations are looking to connect to IP SANs, and those applications that need data stored at a remote site, but need a less costly way to connect the two sites.

The other looming connectivity requirement is 4Gb/s FC. Recently, the Fibre Channel Industry Association extended 4Gb/s FC from an intracabinet storage device interconnect to switched SAN fabrics, so it will likely find its way into arrays as an option. FC at 4Gb/s will make the most sense for applications that are performance intensive but cost-sensitive, where you can't justify upgrading the entire enterprise to meet one application's requirements. New host bus adapters (HBAs) can utilize the existing infrastructure and remain backward compatible with the 1Gb and 2Gb FC protocols.

Capacity
Capacity comes in as the final feature traditionally associated with monolithic high-end arrays. Many of today's putative midrange arrays now support larger raw capacities than the latest monolithic arrays. For instance, 3PAR's latest InServ S800 supports up to 2,560 147GB drives for a maximum raw capacity of nearly 380TB, more than double the number of disk drives and the raw capacity the HDS 9980 V supports.
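Checking the arithmetic behind that raw-capacity figure (decimal units, as drive vendors quote them):

```python
drives = 2_560      # maximum drive count in the InServ S800
drive_gb = 147      # capacity per drive, in GB

raw_tb = drives * drive_gb / 1_000   # GB -> TB, decimal
print(f"{raw_tb:.2f}TB")             # 376.32TB -- i.e., "nearly 380TB"
```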

Be careful about interpreting capacity. If your application just calls for large amounts of raw capacity with no special needs for performance, availability, reliability or connectivity, an array that's simply large may be just what you need. But usually, one factor alone rarely holds that much weight in the decision-making process.

With a midrange system configured with 2TB of storage coming in around $40,000 to $60,000 and a similarly configured monolithic array coming in between $120,000 and $200,000, you may wonder why you would pay the premium for a monolithic array.

You can't answer that question without considering the application. For applications that can never afford downtime and where the penalties are huge in terms of lost revenue, why risk the cost to the bottom line and the perception problem you create with the client? In that case, spending 8 cents/MB for a traditional monolithic high-end array is cheap compared to the consequences.
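That 8 cents/MB figure squares with the monolithic price range quoted above; the arithmetic for the 2TB configuration:

```python
capacity_mb = 2 * 1_000_000      # 2TB expressed in MB (decimal units)
cost_per_mb = 0.08               # 8 cents per megabyte

total = capacity_mb * cost_per_mb
print(f"${total:,.0f}")          # $160,000 -- within the $120,000-$200,000 range
```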

Because of arguments like this, many storage managers forgo midrange arrays in favor of more expensive boxes. Jim Tuckwell, IBM's product marketing manager, says IBM has seen about a 50-50 split among its customers when price point is cited as the reason for the choice: some organizations gain an economic advantage by standardizing on high-end monolithic storage arrays, while others take advantage of the attractive price point of midrange storage arrays.

This last example illustrates that much more goes into deciding which high-end storage array to buy than just one variable. Yet even with all the variation among these first six factors, serviceability and manageability may be the ones that ultimately sway purchasing decisions in the future.

Serviceability and manageability
Serviceability and manageability may be the two categories that are hardest for an organization to measure quantitatively, but they're the easiest for users to experience qualitatively. While arguments about who has the best performance will always rage, and vendors will leapfrog each other from quarter to quarter on certain key benchmarks, it's when something goes wrong that businesses want someone onsite now.

Here's where an edge goes to some of the major players in the storage arena. EMC, Hewlett-Packard Co., HDS, IBM, StorageTek and Sun each have trained engineers in the field who can support almost any kind of problem with their storage arrays while minimizing interruption to the enterprise. As HDS' Claus Mikkelsen points out, when a major financial institution calls and reports a problem that involves your storage array, you have to be there, no questions asked.

Those players without an extensive network of trained field engineers will likely need to establish some level of relationship with the larger players who do. 3PAR, a new entrant in the field, already has a support agreement with IBM's Global Services. Another new entrant, Dot Hill Systems Corp., has established an OEM agreement with Sun to resell and support its storage arrays. LSI Logic Corp., in Milpitas, CA, has built relationships with both IBM and StorageTek to resell and support its arrays while capitalizing on the support structures of those organizations. Network Appliance has also established a relationship with HDS to break into new markets with its products using HDS' existing marketing and support structure.

EMC uses a paired service support model to complement its support in the field. Under this model, EMC places an engineer at the customer site while the customer receives a dedicated resource in EMC's lab for testing customer configurations. This relationship gives EMC the ability to understand the customer's environment and respond to issues as they arise. Conversely, it gives customers an inside track to EMC's labs, a greater voice in the next-generation design of EMC's products and a higher level of support when troubleshooting issues with their storage arrays.

The final issue of manageability gives a preliminary edge to the existing storage hardware players, but they will have to work hard to keep that edge. Most are heading down a path of open standards on their own arrays and revamping their own software to manage their competitor's storage arrays. Yet while progress is being made in these areas, much work remains to be done. The biggest challenges they face are getting their software to perform advanced functions such as point-in-time copies and performance tuning on storage arrays from their competitors.

The gap between midrange and high-end storage arrays has narrowed, enough so that the decision on which storage array to buy is less of a technical decision and more of a business one. With decreasing price points, more competition and the commoditization of certain features within the arrays themselves, users will need to become more educated about their own environment in order to purchase the storage array that best suits their needs, or pay a premium for their ignorance.
