A comprehensive revamping of the storage environment at a major government agency shows how a tiered storage design can help meet operational expectations without busting the budget.
By Herb Ferguson
Vendors work very hard to make the choice of a multitier, enterprise-class storage system an easy one for you. But in the real world, it's not so easy. A multitier, enterprise-class system needs a high level of scalability, and its different tiers need to serve the needs of various applications and databases. It's a substantial long-term investment, and it takes exhaustive planning and research to choose the right one. Perhaps a more fundamental question than which vendor's products to buy is whether to take the integrated, single-vendor approach or to build a system around the components that are most critical to your environment.
In March 2007, InfoPro Corp. was asked to guide a large government agency through such a storage system purchasing decision -- upgrading from entry- and workgroup-level storage to an enterprise-class storage subsystem with a much higher capacity and the ability to scale beyond 1 petabyte (1 PB). The task was a tall one, given customer requirements and budgetary constraints.
The agency's existing environment was quite complex. There were numerous networks and approximately 75 servers (90% Sun Microsystems Inc. hardware) running Solaris, Linux and Windows operating systems with a wide range of business applications and databases, from product lifecycle management to document management apps. The environment was separated into loosely coupled sections that corresponded to the customer's business functions, such as production, staging and development. Each section had its own set of server and storage constraints; the production section required the highest availability, whereas the other sections had less stringent requirements. As for existing storage equipment, there were seven direct-attached SCSI- and Fibre Channel (FC)-based storage arrays from different vendors, each 1 TB to 3 TB in capacity, for a total capacity of 8 TB to 14 TB; there were also two Sun StorageTek L20 tape backup units.
In the period leading up to the project, the agency's user accounts and storage utilization were growing at an alarming rate: utilization went from 3.9 TB in January 2006 to 12 TB in May 2007, threatening to exceed the available workgroup storage by the end of the year. There was an urgent need to complete the upgrade as soon as possible.
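The two utilization readings can be turned into a rough projection. The sketch below assumes growth compounds at a constant monthly rate; that model is an illustrative assumption, not one the agency or InfoPro is described as using, and the function names are hypothetical.

```python
import math


def monthly_growth_rate(start_tb: float, end_tb: float, months: int) -> float:
    """Compound monthly growth rate implied by two utilization readings."""
    return (end_tb / start_tb) ** (1.0 / months) - 1.0


def project(current_tb: float, rate: float, months: int) -> float:
    """Project utilization forward, assuming the same compound rate holds."""
    return current_tb * (1.0 + rate) ** months


def months_until(current_tb: float, capacity_tb: float, rate: float) -> float:
    """Months until utilization reaches capacity at a constant compound rate."""
    if current_tb >= capacity_tb:
        return 0.0
    return math.log(capacity_tb / current_tb) / math.log(1.0 + rate)


# Readings from the article: 3.9 TB in January 2006, 12 TB in May 2007 (16 months apart).
rate = monthly_growth_rate(3.9, 12.0, 16)
print(f"implied growth: {rate:.1%} per month")                          # about 7.3%
print(f"projected by December 2007: {project(12.0, rate, 7):.1f} TB")   # about 19.6 TB
# 14 TB is the top of the 8 TB to 14 TB range the existing arrays provided.
print(f"months until 14 TB: {months_until(12.0, 14.0, rate):.1f}")     # about 2.2
```

Under that assumption, even the upper end of the existing capacity would be used up within a few months of May 2007, which is consistent with the urgency the agency felt.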
There was also considerable pressure to select the right system for the agency's particular needs. Choosing the wrong one can make storage management a living hell and lead to project failures. To make the right choice, an in-depth investigation of the agency's requirements was needed, along with an analysis of available products, features and costs.
In setting the requirements of the new storage system, InfoPro determined that the agency would need at least 110 TB initially, as well as the ability to expand beyond 1 PB to handle future needs. The 110 TB would consist of 10 TB of high-speed storage primarily for the database and 100 TB of medium-speed storage for other functions. In addition to the basic sizing requirements, there was a long list of other "wants":
Then it came time to identify storage tiers, requirements and features.
InfoPro determined that four tiers were needed (see "Storage tier configurations," below). The first tier would include 10,000 rpm and 15,000 rpm FC drives, with support for virtualization of other storage tiers, image copies, remote replication and LUN/volume management. The second tier would include medium-speed SAS or SATA II drives and storage, with support for various RAID levels and LUN/volume management. The third tier would include slow-speed, IP-based NAS with support for a failover NAS cluster, as well as NFS, CIFS and iSCSI protocols. The fourth tier would include disk- and tape-based backup with support for disaster recovery and vaulting. All of the tiers would need to be accessible to all of the environment's servers and applications and have built-in redundancy.
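One way to make the tier definitions concrete is to encode them as data and ask which tier can host a given workload. In this sketch the capability labels, the `Tier` and `cheapest_tier` names, and the rule "pick the highest-numbered tier that offers everything required" (treating higher tier numbers as cheaper) are all hypothetical; only the media types, services and the 10 TB/100 TB initial sizes come from the requirements described here.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    number: int                 # 1 = fastest and most expensive
    media: str
    services: frozenset[str]    # hypothetical capability labels
    initial_tb: float = 0.0     # 0.0 = no initial size given in the requirements


TIERS = [
    Tier(1, "10,000/15,000 rpm FC",
         frozenset({"virtualization", "image copy", "remote replication", "lun management"}), 10.0),
    Tier(2, "SAS or SATA II",
         frozenset({"raid levels", "lun management"}), 100.0),
    Tier(3, "IP-based NAS",
         frozenset({"nas failover", "nfs", "cifs", "iscsi"})),
    Tier(4, "disk and tape backup",
         frozenset({"disaster recovery", "vaulting"})),
]


def cheapest_tier(required: set[str], tiers: list[Tier] = TIERS) -> Tier | None:
    """Return the highest-numbered (assumed cheapest) tier offering every required service."""
    candidates = [t for t in tiers if required <= t.services]
    return max(candidates, key=lambda t: t.number, default=None)


print(sum(t.initial_tb for t in TIERS))               # 110.0
print(cheapest_tier({"lun management"}).number)       # 2
print(cheapest_tier({"remote replication"}).number)   # 1
print(cheapest_tier({"nfs", "iscsi"}).number)         # 3
```

Returning `None` when no single tier covers a request (for example, NFS plus remote replication) flags workloads that would need to span tiers or be redesigned.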
At that point it was decided that a loosely coupled approach made more sense than buying an integrated system from a single vendor. With a loosely coupled system, the tiers can be upgraded individually; it's also cheaper, despite opinion to the contrary, and eliminates vendor lock-in. Taking this approach allowed InfoPro to concentrate on the more critical top two tiers, leaving the NAS and backup/restore tiers for a later step in the selection process.
The next step was to survey the range of available midrange/enterprise-class storage products on the market. That survey produced a 30-page document detailing the options. But rather than spending a lot of energy on the wide range of products, InfoPro decided to stay with the storage marketplace's top four at the time: EMC Corp., Hewlett-Packard (HP) Co., IBM Corp. and Sun. Of those four, however, only storage hardware from EMC and, of course, Sun was certified by Sun as compatible with the agency's large installed base of Sun server hardware. By choosing a non-Sun-certified storage system, the agency could end up with voided warranties and be caught in the middle of vendor disputes. That certification concern became key as the project progressed.
Tier 2 products were considered before tier 1. Not all of the vendors' midrange storage products fit the bill for the project's tier 2 requirements, but EMC's Symmetrix DMX series products, HP's StorageWorks Enterprise Virtual Arrays (EVAs) and Sun's StorageTek 6540 arrays had excellent support for them.
The enterprise-level products InfoPro looked at were "best of breed" in March 2007. Three products -- the EMC DMX-3, HP StorageWorks XP12000 disk array and Sun StorageTek 9990V System -- could support a 1 PB-plus storage subsystem. (The HP and Sun systems are rebranded Hitachi Data Systems Universal Storage Platform [USP] 1100 units.) The IBM DS series product didn't make the grade as it couldn't meet the 1 PB-plus native requirement, and the EMC DMX-3 supported expansion beyond 1 PB only with its own disk array units. Since the HP and Sun systems were essentially the same product (from Hitachi), it didn't make sense to buy HP and risk installing a non-Sun-certified system. But throwing HP out of the running didn't mean that a decision had been made. Because Sun certifies EMC equipment, InfoPro decided to lower the 1 PB requirement and compare the Sun system with the EMC system.
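The elimination logic above can be expressed as a simple filter. The attributes below only restate what the evaluation found (Sun certification, 1 PB-plus scaling, and whether growth past 1 PB works with other vendors' disk); treating "lowering the 1 PB requirement" as dropping the any-vendor-disk condition is one reading of the text, and the `Candidate` and `shortlist` names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    vendor: str
    product: str
    sun_certified: bool         # certified by Sun for use with Sun servers
    scales_past_1pb: bool       # can the subsystem grow beyond 1 PB at all?
    past_1pb_any_vendor: bool   # ...including with other vendors' disk behind it?


# Attributes as summarized in the evaluation described in the article (March 2007).
CANDIDATES = [
    Candidate("EMC", "DMX-3", True, True, False),
    Candidate("HP", "StorageWorks XP12000", False, True, True),
    Candidate("IBM", "DS series", False, False, False),
    Candidate("Sun", "StorageTek 9990V", True, True, True),
]


def shortlist(candidates: list[Candidate], strict: bool = True) -> list[str]:
    """Keep Sun-certified products that scale past 1 PB.

    strict=True also requires reaching 1 PB with any vendor's disk; strict=False
    relaxes that condition, roughly mirroring the decision to lower the requirement.
    """
    return [
        c.product
        for c in candidates
        if c.sun_certified
        and c.scales_past_1pb
        and (c.past_1pb_any_vendor or not strict)
    ]


print(shortlist(CANDIDATES))                # ['StorageTek 9990V']
print(shortlist(CANDIDATES, strict=False))  # ['DMX-3', 'StorageTek 9990V']
```

The strict filter leaves a single product, which is why relaxing one condition was needed to get a real head-to-head comparison.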
EMC's DMX-3 vs. Sun's StorageTek 9990V
EMC and Sun take different approaches to enterprise storage implementation. EMC's DMX-3 is more network-centric, while Sun's 9990V has a built-in controller and firmware. They also differ in I/O subsystem approach: The DMX has a point-to-point bus with a direct matrix architecture, while the 9990V has a crossbar switch with a built-in controller. (See "Comparison of Sun 9990V and EMC DMX-3," below, for more differences.)
The 9990V earned points for its flexibility and manageability. Because it can take lower-cost tier 2 storage and manage it as tier 1 storage via a single Web interface, it offered the agency greater flexibility in purchasing and configuring disk. The system would also allow creation of internal and external (tier 1 and tier 2) RAID groups and LUNs of varying sizes and RAID levels without configuration BIN files. Finally, because the agency's server hardware was already primarily Sun equipment, using Sun storage hardware as well offered a big advantage: there would be no compatibility disputes or certification issues to worry about.
Given these factors, Sun won the agency's business, not only for its tier 1 and tier 2 needs, but for tier 3 and tier 4 as well. Here's what the agency decided on (see "Final tiered storage system design," below):
The central component of the storage subsystem is the Sun 9990V, which performs all tier 1 functions via the Storage Navigator Web interface; virtualizes tier 2 storage for servers; and handles the NAS cluster, tier 4 backup storage and LUN management for all storage tiers. The system, which currently has 48 FC drives (300 GB and 10,000 rpm each) amounting to about 14 TB of tier 1 storage, has performed exceptionally well out of the box, with no tuning. The only anomaly occurred during a firmware upgrade to the 9990V, which pointed to a configuration problem with the 6540; that problem was resolved without system interruption.
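The "14 TB" figure is the raw arithmetic of 48 drives at 300 GB each. The short sketch below (the `raw_capacity` helper is hypothetical) also shows why operating-system tools, which often report binary units, display a smaller number for the same disks. It deliberately ignores RAID overhead, since the article doesn't state the RAID layout; usable capacity will be lower than either figure.

```python
from __future__ import annotations


def raw_capacity(drives: int, drive_gb: int) -> tuple[float, float]:
    """Raw capacity in decimal TB and binary TiB; drive makers quote decimal GB (10**9 bytes)."""
    total_bytes = drives * drive_gb * 10**9
    return total_bytes / 10**12, total_bytes / 2**40


tb, tib = raw_capacity(48, 300)
print(f"{tb:.1f} TB raw ({tib:.1f} TiB)")  # 14.4 TB raw (13.1 TiB)
```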
In its role as the tier 2 system, the Sun StorageTek 6540 provides more than 100 TB of capacity on 3 Gbps SATA II drives, which the 9990V virtualizes as external storage. Eight 4 Gbps FC connections link the 6540 to the 9990V, with the I/O load balanced between the two 6540 controllers. Arrays and LUNs can be sliced and reconfigured interactively without hindering performance on the other tiers and servers. The system has performed admirably since it was installed.
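Balancing I/O between the two 6540 controllers amounts to deciding which controller owns each LUN. The article doesn't say how the agency made that assignment; the sketch below uses a common greedy heuristic (largest LUN first, always to the currently least-loaded controller) with hypothetical LUN names and I/O rates.

```python
from __future__ import annotations

import heapq


def balance(lun_iops: dict[str, int], controllers: tuple[str, ...] = ("A", "B")) -> dict[str, list[str]]:
    """Greedy largest-first assignment: each LUN goes to the currently least-loaded controller."""
    heap = [(0, name) for name in controllers]
    heapq.heapify(heap)
    owners: dict[str, list[str]] = {name: [] for name in controllers}
    for lun, iops in sorted(lun_iops.items(), key=lambda kv: kv[1], reverse=True):
        load, ctrl = heapq.heappop(heap)          # least-loaded controller so far
        owners[ctrl].append(lun)
        heapq.heappush(heap, (load + iops, ctrl))
    return owners


# Hypothetical per-LUN I/O rates (IOPS); real figures would come from array monitoring.
observed = {"db_logs": 900, "db_data": 700, "docs": 400, "plm": 300, "scratch": 200}
plan = balance(observed)
for ctrl, luns in plan.items():
    print(ctrl, luns, sum(observed[lun] for lun in luns))
# A ['db_logs', 'plm'] 1200
# B ['db_data', 'docs', 'scratch'] 1300
```

A greedy split like this is not guaranteed to be optimal, but with two controllers it keeps the imbalance no larger than the busiest single LUN, and it is easy to rerun as load patterns change.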
At tier 3, two Sun StorageTek 5320 NAS Gateway units are linked to the 9990V through eight 2 Gbps FC connections and to project servers via multiple 1 Gbps network connections managed by Ethernet switches. The system supports all three protocols the agency uses: NFS, iSCSI and CIFS. LUNs presented by the 9990V are sliced into local volumes that reside on a proprietary Sun StorageTek file system. The system has passed the tests InfoPro has put it through: failover testing and interactive expansion of the NAS cluster volumes shared via NFS to networked servers.
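The two behaviors InfoPro tested, failover and online volume expansion, can be modeled in a few lines. This is a toy model, not the 5320's actual failover mechanism: the head names, the one-owner-per-volume rule and the least-loaded takeover policy are assumptions made for illustration.

```python
from __future__ import annotations


class NasCluster:
    """Toy model of a NAS head pair: each volume is served by one head,
    and surviving heads adopt a failed head's volumes."""

    def __init__(self, heads: list[str]) -> None:
        self.alive = set(heads)
        self.owner: dict[str, str] = {}     # volume -> head serving it
        self.size_gb: dict[str, int] = {}   # volume -> current size

    def _load(self, head: str) -> int:
        return sum(1 for h in self.owner.values() if h == head)

    def add_volume(self, vol: str, head: str, size_gb: int) -> None:
        if head not in self.alive:
            raise ValueError(f"{head} is not online")
        self.owner[vol] = head
        self.size_gb[vol] = size_gb

    def fail(self, head: str) -> None:
        self.alive.discard(head)
        if not self.alive:
            raise RuntimeError("no surviving NAS head")
        for vol, h in list(self.owner.items()):
            if h == head:
                # Hand each orphaned volume to the least-loaded survivor.
                self.owner[vol] = min(sorted(self.alive), key=self._load)

    def expand(self, vol: str, extra_gb: int) -> None:
        # Online growth: the serving head and client mounts are unchanged.
        self.size_gb[vol] += extra_gb


cluster = NasCluster(["nas1", "nas2"])
cluster.add_volume("projects", "nas1", 2000)
cluster.add_volume("home", "nas2", 1000)
cluster.fail("nas1")
print(cluster.owner)                # {'projects': 'nas2', 'home': 'nas2'}
cluster.expand("projects", 500)
print(cluster.size_gb["projects"])  # 2500
```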
Finally, tier 4 consists of the Sun StorageTek SL500 tape library, with four LTO-3 tape drives and 150 tapes, plus a T2000 backup server and Symantec Veritas software. This tier handles disk and tape backup, vaulting, and restoration of application and database data. Each LTO-3 drive can read and write at a sustained 80 MBps, for an aggregate throughput of 320 MBps. The system is modular and can be expanded in increments of 150 tapes and four additional drives. It currently backs up 80 TB of data per month.
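The tier 4 figures can be sanity-checked with simple arithmetic. The sketch assumes all 80 TB per month goes to tape at full streaming speed with no compression, which real backup jobs rarely achieve; the 400 GB native cartridge capacity is the published LTO-3 specification, not a figure from the article.

```python
DRIVES = 4
DRIVE_MBPS = 80         # sustained per-drive rate from the article (decimal MB/s)
MONTHLY_TB = 80         # data backed up per month, from the article
TAPES = 150
LTO3_NATIVE_GB = 400    # LTO-3 native (uncompressed) cartridge capacity

aggregate_mbps = DRIVES * DRIVE_MBPS                                  # 320 MB/s
stream_hours = MONTHLY_TB * 10**12 / (aggregate_mbps * 10**6) / 3600  # drive time per month
library_native_tb = TAPES * LTO3_NATIVE_GB / 1000

print(f"aggregate: {aggregate_mbps} MB/s")                             # aggregate: 320 MB/s
print(f"hours of full-speed streaming per month: {stream_hours:.1f}")  # 69.4
print(f"library native capacity: {library_native_tb:.0f} TB")          # 60 TB
```

At these rates a month's backups need roughly 70 hours of drive time, well within the month. A month's volume also exceeds what 150 cartridges hold uncompressed at one time, so compression, tape reuse or vaulting presumably accounts for the difference.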
Putting the pieces together
Once the components were chosen, networking details needed to be worked out, and the guiding principle was to provide connections in the right places so that client communication wouldn't be impacted by storage system operations. In the current configuration, the storage subsystems' four tiers are connected directly to one another, separate from the SAN and client FC connectivity. The tier 4 T2000 backup server has direct network connections to the tier 3 NAS Gateway cluster servers to enable backups to be done without slowing down the rest of the network. And storage, application and database expansion can be done without affecting client and storage subsystems.
The separate storage and client networks have been implemented with low-cost switch and virtual local-area network (VLAN) technologies, eliminating network contention, isolating traffic and increasing security. The backup server has its own direct FC and Ethernet connectivity, so backups and restores can run around the clock. All network devices can be administered either via the backup server or remotely from the administrator's desktop. And all FC connections are auto-sensed to 1 Gbps, 2 Gbps or 4 Gbps, except for the LTO-3 tape drives, which are set at 2 Gbps.
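The isolation principle can be checked mechanically by modeling links as tagged with the network they belong to and asking whether two components are reachable using only certain networks. The topology below is a simplified, hypothetical rendering of the components named in this article, not the agency's actual cabling plan.

```python
from __future__ import annotations

from collections import deque

# Hypothetical link list: (endpoint, endpoint, network). Names are illustrative labels.
LINKS = [
    ("t2000", "nas5320", "storage-eth"),   # direct backup path to the NAS cluster
    ("t2000", "sl500", "storage-fc"),      # backup server to tape library
    ("nas5320", "9990v", "storage-fc"),
    ("9990v", "6540", "storage-fc"),
    ("app-server", "9990v", "san-fc"),
    ("app-server", "client-switch", "client-vlan"),
    ("nas5320", "client-switch", "client-vlan"),
    ("desktop", "client-switch", "client-vlan"),
]


def reachable(src: str, dst: str, allowed: set[str], links=LINKS) -> bool:
    """Breadth-first search that only traverses links on the allowed networks."""
    graph: dict[str, list[str]] = {}
    for a, b, net in links:
        if net in allowed:
            graph.setdefault(a, []).append(b)
            graph.setdefault(b, []).append(a)
    seen, queue = {src}, deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return True
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


STORAGE_ONLY = {"storage-eth", "storage-fc"}
print(reachable("t2000", "nas5320", STORAGE_ONLY))    # True: backups need not touch the client VLAN
print(reachable("desktop", "6540", {"client-vlan"}))  # False: client VLAN alone cannot reach tier 2
```

Running checks like these whenever links are added is one way to catch a change that would quietly route backup traffic across the client network.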
It's worth noting that software and firmware upgrades can be performed with little to no impact on other components. Firmware upgrades to the 9990V, for example, are done interactively by switching I/O and/or LUNs and connections from front to back; upgrades to the 5320 NAS Gateway are handled in a similar way. Firmware upgrades to the 6540 are also done in place, though I/O performance degrades during the upgrade. Upgrades to the T2000 backup server are done between backup cycles to avoid disrupting the backup schedule.
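The general pattern behind a non-disruptive upgrade on a dual-controller device is a rolling one: move I/O off one controller, upgrade it, bring it back, then repeat for its partner. The sketch below models that pattern only; it is not the 9990V's or 5320's actual procedure, and the class names and the "fail back to original owners" step are assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Controller:
    name: str
    firmware: str
    online: bool = True


@dataclass
class DualControllerArray:
    controllers: tuple[Controller, Controller]
    owner: dict[str, str] = field(default_factory=dict)  # LUN -> controller name

    def _check_served(self) -> None:
        online = {c.name for c in self.controllers if c.online}
        stranded = [lun for lun, c in self.owner.items() if c not in online]
        if stranded:
            raise RuntimeError(f"LUNs without an online path: {stranded}")

    def _move_luns(self, src: str, dst: str) -> None:
        for lun, ctrl in list(self.owner.items()):
            if ctrl == src:
                self.owner[lun] = dst

    def rolling_upgrade(self, new_fw: str) -> None:
        home = dict(self.owner)                 # preferred owners, restored at the end
        first, second = self.controllers
        for target, partner in ((first, second), (second, first)):
            self._move_luns(target.name, partner.name)  # shift I/O to the partner first
            target.online = False
            self._check_served()                         # every LUN must still be reachable
            target.firmware = new_fw                     # stand-in for the real flash/reboot step
            target.online = True
        self.owner = home                                # fail back to the original layout


array = DualControllerArray((Controller("A", "v1"), Controller("B", "v1")),
                            {"lun0": "A", "lun1": "B"})
array.rolling_upgrade("v2")
print([c.firmware for c in array.controllers])  # ['v2', 'v2']
print(array.owner)                              # {'lun0': 'A', 'lun1': 'B'}
```

The `_check_served` guard encodes the key invariant: the upgrade aborts rather than take the last path to any LUN offline.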
With the completed storage system upgrade in production for almost two years, the agency is able to assess how closely the initial plans mapped to actual use. There are now approximately 15,000 user accounts in the system, with daily user access peaking at around 10,000. Storage utilization is at approximately 40 TB. The agency is considering implementing thin provisioning techniques to cut down on future storage needs; even so, it's expected that the system will use about 500 TB in the next two to three years and more than 1 PB in five to seven years.
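Thin provisioning, which the agency is considering, works by letting volumes advertise more capacity than is physically reserved and allocating real disk only when data is first written. The toy pool below illustrates that general technique; it is not any vendor's implementation, and the 1 GB extent size, volume names and sizes are hypothetical.

```python
from __future__ import annotations


class ThinPool:
    """Toy thin-provisioned pool: volumes advertise a size, but physical extents
    are allocated only when a region is first written."""

    def __init__(self, physical_gb: int, extent_gb: int = 1) -> None:
        self.physical_gb = physical_gb
        self.extent_gb = extent_gb
        self.size_gb: dict[str, int] = {}         # volume -> advertised size
        self.extents: dict[str, set[int]] = {}    # volume -> allocated extent indexes

    def create(self, name: str, size_gb: int) -> None:
        self.size_gb[name] = size_gb              # nothing reserved up front
        self.extents[name] = set()

    def used_gb(self) -> int:
        return sum(len(e) for e in self.extents.values()) * self.extent_gb

    def subscribed_gb(self) -> int:
        return sum(self.size_gb.values())

    def write(self, name: str, offset_gb: int) -> None:
        if not 0 <= offset_gb < self.size_gb[name]:
            raise ValueError("write beyond end of volume")
        extent = offset_gb // self.extent_gb
        if extent in self.extents[name]:
            return                                 # already backed by real disk
        if self.used_gb() + self.extent_gb > self.physical_gb:
            raise RuntimeError("pool exhausted: add physical capacity")
        self.extents[name].add(extent)


pool = ThinPool(physical_gb=100)
pool.create("docs", 500)
pool.create("plm", 500)
for off in (0, 1, 2):
    pool.write("docs", off)
pool.write("docs", 1)                          # rewrite: no new allocation
pool.write("plm", 0)
print(pool.subscribed_gb(), pool.used_gb())    # 1000 4
```

The tradeoff is visible in the exhaustion check: oversubscribing saves disk only as long as someone monitors actual use and adds physical capacity before the pool fills.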
A loosely coupled system like the one the agency implemented brings with it a lot of flexibility and room for growth. But there are tradeoffs: A loosely coupled, multitiered system is inherently more complex than an integrated one from a single vendor, uses multiple management interfaces, and carries compatibility and certification path concerns. In the case of the agency that InfoPro worked with, the upfront research and engineering work made it clear that a loosely coupled system was the right choice. But each IT environment is different; a proper decision process should include not only a comparison of the available systems but discussions with vendor sales and technical reps to make sure that you fully understand their technology offerings. And it's important to dig beyond the sales pitch. Do your own research and, when needed, pull other trusted technical pros into the discussion. Taking the investigation and planning steps of the project very seriously can mean the difference between a system that's universally applauded and meets expectations -- as was the case for the multitier system detailed here -- and one that's quickly outdated or inappropriate for the project it was bought for.
BIO: Herb Ferguson is a senior systems scientist at InfoPro Corp. in Huntsville, Ala., with 25 years of IT architect, engineering, R&D, programming and network experience.