Published: 14 Jul 2003
Virtual storage area networks (VSANs) are the latest tool available to storage administrators who need better ways of managing the equipment attached to increasingly massive storage arrays. At Australia's Deakin University, VSAN technology has become a major component of its ongoing effort to retire direct-attached storage (DAS) and centralize the school's immense data requirements onto a single SAN.
Deakin's largest campus is located at Burwood, in the eastern suburbs of Melbourne. Melbourne is a city of 3.5 million on Australia's southeast coast; it sits on the north shore of 25-mile-wide Port Phillip Bay. Approximately 11,000 students attend the Burwood campus and a smaller site in nearby Toorak. Other students attend smaller satellite campuses that are separated from the main campus by up to 200 miles; more than 12,500 other students are pursuing Deakin degrees via the Web. All told, Deakin served more than 30,000 students last year.
Replicating over water
Supporting such a large student body is no small task, particularly when those students are distributed over many campuses hundreds of miles apart. Historically, meeting the needs of those students has required a significant investment in DAS: Some 35TB of data was spread across 53 Sun Microsystems Sun Fire 12K, E6500, E4000 and E3500 servers distributed across the university's campuses.
For three years, the primary file server had been an eight-CPU Sun E3500 server, with 8GB of RAM and 10TB of attached storage in 20 Sun A5xxx series disk arrays. This server typically supported anywhere from 1,000 to 1,200 concurrent administrative, staff and student users. The E3500 was located in the large data center at the Waterfront campus, while 12 more Sun A5xxx arrays provided an additional 6.5TB of disk space at the four other campuses.
Deakin's IT team faced increasing pressure to accommodate rapid growth in demand for storage space, particularly since a recent desktop standardization project put standard operating environments on more than 4,000 Deakin desktops. Although users can store data on their local hard drives, Deakin uses Novadigm's Radia desktop management tool to ensure files are also stored on the centralized storage. With an average of more than 100MB of data per networked PC, that project alone increased storage requirements by 4TB.
Data storage needs have also skyrocketed in recent years with the rapid uptake of online learning--which requires storage of multimedia course materials for students in more than 4,500 courses. To store these materials, Deakin relies on a 1.5TB Oracle database supporting Callista, a Deakin-developed student management system recently purchased by Oracle for worldwide distribution. Callista runs in a logically partitioned domain on Deakin's Sun Fire 12K.
In the past, Deakin has delivered these applications to remote campuses from clusters of servers located in its Waterfront and Burwood data centers. Servers at those locations were backed up to tape, then the backed-up data was replicated to DAS at the university's secondary data center in Burwood and vice versa.
Because WAN bandwidth is expensive outside of Australia's largest cities, Deakin links its campuses using a private microwave network that pushes 300 Mb/s of bandwidth from the Geelong sites to Burwood, spanning the waters of Port Phillip Bay. Approximately 5,000 students are spread across campuses at Waurn Ponds and Waterfront, suburbs of the satellite city of Geelong.
That network is the lifeblood of the university, shuttling mountains of application data, voice over IP (VoIP) and backup traffic between sites. With around 50Mb/s of the microwave bandwidth typically left for backup after other applications have taken their share, data was being replicated between Waterfront and Burwood for up to 23 hours a day.
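To put those figures in perspective, the following minimal sketch estimates how much data such a backup window could move per day. The bandwidth and duration come from the figures above; ignoring protocol overhead and assuming the link is fully used are simplifications, and the function name is purely illustrative.

```python
# Back-of-the-envelope estimate of daily replication capacity.
# Inputs are the article's figures; protocol overhead is ignored (assumption).

BACKUP_BANDWIDTH_MBPS = 50   # megabits per second typically left for backup
WINDOW_HOURS = 23            # hours per day spent replicating


def daily_backup_capacity_gb(mbps, hours):
    """Return decimal gigabytes moved when sending `mbps` for `hours`."""
    bits = mbps * 1_000_000 * hours * 3600
    return bits / 8 / 1_000_000_000


if __name__ == "__main__":
    gb = daily_backup_capacity_gb(BACKUP_BANDWIDTH_MBPS, WINDOW_HOURS)
    print(f"{gb:.1f} GB per day")
```

Under these assumptions the link moves roughly half a terabyte a day, a small amount next to the 35TB spread across Deakin's servers.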
Although this approach worked, it imposed some major inconveniences on Deakin's community. For example, data pertaining to various applications became unavailable when host servers were brought down during regularly scheduled maintenance outages. It also perpetuated the management and consistency problems intrinsic to DAS environments.
"Our strategy up until this point has been DAS storage," says Craig Warren, Deakin's desktop and services manager. "We had all the standard problems associated with DAS: Storage was managed in silos, it wasn't easy to provide short-term storage needs, and even politics was an issue. We were managing it as islands, so we'd clean up one file server and have to move on to the next one, and the next."
As the volume of data the school was managing grew, Deakin investigated hierarchical storage management (HSM) solutions that it thought could help move old and less-used data onto tape. But the solutions were "a bit wanting," says Warren, because they weren't particularly efficient at delivering small files to users quickly. Another way of getting data off the system--in which students choose the files they want through a Web page and those files are burned and delivered via CD before being deleted from the server--was expected to be a moderate success, but it was nowhere near successful enough to counter the growth in demand for storage.
It soon became clear that the best way for the university to expand its storage strategy was to consolidate its data from its distributed servers onto a single, scalable storage area network (SAN).
Australia's biggest SAN
The impetus behind the SAN migration was to provide a centralized, consistent repository for all kinds of administrative, research and everyday data. Backup would be easier because Deakin's terabytes of data would be backed up on the SAN, rather than being constantly distributed around the university network. And thanks to logical partitioning of the SAN, it would be possible to accommodate the widely varying requirements of the university's many communities of interest.
Last November, Warren's department began evaluating the technology that would make up its SAN. It quickly settled on an IBM TotalStorage Enterprise Storage Server (Shark) 800 Turbo, which offers a capacity of 55.9TB, but was installed with just 30TB to start. Also to be connected were two SuperDLT-based Quantum tape libraries--a 500-cartridge P7000 at the Waterfront facility, and a 250-cartridge P3000 unit in Burwood.
Such a dramatic increase in storage capacity doesn't come cheap, but Deakin's IT team was able to justify the expenditure to senior managers by pointing out a simple fact: At the rate its storage demand was growing before the SAN upgrade, Deakin's DAS costs would have exceeded the price of a completely new SAN within two years, and the SAN would provide far more storage.
It was also clear that the environment would benefit from having many of Deakin's servers directly attached to the SAN. That meant linking up the Sun systems--as well as approximately 50 Red Hat Linux servers used for load balancing and other sundry infrastructure tasks--to the storage network. That was an expensive proposition using conventional Fibre Channel (FC) connectivity, but that problem was solved when Deakin began investigating options involving IP-based iSCSI.
Because it already had a long-standing relationship with Cisco Systems for its other networking equipment, Deakin preferred to source appropriate iSCSI technology from Cisco, rather than having to establish a new relationship with an FC specialist such as McData or Brocade. Working with Cisco, the Deakin team explained its need for direct server-to-SAN connectivity and had nearly settled on the Cisco SN5428 before it learned of the Cisco MDS 9000, a new family of multilayer directors and fabric switches that combine iSCSI with support for Fibre Channel over IP (FCIP).
"We've been using low-cost Intel-based Linux servers for load balancing," says Warren. "iSCSI was quite attractive to us because we wanted large amounts of storage on these boxes, but we didn't really want to be paying the price of Fibre Channel HBAs [host bus adapters]. So we were looking at a SAN front-ended with some storage network routers. Since we were looking for both an FC switch and [iSCSI] storage network router, we got very interested in the MDS line."
By January, Deakin was putting its new SAN equipment through its paces. Within five weeks, the basic SAN was up and running, using a pair of Cisco MDS 9509 multilayer directors to mediate between its large server environment, IP WAN and FC-attached IBM Shark storage and Quantum tape silos.
Deakin's experiments with iSCSI confirmed that the protocol will play an important role in reducing the cost of server-to-SAN connectivity. Deakin used several Linux servers to run VMware, which manages virtual Windows 2000 and XP sessions that host services such as Microsoft Active Directory. This approach makes it easy to back up the Windows 2000 virtual machine, because doing so only requires copying the relevant VMware .dsk file.
By using iSCSI to attach those Linux servers directly to one or more LUNs on the SAN, Deakin can retain connectivity from its dozens of Linux servers without having to purchase costly HBAs for each machine, as it has done for its high-end Sun servers.
Ultimately, extension of SAN-based services via iSCSI will enable capabilities such as remote booting, flash-copy backups, and continual mirroring of terabytes of data. Diskless machines will run completely over iSCSI, as will file, Web, database, DNS, DHCP and many other types of servers.
"The reason we're using it is cost," says Andrew White, system programmer at Deakin. "It will save us thousands and thousands of dollars [and it works even though] we are a very distributed university. It's amazing how little throughput iSCSI really needs to work."
Better still, this flexibility doesn't impose any performance hits: In tests, iSCSI--running over a Gigabit Ethernet connection--actually performed slightly faster than a direct FC connection (via a QLogic 2200F HBA directly into the SAN). Deakin is also exploring use of FCIP to seamlessly interlink its FC SAN fabrics between campuses over its IP network.
The SAN began serving files in early February, and throughout that month, Deakin's IT staff began migrating data from the individual DAS drives onto the Shark array.
Because most of Deakin's Sun servers already had dual HBAs installed, migration of the data from the DAS to the SAN environment has been simple. Deakin uses the secondary HBA to mirror data from the DAS to the SAN, and then breaks that mirror by disconnecting the HBA from the DAS.
In this way, the team has already migrated more than three-quarters of the data into the new environment, where it's being partitioned using the VSANs. "About 90% of our migration has been done online with hardly anyone noticing," says Warren.
During the data migration, backups are still being run over the network in the conventional way. But once all the DAS data has been migrated, the team will turn its attention to implementing SAN-based backup.
Virtual SAN reality
As well as fulfilling Deakin's need for iSCSI and FC connectivity, the MDS 9509 switches provided what was to become another important capability: support for Cisco's VSAN technology, which has been modeled after the IEEE 802.1Q-based virtual LANs (VLANs) that were long ago standardized across conventional IP networks.
VLANs work by tagging each Ethernet frame with the identifier of the VLAN it belongs to. That way, traffic from many VLANs can share the same physical network, but a frame is delivered only to devices that are members of its VLAN.
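As a rough illustration of that tagging, the sketch below packs and parses the 4-byte 802.1Q tag: a 16-bit tag protocol identifier (0x8100) followed by 16 bits holding a 3-bit priority, a 1-bit drop-eligible flag and a 12-bit VLAN ID. The function names are made up for this example, and it handles only the tag itself, not a full Ethernet frame.

```python
import struct

TPID_8021Q = 0x8100  # EtherType value that marks an 802.1Q-tagged frame


def build_vlan_tag(vlan_id, priority=0, dei=0):
    """Pack the 4-byte 802.1Q tag: 16-bit TPID followed by 16-bit TCI."""
    if not 0 <= vlan_id <= 0xFFF:
        raise ValueError("VLAN ID must fit in 12 bits")
    if not 0 <= priority <= 7:
        raise ValueError("priority must fit in 3 bits")
    if dei not in (0, 1):
        raise ValueError("DEI must be 0 or 1")
    tci = (priority << 13) | (dei << 12) | vlan_id
    return struct.pack("!HH", TPID_8021Q, tci)  # network byte order


def parse_vlan_id(tag):
    """Return the 12-bit VLAN ID carried in a 4-byte 802.1Q tag."""
    tpid, tci = struct.unpack("!HH", tag)
    if tpid != TPID_8021Q:
        raise ValueError("not an 802.1Q tag")
    return tci & 0x0FFF


if __name__ == "__main__":
    tag = build_vlan_tag(100, priority=5)
    print(tag.hex(), parse_vlan_id(tag))
```

A switch that reads the VLAN ID this way can forward the frame only to ports in the matching VLAN, which is the behavior VSANs borrow for FC fabrics.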
VSANs take a similar approach with SAN equipment, adding a higher level of granularity to FC port switching so that FC fabrics are no longer an all-or-nothing proposition determined by the physical boundaries of the switch. VSANs allow assignment of each individual FC port to one or more VSANs, creating a virtual FC fabric that's used to manage access to the data by various elements of the SAN. As well as partitioning the data, the VSAN also segregates the various FC services that make up the fabric.
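The port-level partitioning can be pictured with a small toy model. This is only a conceptual sketch, not Cisco's implementation or CLI: the `Fabric` class, the port names and the VSAN numbers 10 and 20 are all hypothetical, chosen to mirror a production/development split.

```python
from collections import defaultdict


class Fabric:
    """Toy model of one physical switch whose FC ports are split into VSANs."""

    def __init__(self):
        # port name -> set of VSAN IDs the port belongs to
        self._port_vsans = defaultdict(set)

    def assign(self, port, vsan_id):
        """Add a port to a VSAN."""
        self._port_vsans[port].add(vsan_id)

    def can_reach(self, src, dst):
        """Two ports can communicate only if they share at least one VSAN."""
        shared = self._port_vsans.get(src, set()) & self._port_vsans.get(dst, set())
        return bool(shared)

    def members(self, vsan_id):
        """List the ports that belong to the given VSAN, sorted by name."""
        return sorted(p for p, ids in self._port_vsans.items() if vsan_id in ids)


if __name__ == "__main__":
    PRODUCTION, DEVELOPMENT = 10, 20    # hypothetical VSAN IDs
    fabric = Fabric()
    fabric.assign("fc1/1", PRODUCTION)  # hypothetical production server port
    fabric.assign("fc1/2", PRODUCTION)  # hypothetical storage port
    fabric.assign("fc2/1", DEVELOPMENT) # hypothetical development server port
    print(fabric.can_reach("fc1/1", "fc1/2"))  # same VSAN
    print(fabric.can_reach("fc1/1", "fc2/1"))  # different VSANs
```

Although all three ports sit on the same physical switch in this model, the development port cannot see either production port, because reachability depends on VSAN membership rather than on where a cable is plugged in.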
For Deakin, VSAN technology offered significant benefits in that it would allow the IT department to provide greater control over access to the SAN for the university's various applications. For example, this meant using VSAN tagging to segregate its storage production network from its storage development network.
The ability to virtually separate these two networks resolved a major problem the university had previously been dealing with in staging changes to its environment. With so much data being managed, that data had to be copied into a development environment that would allow the testing and debugging of applications using real data.
In the case of Callista--which as the main repository for all student-related data is arguably the most important system running at Deakin--this meant copying a 1.5TB database to create a development copy that could be used without fear of corrupting the real database. Callista contains over 20 years' worth of past student data and, although Deakin is only legally obligated to keep student records for seven years, it has opted to store all student data permanently.
Using the VSAN, the need to regularly copy such a large volume of data has been eliminated. Deakin uses the IBM ESS server's FlashCopy snapshot capability to provide nearly instant snapshots of the data, which the VSAN then segregates from the university's live production data.
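One common way to make a snapshot nearly instant is copy-on-write: no data is copied when the snapshot is taken, and an original block is preserved only when it is about to be overwritten. The sketch below illustrates that generic idea; it is an assumption-laden toy with hypothetical class names, not a description of how IBM's FlashCopy is implemented.

```python
class Snapshot:
    """Point-in-time view of a Volume, backed by copy-on-write."""

    def __init__(self, source):
        self._source = source
        self._saved = {}  # block index -> contents at snapshot time

    def preserve(self, index, old_data):
        # Keep only the first saved copy: that is the snapshot-time content.
        self._saved.setdefault(index, old_data)

    def read(self, index):
        # Unchanged blocks are still read from the live volume.
        if index in self._saved:
            return self._saved[index]
        return self._source.blocks[index]


class Volume:
    """Toy block volume that supports copy-on-write snapshots."""

    def __init__(self, blocks):
        self.blocks = list(blocks)
        self._snapshots = []

    def snapshot(self):
        """Create a snapshot without copying any blocks."""
        snap = Snapshot(self)
        self._snapshots.append(snap)
        return snap

    def write(self, index, data):
        # Save the old contents for every snapshot before overwriting.
        for snap in self._snapshots:
            snap.preserve(index, self.blocks[index])
        self.blocks[index] = data


if __name__ == "__main__":
    live = Volume(["a", "b", "c"])
    snap = live.snapshot()
    live.write(1, "B")
    print(live.blocks[1], snap.read(1))
```

In this model, taking a snapshot costs the same whether the volume holds three blocks or 1.5TB, because the work of copying is deferred to later writes.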
"Because we have separate development and production VSANs, we can do crazy things in the development VSAN," says Warren. "We put in the IP services module and didn't have it have any effect on the production environment. We're stress testing applications, conducting user acceptance testing, and testing failover cluster configurations without interrupting the real environment. This sort of thing was particularly hard to do in the past, when we typically had large mirrors of the production environment."
The nine-slot 9509 chassis have been installed with three interface cards each, providing a total of 80 FC ports. A dozen ports connect to the IBM Shark, eight to the Sun Fire 12K and two to each of Deakin's other Sun servers. The Linux servers connect to the SAN using both FC and iSCSI. The control modules are configured redundantly, allowing upgrades and changes to each MDS without having to bring it down. Two lower-capacity MDS units will soon be installed at the Burwood data center to provide further redundancy and segregation of the data environment at that site.
Because it's such an early adopter of VSAN technology, Deakin has worked closely with IBM and Cisco to get the VSANs configured correctly. Yet with a bit of training, Warren says Deakin staff found the process was relatively straightforward, with problems limited to a few issues with the QLogic FC HBAs. Cisco debugging tools--including FC equivalents of Ping and Traceroute--provide diagnostic capabilities that helped trace the path of data across the VSANs and quickly resolve any discrepancies.
Although it's only running two VSANs now, Warren envisions further segregation could become valuable, as individual departments and other functional areas of the university begin to demand their own corners of the SAN. In the short term, however, a more immediate upgrade may be the introduction of a third VSAN that would separate regression testing from user acceptance testing.
The SAN may be the biggest architectural change to the university's storage strategy, but the addition of VSANs is proving to be important in helping Deakin get additional benefits from its SAN investment. When production and testing environments "were in the same fabric, whenever the fabric would reset, every machine would have to log in again," says White. "In a big environment with hundreds of switches, the VSAN will be an absolute lifesaver: We can drop and reboot the 9216 all day long without causing a hiccup on the production VSAN."
Given that the traffic segregation provided by the VSANs has now been proved to work, Warren anticipates SAN management will become easier. The university's growing data store--which he expects will surpass 60TB by next year--can now be managed in logical partitions determined by use, not just by which switch the devices happen to be plugged into.