Published: 15 Dec 2003
|Managing storage in SANs|
There's lots of talk about increasing storage utilization, but so far not many companies are doing a stellar job. The reason? Much of the focus on storage area networks (SANs) today revolves around the network management of SANs: LUN masking, zoning and volume discovery. Performing these tasks can take hours--if not days--to get everything right. Toss in a growing environment with different OS, clustered servers, interconnected switches and varying storage array vendors, and the possible networking permutations become quite complex. As a result, only lip service is paid to increasing storage utilization rates.
Some current solutions such as EMC Control Center, CreekPath Systems CreekPath Suite and Veritas SANPoint Control claim to manage storage, but in reality, they manage and mask the existing networking complexity instead of improving storage utilization rates. That's not to minimize what these tools do. They provide valuable SAN infrastructure networking services such as visualization, performance management, device management and some limited storage management capabilities. But they don't create a central, homogeneous pool of storage.
This may almost sound like blasphemy because using these tools in conjunction with SAN-attached storage arrays gives the illusion of a central pool of storage. Yet today's SAN implementations only break the physical relationship between the server and the storage array by placing a Fibre Channel (FC) switch between them; they do not break the more important logical relationship.
Fabric-based storage virtualization addresses the shortcoming that exists not in these tools, but in the current SAN design. Virtualization breaks the physical and logical associations between the server and the storage. By breaking this relationship in the network, it in essence creates what is known in the mainframe environment as a volume table of contents (VTOC).
The VTOC serves three important functions. First, it allows storage to be discovered and cataloged. Second, it can discover this storage with minimal zoning or LUN masking changes and without the requirement for every vendor's APIs or the latest, greatest SNIA standard, because all storage is mapped to and owned by the VTOC. Finally, it provides a central management interface from which storage may be provisioned to servers.
While the value of a fabric-based VTOC may be debated in small SANs where there's only one storage array or a small number of servers, the problems it solves and the value it offers to enterprises are indisputable. By solving the current networking problems, the VTOC not only creates a pool of storage from which all OS can obtain storage, but lays the foundation for concepts such as information life cycle management (ILM) to finally take hold and succeed in large open-systems storage environments.
A growing number of storage providers are jumping on the intelligent fabric bandwagon. To name a few: Brocade Communications Systems Inc. has placed routing and QoS functions in its switch's OS, while DataCore Software Corp., Ft. Lauderdale, FL, and FalconStor Software Inc., Melville, NY, forcefully argue that volume management is best placed in the network.
While placing functions such as routing, QoS, volume management and security into the fabric solves some current problems, it also introduces new ones. Current storage networking issues such as latency, complexity and management won't magically disappear with a more intelligent fabric, but will simply reappear in different forms.
There's little debate that the ability to manage storage in the fabric is generating increased interest. Volume management appears both as a feature integrated into the switch and as a function on a separate appliance. For file-level network-attached storage (NAS) environments, this function most often appears as a separate appliance such as Hitachi Data Systems/Network Appliance's NAS Enterprise Gateway that handles file-level traffic for file sharing between different OS. In block-level storage area network (SAN) environments, this utility appears as an appliance--such as DataCore's SANsymphony product--and as an optional component in Brocade, Cisco Systems Inc. or McData Corp. fabrics.
Much of the impetus behind moving volume management into the fabric in both SAN and NAS environments comes from the complexity of managing storage at either the host or storage array level. While this is less of an issue in environments with only a limited number of hosts (20 or fewer) or storage arrays (two or fewer), in shops with a large number of storage arrays (five or more), the complexity and difficulty of managing storage grow almost exponentially, not linearly. (See "Managing storage in SANs," on this page.)
Fabric-based volume management simplifies the picture. No longer does a storage administrator have to allocate a portion of the storage on each storage array to individual servers and leave the unallocated storage in unreportable storage pools. Instead, the storage admin maps all of the storage in each array to the fabric-based storage controller which can detect, manage and report on all of the free and allocated storage under its control.
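One rough way to see why a single controller simplifies the picture is to count the relationships an administrator has to track. The sketch below is only an illustration under a simplifying assumption that is not from any vendor: without a fabric controller, every host may hold an allocation on every array, while with one, each array and each host relates only to the controller. The function names are hypothetical.

```python
def direct_mappings(hosts: int, arrays: int) -> int:
    """Worst case without a controller: each host may hold storage on each array."""
    return hosts * arrays


def pooled_mappings(hosts: int, arrays: int) -> int:
    """With a fabric controller: each array and each host maps to the controller only."""
    return hosts + arrays


# Small shop (20 hosts, 2 arrays) versus a larger one (100 hosts, 5 arrays).
for hosts, arrays in [(20, 2), (100, 5)]:
    print(hosts, arrays, direct_mappings(hosts, arrays), pooled_mappings(hosts, arrays))
```

The count is a crude proxy for management effort, but it shows the gap widening as hosts and arrays are added.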
The storage controller now serves as a volume manager that creates volumes of different sizes on any storage array it controls and then presents them as LUNs to any server zoned to see it. It can enable advanced storage management features such as moving data between storage arrays from different vendors, creating point-in-time snapshots of data and presenting a report of the enterprise storage within a single management console. It also lays the foundation for a true open-systems information life cycle management (ILM) solution. It does this by creating a common layer that any open-systems platform can communicate with to automatically move data onto different types of storage devices, be they disk or tape.
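The controller's role as a volume manager can be sketched in a few lines. This is a toy model, not any vendor's implementation: the class names, the "place on the array with the most free space" rule and the capacities are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Array:
    """A back-end array whose whole capacity is mapped to the controller."""
    name: str
    capacity_gb: int
    allocated_gb: int = 0

    @property
    def free_gb(self) -> int:
        return self.capacity_gb - self.allocated_gb


@dataclass
class FabricController:
    """Hypothetical fabric-based controller that owns every mapped array."""
    arrays: dict = field(default_factory=dict)
    luns: list = field(default_factory=list)

    def map_array(self, array: Array) -> None:
        self.arrays[array.name] = array

    def create_lun(self, server: str, size_gb: int) -> dict:
        # Assumed placement rule: use the array with the most free space.
        target = max(self.arrays.values(), key=lambda a: a.free_gb)
        if target.free_gb < size_gb:
            raise ValueError("no single array has enough free space")
        target.allocated_gb += size_gb
        lun = {"server": server, "array": target.name, "size_gb": size_gb}
        self.luns.append(lun)
        return lun

    def report(self) -> dict:
        # One view of free and allocated storage across all arrays.
        total = sum(a.capacity_gb for a in self.arrays.values())
        used = sum(a.allocated_gb for a in self.arrays.values())
        return {"total_gb": total, "allocated_gb": used, "free_gb": total - used}


controller = FabricController()
controller.map_array(Array("array-a", 1000))
controller.map_array(Array("array-b", 500))
lun = controller.create_lun("mail-server", 300)
summary = controller.report()
print(lun, summary)
```

The point of the sketch is the single `report()` call: free space on every array is visible in one place instead of sitting in per-array pools.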
Of course, a single storage management tool also brings vendor lock-in to a specific network-based virtualization or volume management solution. While the American National Standards Institute (ANSI) formed a committee in June 2003 to propose a fabric application interface standard (FAIS) to minimize or eliminate vendor lock-in, the time frame before it is approved, and then gains user acceptance, is still years away.
Mike Witkowski, CTO of Maxxan Systems Inc., San Jose, CA, says that users have been reluctant to adopt virtualization for fear of being locked into a specific vendor's solution. Each vendor's virtualization product performs snapshot and mirroring tasks differently, and of course, the devil is in the details.
Yet it's for exactly these reasons that Tom Clark, Nishan Systems (now McData) director of technical marketing, says a standard for virtualization may not take hold. The higher certain functions move up the food chain, the less standardization applies, especially at the application level. He says that companies backing up with Veritas NetBackup don't expect to be able to recover data with Legato Networker or vice versa. He thinks that this same concept may carry over into the volume management space as well because each vendor's virtualization implementation will be proprietary for the simple reason that volume management standards will be difficult to define and enforce.
Users looking to implement network storage smarts right now should see if they can accomplish the same tasks on their existing storage arrays. Most midrange and modular storage arrays ship with a fair amount of storage intelligence and functionality in them. However, for environments looking to consolidate their disparate storage devices into one ubiquitous storage pool, appliances from DataCore, FalconStor, Fujitsu Softek and IBM--along with storage switches from Brocade and Candera--are nearer than people think to creating these central managed storage pools.
The intelligent switch
Companies such as Brocade, Cisco and McData offer switches that include routing and QoS as part of the switch's native OS, helping organizations manage the data flow through their Fibre Channel (FC) network. For instance, Brocade's latest version of its OS offers the ability to trunk up to four of its 2Gb ports and present them as one logical 8Gb path between two of its switches. Directors and switches from McData can also accomplish trunking in a similar fashion assuming they use version 5.0 or later of McData's microcode.
For the switch's OS to treat separate physical paths as one logical path is important for two reasons: First, it reduces the number of inter-switch links (ISLs) required to link core and edge switches. Switches that use trunking can use up to four ports to get an effective throughput of up to 8Gb. Switches that lack the trunking function will require more dedicated ports between the switches to distribute the traffic between the switches.
Second, trunking captures meaningful performance statistics. For instance, each time servers log off and back on to a storage network that uses ISLs, but lacks trunking, they are assigned a different ISL to send their network traffic. This round-robin method of assigning data traffic between switches makes historical performance statistics questionable, and could create a situation in which one ISL handles traffic from multiple servers while another link handles little or no traffic at all. The only way for the SAN administrator to fix this is to force all of the servers on the storage network to log off and back onto the SAN. While this would force a redistribution of the data traffic across the ISLs, it could also negatively affect the servers using the network, possibly resulting in failed paths or outages.
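The difference between round-robin ISL assignment and a trunk can be illustrated with a small model. The sketch assumes four 2Gb ISLs, as in the trunking example above; the per-server traffic figures are made up, and the model ignores how a real switch balances frames inside a trunk.

```python
PORT_MBPS = 2000   # one 2Gb FC port, expressed in Mb/s
TRUNK_PORTS = 4    # ports per trunk, per the four-port example above


def round_robin_loads(server_loads_mbps, isl_count):
    """Pin each server to an ISL in login order and sum the load per ISL."""
    loads = [0] * isl_count
    for index, load in enumerate(server_loads_mbps):
        loads[index % isl_count] += load
    return loads


def trunk_load(server_loads_mbps):
    """A trunk carries all traffic over one logical path."""
    return sum(server_loads_mbps)


# Hypothetical traffic: two busy servers and six quiet ones, in login order.
servers = [1500, 100, 100, 100, 1500, 100, 100, 100]

per_isl = round_robin_loads(servers, TRUNK_PORTS)
overloaded = [i for i, load in enumerate(per_isl) if load > PORT_MBPS]
trunk_capacity = PORT_MBPS * TRUNK_PORTS

print("per-ISL load:", per_isl, "overloaded ISLs:", overloaded)
print("trunk load:", trunk_load(servers), "of", trunk_capacity)
```

In this made-up case both busy servers land on the same ISL and exceed its 2Gb capacity while the other three links sit nearly idle. The same traffic fits comfortably inside one 8Gb logical path.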
|What should users be doing now?|
Storage services. Storage services should take precedence in user environments consisting of heterogeneous operating systems and storage arrays. The good news is that existing players in this space such as DataCore Software, FalconStor Software and Hewlett-Packard Co. already are beyond the 1.0 release levels and new players such as Brocade Communications Systems Inc., Candera Inc., Cisco Systems Inc., Fujitsu Softek and IBM Corp., among others, have entered the market providing an increasing number of solutions with new feature sets to test. Users need to take some time to understand the pros and cons of appliance and switch-based solutions. Be cautious about deploying too rapidly and be ready for some short-term pain. However, keep focused on the long-term benefits such as simplified networking and easier storage management because they will definitely outweigh the short-term testing and implementation headaches.
Transport services. Understanding transport services should take priority in environments where users anticipate using different protocols (Fibre Channel, iSCSI) to connect to the storage network, need to connect storage area network (SAN) islands or are experiencing throughput problems. Intermixing different protocols on a single switch generates a lukewarm reception from users. FC switches that support different protocols will take on increasing importance in the coming year as iSCSI starts to build momentum. Users need to become more comfortable with storage networks before starting to introduce different protocols.
Throughput problems appear to be taking a back seat right now in the minds of users. 2Gb FC exceeds the requirements for many sites, and with 4Gb and 10Gb FC and 10Gb Ethernet slated for a 2004 appearance, few users seem concerned about current bandwidth limitations other than for creating inter-switch links (ISLs).
Connecting SAN islands gets mixed reviews in the minds of users. Some users seem content to pull out their existing smaller 16- and 32-port switches and replace them with 64-port or larger directors. This move keeps the fabric design and management simple and also helps to avoid, for the time being, any throughput issues that may arise. Others appear willing to connect smaller switches and centralize resources, but face the trade-off of creating more complicated environments.
Security services. Security services remain last in the minds of users. While companies like Decru Inc. and NeoScale Systems Inc. both offer appliances that can authenticate servers and encrypt traffic on the SAN, most users get peace of mind by placing their servers and storage on a physically separate network running a different protocol behind locked doors. For now, users should only consider security services where there's some question about the legitimacy of the users or servers accessing the SAN or if some chance exists that the data may be intercepted.
Another important characteristic that comes into play on switches is the ability to do QoS. While smaller, high-speed SANs can usually handle today's storage traffic without any performance impacts, as SANs grow and different types of data traffic get introduced into the network, the ability to understand and prioritize the data traffic will grow. That's when switches with OS that support QoS should shine.
For instance, today's storage networks may concurrently support traffic from a Windows e-mail server, a network file server and multiple Unix and Windows database servers, all with varying degrees of importance. Switches that support QoS should be able to inspect and prioritize each data packet flowing through the network for the type of data contained in the packet, performing tasks such as increasing or decreasing the bandwidth available for a specific application.
Here's where the header segment of the FC data packet can help. Several companies, including Veritas, are looking to write to this part of the FC packet so the network can better manage traffic. For instance, if a packet will carry backup traffic, Veritas can adjust its NetBackup code to write that information to the header portion of the FC packet.
Once the header has such information, QoS can intervene during periods of high activity on the network. The QoS inspects the appropriate portions of each FC packet and based on user-defined policies, allocates more bandwidth to applications such as OLTP while throttling back on the bandwidth allocated for backups and routine requests.
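A policy of this kind can be sketched as a weighted split of a congested link. Everything specific here is an assumption for illustration: the traffic class names, the weights and the 2Gb link figure are hypothetical, and the sketch does not hand unused share back to other classes as a real scheduler might.

```python
# Hypothetical user-defined policy: relative weight per traffic class.
POLICY_WEIGHTS = {"oltp": 6, "backup": 1, "routine": 1}


def allocate(demands_mbps, capacity_mbps, weights=POLICY_WEIGHTS):
    """Return the bandwidth granted to each class, in Mb/s.

    When the link is not congested, every class gets what it asks for.
    When it is, each class is capped at its weighted share of the link.
    """
    if sum(demands_mbps.values()) <= capacity_mbps:
        return dict(demands_mbps)
    total_weight = sum(weights[cls] for cls in demands_mbps)
    return {
        cls: min(demand, capacity_mbps * weights[cls] // total_weight)
        for cls, demand in demands_mbps.items()
    }


quiet = allocate({"oltp": 500, "backup": 1200, "routine": 100}, 2000)
busy = allocate({"oltp": 1400, "backup": 1200, "routine": 200}, 2000)
print("quiet period:", quiet)
print("busy period:", busy)
```

During the quiet period the backup runs at full speed. During the busy period it is throttled to its weighted share while OLTP keeps everything it asked for.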
Today, storage networks remain relatively secure simply because they usually exist as physically separate, limited-access networks. Those days are coming to an end. With new technologies like iSCSI, the increasing need to back up and replicate data remotely--and the emerging corporate objectives to consolidate and manage storage across different data centers--networks will no longer remain isolated.
As SANs become connected, security risks start to emerge. As occasionally happens with servers, host bus adapters (HBAs) move from one server to another, especially if the server in which the HBA was originally installed has a PCI bus. Because multiple server hardware and software vendors support the PCI bus type, the card may be salvaged for use in another server, introducing the chance of a security breach.
If the LUN security and zoning associated with the original HBA isn't removed prior to this HBA being installed in another server, the possibility exists that as soon as this HBA is plugged back into the SAN, the HBA and its new server OS may immediately access its old storage. Administrators may also not know that they shouldn't have access to this storage and may try to discover and format the storage, wiping out data.
New switches and specialized storage security appliances can help secure expanding networks. Brocade, for example, has agents that can be placed on servers that do a handshake between the server and their switch OS. If the switch detects that the world wide name (WWN) of the HBA logging onto the network is assigned to a different server, it will prevent that server from logging into the network or accessing any storage.
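The check described above amounts to comparing the WWN presented at login against the server it is registered to. The sketch below is a minimal illustration, not Brocade's mechanism: the registry, the WWN value, the server names and the choice to refuse unknown WWNs are all assumptions.

```python
# Hypothetical registry kept by the switch: HBA WWN -> server it belongs to.
REGISTERED_HBAS = {
    "10:00:00:00:c9:aa:bb:01": "db-server-1",
}


def may_log_in(wwn: str, server_name: str, registry=REGISTERED_HBAS) -> bool:
    """Allow a fabric login only if the WWN is registered to this server.

    Assumption: a WWN that is not in the registry at all is also refused.
    """
    return registry.get(wwn) == server_name


# The HBA logs in from the server it was registered with.
original = may_log_in("10:00:00:00:c9:aa:bb:01", "db-server-1")

# The same HBA is salvaged and installed in a different server.
moved = may_log_in("10:00:00:00:c9:aa:bb:01", "file-server-2")

print(original, moved)
```

With a check like this in place, the salvaged HBA is refused before its old zoning and LUN masking can expose the original server's storage.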
Appliances from Decru Inc., Redwood City, CA, and NeoScale Systems Inc., Milpitas, CA, go a step further. While they also authenticate servers coming onto the storage network, they actually encrypt the data coming onto the storage network and store the data in an encrypted format on the disk itself. The advantage of this approach is that even if storage is presented to another server, the disk and data can neither be read nor written by the new server--only the server that possesses the original encryption key can access the data.
All of these new fabric-based technologies come with a price. While most eliminate one or more existing pain points of storage management or network connectivity, they create their own set of management or performance problems.
No silver bullet
Introducing routing into the switch can introduce potential problems, especially if routing involves multiple switches. Routing the data traffic from the server to the storage array across multiple hops and ensuring it responds in a manner that meets current SLA performance requirements can be a tricky venture at best. Spread that server's data across multiple storage arrays connected to different switch ports and it becomes a nightmare to untangle any performance issues that might crop up.
A similar scenario holds true for QoS. The idea of allocating more bandwidth to an application that needs it while throttling the bandwidth back for another application sounds great. Unfortunately, not all environments are so cut and dried. While increasing the bandwidth for an OLTP application may be ideal, there may also be SLAs tied to backups. If the backup is not completed within its backup window or fails because it can't get the bandwidth it needs in time, then the whole intent behind QoS fails.
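A quick calculation shows how throttling can push a backup past its window. The backup size, bandwidth figures and eight-hour window below are hypothetical, and the arithmetic ignores protocol overhead and tape drive limits.

```python
def backup_hours(size_gb: float, bandwidth_mbps: float) -> float:
    """Hours needed to move size_gb at a sustained rate of bandwidth_mbps."""
    megabits = size_gb * 8 * 1000   # decimal gigabytes to megabits
    return megabits / bandwidth_mbps / 3600


def fits_window(size_gb: float, bandwidth_mbps: float, window_hours: float) -> bool:
    return backup_hours(size_gb, bandwidth_mbps) <= window_hours


# Hypothetical 2,000GB nightly backup with an eight-hour window.
unthrottled = fits_window(2000, 1200, 8)   # backup gets 1,200 Mb/s
throttled = fits_window(2000, 250, 8)      # QoS throttles it to 250 Mb/s

print(round(backup_hours(2000, 1200), 1), unthrottled)
print(round(backup_hours(2000, 250), 1), throttled)
```

At the throttled rate the same backup takes more than twice its window, so a QoS policy tuned only for OLTP can break the backup SLA outright.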
Volume management sounds like another stellar idea whose time has arrived. Yet the jury is still out on whether this technology will dramatically improve a storage administrator's work. While standardizing on one virtualization or volume management platform across the enterprise may appeal to the organization's storage team, it's the guys with the check books who close the deals.
So standardizing on Fujitsu Softek's Storage Virtualization or IBM's SAN Volume Controller may sound great today, but when Cisco shows up on your doorstep offering EMC's or Veritas' Volume Manager natively integrated into their switch for free--plus a discount to throw out all of your company's existing switches--upper management will be hard pressed to ignore that value proposition. Or worse yet, companies may elect to keep everyone's products, switches, storage arrays and multiple virtualization solutions, forcing the storage team to manage the existing storage arrays alongside these new network-based storage controllers. So instead of the problem getting easier, it just got more complex.
Getting a bigger brain in the network appears to be a foregone conclusion as storage networking moves ahead. By providing faster provisioning, secure access, improved QoS and simpler management through a central console, a new intelligent fabric is slowly emerging. However, until the brain matures and more questions get answered, much of this technology still belongs predominantly on the test floor and not on the production floor.