Published: 12 Mar 2004
|Saving dollars with distributed file systems|
A distributed file system delivers the benefits of network-attached storage (NAS)-like data sharing with the scalability of a storage area network (SAN). What's more, a distributed file system can eliminate the cost of spare systems. Systems participating in the distributed file system share access to a common data store, so if a server running an application fails, other participating servers can start the application and resume the work. Some examples of distributed file systems include IBM's SANFS, Silicon Graphics' CXFS, Sistina's (now Red Hat's) GFS and Veritas' Clustered File System. (See "A simple design for a distributed file system.")
It's possible with a distributed file system to build server farms consisting of inexpensive PC servers where each server has a single SAN connection and redundancy is provided through the farm. If a server fails or loses its SAN connection, it can be removed from the farm and file system; the client application can reconnect to another server. In general, it doesn't matter whether or not an individual server is available, as there should be adequate CPU power in the farm to handle peak work loads and the occasional loss of a server.
Of course, it's essential to plan for the unexpected loss of a switch. Again, the use of eight-port switches minimizes the exposure to a single switch failure. If a production switch fails, a spare switch provides a modular way to bring servers back online in relatively short order. On the performance side, the immediate performance exposure to a switch loss in a server farm can be calculated as the percentage of servers accessing storage through an individual switch.
For example, if there are 48 servers in a Web server farm accessing storage through eight switches (six servers connected per switch), then the loss of a single switch would result in a drop of 12.5% of the server farm's computing capacity. Here are the numbers: Forty-eight servers save $1,500 each by using a single connection for a total savings of $72,000; there's a spare switch costing $6,000, but no spare servers. The overall SAN savings is $66,000, not counting the cost of implementing the server farm and distributed file system.
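The exposure figure above can be reproduced with a short calculation. This is only a sketch: the function name is hypothetical, and it assumes servers are spread evenly across switches and that each server contributes equally to farm capacity.

```python
def switch_loss_exposure(servers: int, switches: int) -> float:
    """Fraction of farm capacity lost when a single switch fails.

    Assumes (for illustration) an even spread of servers across
    switches and equal computing capacity per server.
    """
    if servers % switches != 0:
        raise ValueError("this sketch assumes an even spread of servers")
    servers_per_switch = servers // switches
    return servers_per_switch / servers


# The Web server farm example: 48 servers behind eight switches.
exposure = switch_loss_exposure(servers=48, switches=8)
print(f"{exposure:.1%}")  # 12.5%
```

Under these assumptions, adding switches while keeping the server count fixed lowers the exposure per switch failure, which is the argument for small switches.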
It's time to reevaluate the high-availability and high-cost assumptions about SANs and look at some alternative architectures and topologies that would allow Windows and Linux PC systems to be incorporated into SANs at a much lower cost.
There's little doubt that many of the benefits of SANs--high-availability, mission-critical storage services with centralized control and management--would be useful for DAS-connected Windows systems. But with a Windows or Linux server costing less than $5,000, it's hard to imagine budgeting in the range of $75,000 or greater for an entry-level SAN for a group of four or five servers. Storage is certainly important, but its cost can't be multiples of the server cost.
|A simple design for a distributed file system|
Rethinking SAN prices
SANs really shine when matched with high-end systems and their requirements for full redundancy and superior performance for high-throughput transaction processing. On the other hand, the reliability of a single Windows or Linux server is usually not critical because servers can be deployed in farms. The newest directions in server computing, blade servers and grid computing, revolve around the "any-node-will-do" philosophy of using available CPU resources. The obvious question is: If servers don't require ultimate reliability and redundancy, why should storage?
Servers are purchased according to the computing power needed to meet the needs of the application. Companies don't buy a large system if a smaller system will do the job. So, an equally obvious question is: Why use first-tier storage for low-end servers that are hosting applications that don't need expensive performance capabilities?
A common response to these cost-comparison questions is the refrain that a SAN needs to be able to handle a complete range of performance requirements. That kind of thinking raises the question of why every component in a SAN needs to meet the requirements of the highest-performing servers and applications.
SANs provide key capabilities for mission-critical applications, but because of their cost, they tend to be underutilized for applications that aren't mission-critical; nearly 75% of all server-based data is stored on PC server platforms on DAS storage. So, the challenge is to find ways to expand the use of SANs through less-expensive technologies and by operating/management efficiencies and practices.
The $64,000 question is: How do you cut costs out of a SAN without severely diminishing its benefits? One obvious way is to spend less on SAN products. That approach works, but it depends entirely on the availability of less-costly products. There's a lot of potential to reduce cost by using inexpensive technologies where appropriate, such as ATA, SATA and iSCSI. However, we won't address replacement technologies here; instead, we'll look at SAN architectures and topologies in an attempt to find more cost-effective designs. These designs reject the basic SAN assumptions about the need for high availability and connections for redundancy. In sum, there are ways to make significant spending cuts that don't seriously impact application reliability and performance.
The single-connection SAN concept is made up of single-connection servers, eight-port edge switches with dual interswitch link connections, two spare servers, one spare switch and centralized storage in the corporate SAN.
Remove redundant connections
First, it's necessary to question whether dual SAN connections for high availability are needed on every server. Of course, many storage administrators will feel that there's no point in installing a SAN that doesn't use dual-pathing for every node, but dual-pathing effectively doubles the capital outlay by requiring duplicate host bus adapters (HBAs) and switch ports for every server.
Using a rough estimate of $1,500 per connection to the SAN, dual connections cost $3,000 per server. When you multiply the connection cost by 10 servers, it becomes a $30,000 budget item, and when you multiply it by 100 servers it amounts to $300,000. This is far from chump change.
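As a quick check of how the dual-connection bill scales, the arithmetic can be sketched as follows. The function name is hypothetical, and the $1,500 figure is the rough per-connection estimate used in this article, not a quoted price.

```python
COST_PER_CONNECTION = 1_500  # rough estimate from the article, in dollars


def dual_connection_cost(servers: int,
                         cost_per_connection: int = COST_PER_CONNECTION) -> int:
    """Total cost of giving every server two SAN connections."""
    return servers * 2 * cost_per_connection


for count in (1, 10, 100):
    print(f"{count:>3} servers: ${dual_connection_cost(count):,}")
```

For 10 and 100 servers this yields the $30,000 and $300,000 budget items cited above.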
Why even bother with the extra connection? For most servers--especially PC servers--there's more than enough bandwidth for the job in a single connection. Besides, in dual-connection configurations on PCs, the second connection will probably only be working in standby mode. In other words, there's no performance requirement for a second connection, and it doesn't get used anyway. Additionally, it doesn't matter if the connection speed is 1Gb or 10Gb--a single PC server won't come close to using all of the available bandwidth.
It may seem sacrilegious to suggest dual-connections for redundancy aren't needed, but it's important to keep the application requirements at the forefront. There are many applications that don't need instantaneous failure recovery to service clients. For instance, most Internet-based applications are designed to assume the client/server connection can be lost, forcing a reconnect.
A single-connection SAN is very reliable. Once a SAN is running, it tends to keep running. Fiber optic cables are practically indestructible except for internal tampering or an industrial accident, and it's not guaranteed that dual connections would help that much in either of those cases. HBAs and switches also tend to have excellent reliability characteristics. However, one of the components that's suspect in a single-connection SAN environment is the gigabit interface converter (GBIC). GBIC failure rates are similar to those of disk drives; in other words, you can expect them to start failing after five years of service.
Using a single-connection SAN also means there are half as many connections to manage (see "Single-connection SAN"). To the uninitiated, this might not seem like such a big deal, but it's significant to administrators who actually do the work. The fewer connections there are to check when making changes, the easier and faster it is to make them. And a single-connection SAN further reduces the number of connections per switch (by using small switches).
|Blade servers can lower storage costs|
Blade servers are becoming popular as a way to centralize server-computer resources and share common components. In that sense, blade servers are similar to disk subsystems, where the devices all benefit from common power and packaging.
Obviously, the discussion about saving money on storage area network (SAN) costs and eliminating unnecessary ports for server farms leads to questions regarding whether or not blades can also share storage and SAN connections, further reducing costs. As with all computer configuration topics, the answer is: "it depends."
For starters, it's advisable to take boot drives off blade servers to make the blades as reliable as possible. There's no need to add the cost of mirrored disk drives to a server blade. That means the blades should be able to use network boot technology such as that provided by Intel's PXE. If network boot can be used, then all the blades can share a common boot image, which could be a set of high-availability mirrored disk drives or a memory/flash memory disk. Network access to the boot image is made through the network connections integrated in the backplane connections of the blade server.
Similarly, to reduce costs further and increase reliability, SAN connections can also be integrated into a blade server's backplane, and integrated with an internal SAN switch. For example, IBM blade servers include an integrated 16-port Fibre Channel switch that communicates with optional host bus adapter (HBA) modules in the blade server cards. It's possible to connect to an external switch or director using dual connections for reliability from the embedded switch.
If the blade server package is done well, the management of all this can be straightforward, including the setup of boot configurations and switch fencing (zoning or virtual SANs). Using blade servers doesn't eliminate the need for HBAs and switches, so the number of ports isn't really decreased beyond a single connection SAN, but the management and ease of integration is likely to pay for itself many times over in the life cycle of the blade server.
Shrinking SAN costs
With PC server hardware costing less than $5,000, it's hard to justify a 40% to 50% SAN tax for dual connections. Instead, it makes much more sense to consider using inexpensive spare systems. Using an N+1 approach where a single spare server provides redundancy for 20 or so production systems, it's possible to have fairly inexpensive data redundancy. If you lose a server or a SAN connection, the spare can step in and do the work. There's no immediate and automatic failover, but these servers aren't usually shouldering critical applications.
If you ask several people what the cost of a SAN connection is, you'll get many different answers. There are list prices, street prices and even eBay prices. For our calculations, a cost of $1,500 per connection is used. That figure was chosen because it's a conservative number that doesn't exaggerate the cost of a SAN connection too much. The use of small, inexpensive Fibre Channel (FC) switches with port prices of approximately $500 per port and an HBA price of $1,000 was assumed.
Let's crunch the numbers: If you save $1,500 per server by not using redundant connections on 20 servers, that amounts to $30,000. Then, if you install a spare server that costs $6,500 ($5,000 + $1,500 for HBA and switch ports), the amount of money saved is $23,500, compared to outfitting 20 dual-connected systems.
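The N+1 arithmetic can be written out as a small function so that other server counts and prices can be tried. This is a sketch under the article's assumptions ($1,500 per connection, $5,000 per server, one spare server); the function and parameter names are hypothetical.

```python
def n_plus_one_savings(production_servers: int,
                       connection_cost: int = 1_500,
                       server_cost: int = 5_000) -> int:
    """Savings from single connections plus one spare server,
    compared with dual-connecting every production server."""
    # Money not spent on a second connection for each production server.
    avoided = production_servers * connection_cost
    # One spare server, with its own single SAN connection.
    spare = server_cost + connection_cost
    return avoided - spare


print(n_plus_one_savings(20))  # 23500
```

With these prices the spare pays for itself once there are more than four production servers, since four avoided connections come to $6,000 against a $6,500 spare.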
With single connections, a SAN still provides flexible access to data resources, superior scalability and centralized management control. Not only that, but backing up these systems over a SAN is a thousand times easier than backing them up over the LAN. There's no reason why single-connection SANs can't be as much a part of the SAN infrastructure as a large director-class implementation. Single-connection SANs are just targeted to a different set of requirements.
The main problem to watch for in a single-connection SAN is a switch failure. Obviously, if a switch fails none of the systems connected through it will be able to access their storage. So, the goal of the topology for the single-connection SAN is to reduce the overhead needed to accommodate the loss of a single switch. In other words, use eight-port switches. With eight-port switches, you'll have more switches to manage, but the number of connections to manage per switch is limited.
For example, assume an eight-port switch has six systems connected to it with two ports left over to connect to storage subsystems or other switches. If a switch fails, it will be necessary to reconnect the servers from the failed switch to any six available ports. In other words, you need to reserve six ports that you can use at a moment's notice. One way to guarantee there are ports available without running into other access control problems is to have a spare eight-port switch ready to take the place of a failed switch. Keep in mind that replacing a switch isn't necessarily just a matter of reconnecting cables, so keep configuration and zoning information available for all production switches. Fortunately, an eight-port switch is much easier to configure than a 32-, 64- or 128-port switch.
Cranking through the dollar wheel again, let's assume this time that there are 40 production servers, two spare servers and a spare switch. Saving $1,500 per server on 40 systems by avoiding dual connections equals $60,000. To offset that, there are two spare servers at $6,500 each and a spare eight-port switch at $6,000 (calculated here at eight multiplied by $750) for an alternative redundancy cost of $19,000, resulting in a total SAN cost reduction of $41,000. There are many ways to spin these numbers, but the key variables to consider are the number of spare systems and the cost of the spare switch. Users with older 16-port switches that no longer have a place in their primary SAN could redeploy them at very low cost.
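Because the result hinges on the number of spare systems and the cost of the spare switch, it can help to keep those variables explicit. The following sketch models the calculation; the class and field names are hypothetical, and the default prices are the article's estimates.

```python
from dataclasses import dataclass


@dataclass
class SingleConnectionPlan:
    """Cost model for a single-connection SAN (illustrative only)."""
    production_servers: int
    spare_servers: int
    spare_switches: int
    connection_cost: int = 1_500   # HBA plus switch port, per connection
    server_cost: int = 5_000       # PC server hardware
    switch_ports: int = 8          # ports on a spare switch
    spare_port_cost: int = 750     # per-port price used for the spare switch

    def avoided_cost(self) -> int:
        # Second connections that are no longer purchased.
        return self.production_servers * self.connection_cost

    def redundancy_cost(self) -> int:
        # Spare servers (each with one connection) plus spare switches.
        servers = self.spare_servers * (self.server_cost + self.connection_cost)
        switches = self.spare_switches * self.switch_ports * self.spare_port_cost
        return servers + switches

    def net_savings(self) -> int:
        return self.avoided_cost() - self.redundancy_cost()


plan = SingleConnectionPlan(production_servers=40, spare_servers=2, spare_switches=1)
print(plan.avoided_cost(), plan.redundancy_cost(), plan.net_savings())
# 60000 19000 41000
```

Setting `spare_servers=0` with 48 production servers reproduces the $66,000 figure from the distributed file system example, where the farm itself supplies the server redundancy.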
This is an environment where a core-edge topology makes a lot of sense. Connecting the eight-port switches to existing corporate SAN switches allows PC servers to use storage resources on existing storage subsystems, where they can be centrally managed. While systems connect to switches over a single SAN connection, there are dual connections for interswitch links (ISLs), or links connecting to storage. This provides redundant protection on the paths carrying data for all systems connected to the switch. The performance of this design should be more than adequate; in fact, there's an abundance of bandwidth for server connections.
However, the single-connection SAN doesn't have to be connected to another SAN to be effective. Medium-sized businesses without SANs that can't afford a fully redundant SAN could build a single-connection SAN for much less money. With no existing SAN storage subsystem, the design would need to include a way to connect to storage. This could either be done through one or two additional switches functioning as backbone switches or by connecting the eight-port switches directly to a multiported storage subsystem. Additional switches certainly skew the cost calculations here, but they also provide room for expansion, including such niceties as connecting to centralized tape backup equipment.
Of course, there is more to making network storage affordable than the price of the technology. But by keeping a realistic focus on the availability needs of the application and by leveraging system redundancy techniques, it's possible to significantly cut the cost of SAN components--extending the benefits of the SAN to many more systems.