Storage area networks (SANs) have the ability to save a company money, hasten backups and help consolidate the data center. But a SAN cannot simply be left alone. It's a complex, growing, breathing entity that constantly needs to be tweaked and upgraded.
So what is SAN School? If this is the first time you are asking yourself, "What's a SAN?" -- SAN School is for you. If you are implementing your first SAN and need implementation and migration help -- SAN School is for you. If you are far along in the SAN process and need to extend your SANs or connect SAN islands -- SAN School is for you.
The authors of "Storage Area Networks For Dummies", Christopher Poelker and Alex Nikitin, are your SAN School professors. Through 20-minute Webcast lessons, they covered SANs from A to Z, from what a SAN is to connecting those last nodes for optimal performance.
SearchStorage.com readers sent Chris and Alex a lot of questions during each SAN School Webcast lesson. Because each lesson was so short, they weren't able to answer all of your questions. Thankfully, Chris Poelker was kind enough to answer them after each lesson.
The editors of SearchStorage.com have taken each of your questions and Chris's answers and posted all of them for you here.
If you would like to view any of the SAN School Webcasts, click here
Here you'll find answers to questions pertaining to:
** Block-based data and file-based data
** Determining when a SAN makes sense economically
** Differences between LUNs and volumes
** Parity and how it works
** RAID, specifically levels 3 and 4
** The difference between FC-AL and FC-SW in switches
** Mirroring disks first and striping on top of them (RAID 1+0)
** Performance of SANs when used to enable a clustered database
** Different methods of failover, log shipping vs. replication
** How an SATA drive hooks up to a SAN
** Explanations of port zoning, aliases and name servers
** The difference between a Volume and Logical Unit Number (LUN)
** Vendor recommendations for HBAs, cables, GBICs, and switches
** LUN Zoning
** What happens when SAS and SATA HDs hit the market
** Masking and what it is
** Why port zoning is more secure than WWN zoning
** How to set a core switch to be the principal switch
** The difference between a fabric switch and a loop switch
** How FC-AL is implemented in a loop switch
** Write order fidelity
** How to set up persistent binding on an HBA for a Unix host
** LAN free and Serverless backups
** LAN free backup and datapaths from disk to tape
** Best practices in building a network for iFCP
** Routing IP over a Fibre Channel network
** When to use iFCP rather than iSCSI
** Creating SWANs or greenfield iSCSI SANs
Question: What is the difference between block-based data and file-based data?
Professor Poelker: When people in the industry make reference to "block based data", what we are really talking about is the access mechanism used by applications that get to their "data" using the SCSI (small computer system interface) protocol, which operating systems (your computer) use to "talk" to disk drives (where the data is stored). The SCSI protocol transmits data in "blocks", whereas a protocol like IP transmits data in "packets." So block based data is the data stored on disk drives, in its native format.
File based data is also stored on disk drives, but access to "file based data" is normally done over an Ethernet network rather than a storage network. NAS, or network attached storage, uses NFS or CIFS (file access protocols) over TCP/IP (the underlying network transmission protocol) to "talk" to individual "files" stored on disk drives. So file based data are the actual individual files (like a Word document, or a home directory) stored on the drives.
There is much more overhead associated with access to individual files, rather than raw block transfer of data to disks. This is why high performance databases almost always access data using "block" access, rather than "file" access. You can simplify this by always associating "file" access with NAS devices, and "block" access with SAN devices.
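To make the distinction concrete, here is a minimal Python sketch, an illustration only and not how any particular array or NAS head is built. It simulates a tiny disk in memory: block access reads a fixed-size block by its logical block address (LBA), while file access goes through a name-to-blocks table that a file system has to maintain and consult on every request. The 512-byte block size, the file path and the block layout are all assumptions chosen for the example.

```python
import io

BLOCK_SIZE = 512  # assumed sector size; many modern drives use 4096 bytes

def read_block(device, lba, block_size=BLOCK_SIZE):
    """Block access: fetch one fixed-size block by logical block address."""
    device.seek(lba * block_size)
    return device.read(block_size)

# Hypothetical file system metadata (kept by a NAS head, for example):
# file name -> list of LBAs holding that file's contents.
file_table = {"/home/alice/report.doc": [1, 3]}

def read_file(device, path):
    """File access: resolve the name first, then read the underlying blocks."""
    return b"".join(read_block(device, lba) for lba in file_table[path])

# Simulate a four-block "disk" in memory.
disk = io.BytesIO(b"A" * 512 + b"B" * 512 + b"C" * 512 + b"D" * 512)
print(read_block(disk, 2)[:4])  # b'CCCC'
```

The extra lookup in read_file is a toy stand-in for the file-level overhead described above; a real NAS adds protocol processing, locking and permission checks on top of it.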
Question: When deciding whether or not a SAN makes sense, is there a threshold size of storage needed before a SAN makes economic sense (over 100 GB, 1TB, etc.)? Or, do the other factors totally outweigh that question?
Professor Poelker: There is no set rule... all the factors need to be combined to see if it makes sense from a financial perspective. If the benefits a SAN offers outweigh the initial capital outlay, and it will pay for itself in a reasonable period of time, then go for it. The larger the organization, the larger the benefits will usually be. If you're a small shop and have just a few servers to manage, and backup is not a problem, then you probably do not need a SAN.
If you find backup is becoming a burden, or you need to share disks because you are implementing server clustering, or it's a nightmare to manage the data on all your internal storage, then a SAN would make sense.
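The "pays for itself" test can be sketched as a simple payback calculation. The cost and savings figures below are hypothetical placeholders, and the model ignores discounting, maintenance contracts and storage growth, so treat it as a starting point rather than a full financial analysis.

```python
def payback_months(capital_cost, monthly_savings):
    """Simple (undiscounted) payback period in months."""
    if monthly_savings <= 0:
        return float("inf")  # the SAN never pays for itself
    return capital_cost / monthly_savings

# Hypothetical monthly savings attributed to the SAN.
savings = {
    "backup_admin_time": 2500,
    "reclaimed_disk_capacity": 1500,
    "fewer_tape_drives": 1000,
}
capital = 120_000  # hypothetical SAN purchase and installation cost

print(payback_months(capital, sum(savings.values())))  # 24.0
```

Whether 24 months counts as "a reasonable period of time" is a business decision; the point is to put every benefit on the same monthly scale before comparing.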
Question: Why not use a shared file system in a SAN versus NAS for servers that need file sharing?
Professor Poelker: Actually, that's where the industry is heading. A GFS (global file system) is currently under development by Microsoft for the Windows platform (via Longhorn). There are already multiple solutions on the market from other OS vendors (Sun's SAM-FS, IBM's Storage Tank), and a few for Linux. You can buy these solutions today, but they are proprietary. When all the OS vendors finally agree on a standard for storage-based global file systems, such file systems will become more prevalent and much cheaper.
Question: How does parity work? Is it data or information about the data? How can data be "recovered" from parity?
Professor Poelker: Parity works by applying a mathematical formula (usually an exclusive OR, or XOR, operation) to the data before it is stored on the disks. RAID (redundant array of independent disks) works by storing the data in chunks across all the drives in the "RAID set." The parity information (the result of the formula applied to the data) is stored separately from the data itself (usually striped across all the drives in the RAID set). If one of the drives goes bad, and you lose the "chunks" of real data stored on that drive, the RAID controller re-creates the original information by running the formula in reverse: applying XOR to the surviving chunks and the parity yields the original data that was written. You could say that parity is data about the data, or "metadata", but that term does not usually apply to parity.
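A short Python sketch shows why XOR parity can rebuild a lost drive: XOR is its own inverse, so XOR-ing the surviving chunks with the parity returns the missing chunk. This assumes a single-parity scheme like RAID 5; the two-byte chunks are arbitrary example values, and real controllers work on much larger stripe units.

```python
from functools import reduce

def xor_chunks(chunks):
    """Byte-wise XOR of equally sized chunks (the parity calculation)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*chunks))

# Three data chunks, as if striped across three drives of a RAID set.
data = [b"\x0f\x10", b"\xf0\x20", b"\x33\x40"]
parity = xor_chunks(data)  # stored on another drive

# Drive 1 fails: XOR the surviving chunks with the parity to rebuild it.
rebuilt = xor_chunks([data[0], data[2], parity])
print(rebuilt == data[1])  # True
```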
Question: What is the difference between LUNs and Volumes?
Professor Poelker: A LUN is a "logical unit number," and is usually associated with the physical partition used by a host when writing data to disks. LUN numbers can be associated with SCSI ID numbers. Basically, it is the address of the disk so the host can find it. A "volume" is usually associated with a file system that is written across multiple LUNs. Let's say you have two LUNs (disks) attached to a server connected to a SAN. The server has the capability of combining those LUNs into one "volume", so it can lay down larger file systems. Software like Veritas Volume Manager is one example. A volume manager can group multiple LUNs into larger "volumes" so massive amounts of data can be stored on a single file system, rather than on multiple file systems on multiple LUNs.
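As a rough sketch of what a volume manager does, the Python below concatenates two LUNs into one volume and maps a byte offset in the volume back to the LUN that holds it. Concatenation is only one possible layout (volume managers such as Veritas can also stripe or mirror across LUNs), and the LUN sizes are hypothetical.

```python
def locate(volume_offset, lun_sizes):
    """Map a byte offset in a concatenated volume to (lun_index, lun_offset)."""
    if volume_offset < 0:
        raise ValueError("offset must be non-negative")
    for index, size in enumerate(lun_sizes):
        if volume_offset < size:
            return index, volume_offset
        volume_offset -= size  # skip past this LUN and keep looking
    raise ValueError("offset beyond end of volume")

GIB = 1024 ** 3
luns = [100 * GIB, 50 * GIB]  # two hypothetical LUNs -> one 150 GiB volume

print(locate(120 * GIB, luns))  # (1, 21474836480), i.e. 20 GiB into LUN 1
```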
Question: Great Lesson! I have also heard of RAID levels 3 and 4. Could you explain these?
Professor Poelker: Under RAID levels 3 and 4, the parity data is stored on a single dedicated disk, rather than being "rotated" across all the disks in the raid set. If your application needs to access large blocks of sequentially addressed data, then RAID 3 or 4 may be a better method.
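The difference in parity placement can be shown with two small functions. A dedicated parity disk (RAID 3 and 4) means every write also touches that one disk, while RAID 5 rotates parity across the set. The rotation below is one simple scheme assumed for illustration; real controllers use several different layouts. (RAID 3 also differs from RAID 4 in striping at the byte level rather than the block level, which this sketch does not model.)

```python
def parity_disk_dedicated(stripe, n_disks):
    """RAID 3/4: parity always lives on the last (dedicated) disk."""
    return n_disks - 1

def parity_disk_rotating(stripe, n_disks):
    """RAID 5 (one simple rotation): parity moves to a different disk each stripe."""
    return (n_disks - 1 - stripe) % n_disks

print([parity_disk_dedicated(s, 4) for s in range(5)])  # [3, 3, 3, 3, 3]
print([parity_disk_rotating(s, 4) for s in range(5)])   # [3, 2, 1, 0, 3]
```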
Question: If the SAN is being used to enable a clustered database between a fixed small number of nodes (say <6), what is the best cost/performance solution?
Professor Poelker: The best solution for this environment would be a "hub" based SAN using the FC-AL (Fibre Channel arbitrated loop) protocol. A generic SCSI cluster would make sense for a two-node cluster, but beyond two nodes, SCSI gets cumbersome. FC-AL components have become quite cheap, and a simple FC-AL based Fibre Channel shelf with disks installed will be much cheaper than a RAID array.
Question: A loop switch is said to be a non-blocking device, enabling point-to-point communication between nodes. How is FC-AL implemented in switches? What is the difference between FC-AL and FC-SW in switches?
Professor Poelker: Brocade switches use a technology called "QuickLoop" that bridges the FC-AL devices on a port configured for QuickLoop to the rest of the devices in the fabric. Fabric based devices can reach FC-AL destination addresses on the loop, since the switch associates a fabric address with each FC-AL address within the loop.
All switches use FC-SW as the native protocol. The ability to connect to legacy FC-AL devices through a switch that supports FC-SW to FC-AL address translation allows you to re-use older hub (FC-AL) based devices like tape drives.
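For context on what "associating a fabric address" involves: a Fibre Channel fabric address is 24 bits, laid out as Domain, Area and Port bytes, and for a device on a public loop the Port byte carries the device's AL_PA (arbitrated loop physical address). The sketch below only composes and splits such addresses; it does not reproduce how QuickLoop or any other vendor translation works internally, and the example values are hypothetical.

```python
def fabric_address(domain, area, port):
    """Pack Domain, Area and Port (or AL_PA) bytes into a 24-bit FC address."""
    for name, value in (("domain", domain), ("area", area), ("port", port)):
        if not 0 <= value <= 0xFF:
            raise ValueError(f"{name} must fit in one byte")
    return (domain << 16) | (area << 8) | port

def split_address(fc_id):
    """Unpack a 24-bit FC address into (domain, area, port)."""
    return (fc_id >> 16) & 0xFF, (fc_id >> 8) & 0xFF, fc_id & 0xFF

# A loop device with AL_PA 0xE8 behind switch domain 1, area 4 (hypothetical).
addr = fabric_address(0x01, 0x04, 0xE8)
print(f"{addr:06X}")  # 0104E8
```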
Question: What is the name for mirroring disks first and then striping on top of them (which gives more flexibility when a disk fails)?
Professor Poelker: What you are suggesting is called RAID 1+0, or RAID 10 ("RAID ten") for short. It is always better to mirror disks first and then stripe, rather than create a mirrored stripe set. If you stripe first and then mirror, and you lose a drive, the entire stripe set is disabled. If you mirror two disks at a time and then stripe, you can survive multiple disk failures, as long as no mirrored pair loses both of its disks.
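The difference can be checked by brute force. This sketch assumes a four-disk array and counts how many of the six possible two-disk failures each layout survives: RAID 1+0 is lost only when both halves of one mirror fail, while RAID 0+1 loses a whole stripe set on the first failure in it, so it survives only if one stripe set stays completely intact.

```python
from itertools import combinations

DISKS = range(4)
MIRROR_PAIRS = [{0, 1}, {2, 3}]  # RAID 1+0: mirror pairs first, then stripe across them
STRIPE_SETS = [{0, 1}, {2, 3}]   # RAID 0+1: stripe sets first, then mirror them

def raid10_survives(failed):
    """Data survives unless both members of some mirror pair have failed."""
    return all(not pair <= failed for pair in MIRROR_PAIRS)

def raid01_survives(failed):
    """Any failure disables its whole stripe set; one set must stay intact."""
    return any(not (stripe & failed) for stripe in STRIPE_SETS)

double_failures = [set(c) for c in combinations(DISKS, 2)]
print(sum(raid10_survives(f) for f in double_failures), "of", len(double_failures))  # 4 of 6
print(sum(raid01_survives(f) for f in double_failures), "of", len(double_failures))  # 2 of 6
```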
Question: How difficult is it to move from a modular array to an enterprise array as needs require you to make the transition?
Professor Poelker: The latest versions of storage management solutions are moving toward classifying data via policy and then using HSM (hierarchical storage management), in which the software itself, or a function of the hardware, automatically migrates data between platform classes based on the created policy. This allows you to create an SLA (service level agreement) for specific data types or applications, add the application to the policy, and have the data end up on the correct storage based on the SLA. In the meantime, you can always use the host OS to create mirrors of your volumes (one member on the modular array, and the other on the monolithic array) to migrate data between platforms. Your storage vendor can also help you do this via a services agreement when you want to make the move.
Question: Does every frame get converted from digital to optical through the switch?
Professor Poelker: Yes, that is the function of the GBIC (gigabit interface converter) on every switch port. The light pulses coming into the switch are converted to digital data, the switch looks into each frame to find the destination address, routes the data to the correct switch port that is attached to the target, and the GBIC on the target port converts the signal back into light pulses for re-transmission to the target. This all happens extremely fast. Within the next ten years, I expect to see optical switches on the market that will eliminate that requirement. Optical or "photonic" switches will be able to move terabytes of data per second.
Question: In a SAN environment, what method of failover is most recommended, log shipping vs. replication?
Professor Poelker: In a SAN, the term "failover" is usually used to mean a dual-attached host failing over from one HBA access port to another. Host port failover is accomplished with a "filter driver" located either in the OS (like MPxIO in Solaris, or MPIO in Windows), or via a driver provided by the storage vendor (like PowerPath from EMC, Secure Path from Compaq, or HDLM from Hitachi).
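The core idea behind host port failover can be sketched as an active/passive path selector: I/O uses the active path until it is marked unhealthy, then moves to the next healthy path. The class and path names here are hypothetical; this is not how MPxIO, MPIO, PowerPath or any other product is actually implemented, and real drivers add load balancing, path probing and failback policies.

```python
from dataclasses import dataclass

@dataclass
class Path:
    hba: str               # hypothetical HBA name, e.g. "hba0"
    healthy: bool = True

class FailoverPaths:
    """Active/passive selection among several paths to the same LUN."""

    def __init__(self, paths):
        self.paths = list(paths)
        self.active = 0

    def current(self):
        if self.paths[self.active].healthy:
            return self.paths[self.active]
        for index, path in enumerate(self.paths):
            if path.healthy:
                self.active = index  # fail over, and stay on the new path
                return path
        raise IOError("all paths to the LUN have failed")

demo_paths = [Path("hba0"), Path("hba1")]
selector = FailoverPaths(demo_paths)
print(selector.current().hba)  # hba0
demo_paths[0].healthy = False  # simulate a cable or HBA failure
print(selector.current().hba)  # hba1
```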
In the case of your question, I take the term "failover" to mean application level failover to another location, which would imply a form of data replication. There is no basic recommended solution, since the solution chosen would require an assessment of your environment. If you have the bandwidth between sites, and your distance between sites is short enough to keep latency low, then SYNC replication would be the way to go. If there is a great distance between sites, then ASYNC replication would be better. If you have limited bandwidth, then log shipping would be the next method I would choose.
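Why distance matters for SYNC replication can be worked out from propagation delay alone. Light in optical fiber travels at roughly 200,000 km/s, or about 5 microseconds per kilometer one way, and a synchronous write cannot complete until the remote site acknowledges it, so every write pays at least one round trip. The sketch below is a back-of-the-envelope estimate under that assumption; it ignores switch, protocol and array latency, and some Fibre Channel write sequences need two round trips unless the link equipment accelerates them.

```python
FIBER_US_PER_KM = 5.0  # approximate one-way propagation delay in optical fiber

def sync_write_penalty_ms(distance_km, round_trips=1):
    """Minimum added latency per synchronous write, from propagation delay only."""
    return 2 * distance_km * FIBER_US_PER_KM * round_trips / 1000.0

for km in (10, 100, 1000):
    print(km, "km ->", sync_write_penalty_ms(km), "ms per write")
```

At 100 km that is about 1 ms added to every write, which a busy transactional database will notice; at 1,000 km it is around 10 ms, which is why ASYNC replication or log shipping becomes the practical choice over long distances.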
Question: How can an SATA drive hook up to a SAN?
Professor Poelker: Storage array manufacturers are building arrays that can house SCSI drives, Fibre Channel drives, ATA drives, or SATA drives. The controllers front-ending the connection between the drives and the SAN fabric bridge the protocols to allow connectivity.
Question: You made a comment that port zoning doesn't use the Name Server. It does. What did you mean by this? You comment that the name server will complain when merging a fabric if you use non-unique alias names. Aliases are not stored in the name server. What does the comment mean?
Professor Poelker: My comment was geared toward the differences between "HARD" zoning and "SOFT" zoning. By saying port zoning does not use the name server, I meant to indicate that port zoning is enforced via the hardware, and that when using hard or "port" zoning, frames not destined to the zoned ports are barred by the hardware from those ports. Soft zoning, or "WWN" zoning, uses only software (the name server) to enforce the zones (this is changing with newer switches), and "frames are not barred from being transmitted between nodes that are not in the same zone" (quoted from "Building SANs with Brocade", by Chris Beauchamp).
The alias server is a type of "name server" that uses extended link service requests (Alias_ID) to refer to multiple N_ports through a single name. My comment was to warn users who attempt to integrate separate fabrics with zoning in place that they will run into difficulty connecting the fabrics together if they have used the same alias names for anything.
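Before merging two fabrics, it can help to compare their alias definitions. This sketch flags alias names that exist in both fabrics but point to different members, which is the kind of collision most likely to make a merge fail. The dictionaries and WWNs are hypothetical; in practice you would export the zoning configuration from each switch with the vendor's own tools and compare zone and zone-configuration names the same way.

```python
def alias_conflicts(fabric_a, fabric_b):
    """Return alias names defined in both fabrics with different members."""
    shared = fabric_a.keys() & fabric_b.keys()
    return sorted(
        name for name in shared if set(fabric_a[name]) != set(fabric_b[name])
    )

# Hypothetical alias tables: alias name -> member WWNs.
fabric_a = {
    "oracle_db": ["10:00:00:00:c9:2a:11:01"],
    "tape1": ["50:01:04:f0:00:12:34:56"],
}
fabric_b = {
    "oracle_db": ["10:00:00:00:c9:3b:22:02"],  # same name, different host
    "exch01": ["10:00:00:00:c9:4c:33:03"],
}

print(alias_conflicts(fabric_a, fabric_b))  # ['oracle_db']
```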
Question: What is the difference between a Volume and Logical Unit Number (LUN)?
Professor Poelker: A volume normally refers to a "disk" created via a "volume manager" such as Veritas, or a volume created by an operating system, such as Windows NT. A LUN refers to a "logical unit number" presented to a host behind a SCSI target ID (for example, LUN 1 behind a given target ID on that port). Therefore, the term volume can be considered software based, and LUN considered hardware based.
Question: Which vendor products do you recommend for HBAs, cables, GBICs, and switches?
Professor Poelker: I do not make specific recommendations, since I try to always be vendor agnostic.
Question: You discuss zoning methods. How does LUN zoning come into play with the methods you discuss?
Professor Poelker: LUN zoning comes into play as a second level of security on top of fabric based zoning. LUN zoning is normally done at the storage array level, by admitting only specific fabric WWNs to access the LUN within the array. This is also known as "LUN masking". LUN masking, used in conjunction with fabric based zoning, provides the best security for your SAN.
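The two levels of security can be sketched as two independent checks that must both pass: the fabric must place the initiator and the array port in a common zone, and the array's LUN masking table must list the initiator for that LUN. The zone names, LUN numbers and short names standing in for WWPNs are all hypothetical.

```python
# Hypothetical fabric zoning: zone name -> member WWPNs (short names stand in for WWPNs).
zones = {
    "zone_db": {"host_a", "array_port_1"},
    "zone_mail": {"host_b", "array_port_1"},
}

# Hypothetical LUN masking table on the array: LUN -> initiator WWPNs allowed to see it.
lun_masks = {0: {"host_a"}, 1: {"host_b"}}

def can_access(initiator, target_port, lun):
    """Access requires both a shared zone (fabric) and a masking entry (array)."""
    zoned = any({initiator, target_port} <= members for members in zones.values())
    masked_in = initiator in lun_masks.get(lun, set())
    return zoned and masked_in

print(can_access("host_a", "array_port_1", 0))  # True
print(can_access("host_a", "array_port_1", 1))  # False: zoned, but masked out
```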
Question: What do you think will happen when SAS and SATA HDs hit the market? Which one WINS? Where and why?
Professor Poelker: ATA drives are already being shipped within storage arrays as a low cost alternative for applications like data archiving and regulatory compliance. SATA drives are also now appearing. SATA drives have a faster interface than normal ATA disks, and can have higher duty cycles. SATA will be used where more performance than ATA disks is needed, but cost is still an issue. You will see both types as a replacement for tape within virtual tape arrays used for disk-based backup, and as an archive repository.
Question: What is Masking? Is it sort of the reverse of zoning, where you say a port or WWN cannot talk to another specified WWN or port?
Professor Poelker: Masking is actually "LUN masking", and is used to provide security for LUN access at the storage array level. It is not the reverse of zoning. It is used in conjunction with zoning to provide two levels of security in the SAN.
Question: On the port zoning and mixed zoning slides, what does the line connecting Port 0 on Switch 2 and Port 4 on Switch 1 represent?
Professor Poelker: It is not labeled, and it does not appear on the WWN Zoning slide. That was not a line; it was just part of the larger circle that represents zone 2. If you look at the graphic more closely, you will see that the only link between the two switches is the ISL between ports 1 and 5.
Question: Chris said that of the two types of zoning, WWN and port, that port zoning was more secure. I'd like to know why he thinks port zoning is more secure than WWN zoning.
Professor Poelker: I learned most of the things I know about zoning by playing with the older Brocade 2800 switches. Newer switches offer better security when zoning using WWN or physical ports. On the Brocade 2800, WWN zoning is accomplished in software, and port zoning is done at the hardware ASIC level. When using hard or "port" zoning, frames not destined to the zoned ports are barred by the hardware from those ports. Soft zoning, or "WWN" zoning uses only software (the name server) to enforce the zones (this is changing with newer switches) and frames are not barred from being transmitted between nodes that are not in the same zone.
Question: How do you set a core switch to be the principal switch?
Professor Poelker: It's a simple check box in the GUI of the various switches from different vendors. Just log into the switch, and set the parameter to on.
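Behind that check box is the fabric's principal switch selection: as I understand the Fibre Channel switch standard (FC-SW), the switch with the lowest priority value wins, and the lowest switch WWN breaks ties. The sketch below models only that comparison. The priority values and WWNs are hypothetical, and the assumption that the GUI setting works by lowering the switch's priority value is labeled as such in the code.

```python
from dataclasses import dataclass

@dataclass
class Switch:
    name: str
    wwn: str       # switch WWN, colon-separated hex
    priority: int  # lower value wins; values below are hypothetical

def wwn_value(wwn):
    """Turn "10:00:00:60:69:00:00:01" into an integer for comparison."""
    return int(wwn.replace(":", ""), 16)

def elect_principal(switches):
    """Lowest priority value wins; the lowest WWN breaks ties."""
    return min(switches, key=lambda s: (s.priority, wwn_value(s.wwn)))

fabric = [
    Switch("edge1", "10:00:00:60:69:00:00:01", 0xFE),
    Switch("edge2", "10:00:00:60:69:00:00:02", 0xFE),
    Switch("core", "10:00:00:60:69:00:00:09", 0xFE),
]
print(elect_principal(fabric).name)  # edge1 (all tied, lowest WWN wins)

fabric[2].priority = 0x01  # assumed effect of the "principal switch" setting
print(elect_principal(fabric).name)  # core
```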
Question: What's the basic difference between a fabric switch and a loop switch?
Professor Poelker: A "loop switch" can also handle the FC-AL protocol, and allows the attachment of older FC-AL based SAN gear (like tape drives) to the fabric. The switch assigns a unique WWN for each FC-AL ID on the connection, so that fabric based devices can address the FC-AL based device. Brocade was one of the first to provide this functionality, through its "QuickLoop" feature in the original 2800 switches.
Question: From what I can tell from available sources, a loop switch, like a fabric switch, has a point-to-point, non-blocking architecture. How is FC-AL implemented in a loop switch?
Professor Poelker: The switch provides a translation mechanism in a daemon running within the switch firmware. When an FC-AL device is attached to the switch port, the switch assigns a WWN to the FC-AL device based on an offset of the switch port's WWN. This way, each FC-AL ID also gets a WWN.
Question: What is "write order fidelity"?
Professor Poelker: Write order fidelity is a term that describes the ability of data replication hardware or software to move data between geographic locations while keeping consistency for transaction oriented applications. In other words, it's the ability to keep data "in the same sequence" as it was written at the originating source. This is extremely important for database consistency. If data is written in an incorrect sequence at different sites, the transaction logs will no longer be consistent, and databases can become corrupt at the remote site. The operations are kept in proper sequence by assigning either a timestamp or a sequence number to each write at the source site. The data is then transmitted to the remote site, over multiple links for load balancing or asynchronously over longer distances, and reordered into the proper sequence before it is written to the logs at the remote site.
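The reordering step at the remote site can be sketched with a priority queue keyed on the source sequence number: writes that arrive early are held until every earlier write has arrived, then applied in order. The class name is hypothetical, and the in-memory list stands in for the remote log or volume; a real replication product would also persist the pending queue and handle gaps caused by lost links.

```python
import heapq

class InOrderApplier:
    """Apply replicated writes strictly in source sequence order."""

    def __init__(self):
        self.next_seq = 0  # sequence number of the next write we may apply
        self.pending = []  # min-heap of (seq, write) that arrived early
        self.applied = []  # stands in for the remote log or volume

    def receive(self, seq, write):
        heapq.heappush(self.pending, (seq, write))
        # Drain every write whose predecessors have all arrived.
        while self.pending and self.pending[0][0] == self.next_seq:
            _, ready = heapq.heappop(self.pending)
            self.applied.append(ready)
            self.next_seq += 1

remote = InOrderApplier()
for seq, write in [(2, "C"), (0, "A"), (1, "B")]:  # out-of-order arrival
    remote.receive(seq, write)
print(remote.applied)  # ['A', 'B', 'C']
```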
Question: How can I set up a persistent binding on an HBA for a Unix HOST?
Professor Poelker: That is a function of the driver for the specific HBA and the sd.conf file. Use the GUI that came with the driver install software to set persistent bindings; for JNI adapters, the utility is EZFibre. Use the driver's target-to-WWN binding feature to set the WWN in the /kernel/drv/sd.conf file. You can set the proper bindings in the fcaw.conf file for JNI adapters like this:
# Configuration flag def_wwn_binding
# Type: string; default: "$xxxxxxxxxxxxxxxx" (means WWN is "static don't care")
# Sets the 16 digit hexadecimal default wwn binding for every target/lun
# instance which does not explicitly define one.
# - A "$" preceding the string indicates static binding enabled
# - A "x" in place of a digit indicates "don't care" for that digit
# *See technote for details on wwn bindings
def_wwn_binding = "xxxxxxxxxxxxxxxx";
For the sd.conf file, follow this format:
name="sd" class="scsi" target=1 lun=0 wwn="2000004568018769";
name="sd" class="scsi" target=1 lun=1;
name="sd" class="scsi" target=1 lun=2;
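The comments in the fcaw.conf example describe a small pattern language for def_wwn_binding: 16 hex digits, an optional leading "$" meaning static binding is enabled, and "x" meaning "don't care" for that digit. The Python below is a hypothetical helper that follows only what those comments say, for sanity-checking a binding string against a target WWN before editing the file; it is not part of the JNI driver, and the driver's own parsing may differ.

```python
import re

def parse_binding(binding):
    """Split a def_wwn_binding value into (static_enabled, 16-char pattern)."""
    static = binding.startswith("$")
    pattern = binding[1:] if static else binding
    if not re.fullmatch(r"[0-9a-fA-Fx]{16}", pattern):
        raise ValueError("expected 16 hex digits or 'x' placeholders")
    return static, pattern.lower()

def wwn_matches(binding, wwn):
    """True if every non-'x' digit of the pattern equals the WWN's digit."""
    _, pattern = parse_binding(binding)
    wwn = wwn.lower()
    return len(wwn) == 16 and all(p in ("x", w) for p, w in zip(pattern, wwn))

print(parse_binding("$xxxxxxxxxxxxxxxx"))                     # (True, 'xxxxxxxxxxxxxxxx')
print(wwn_matches("$2000004568018769", "2000004568018769"))  # True
print(wwn_matches("$21000045680187xx", "2000004568018769"))  # False
```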
Question: What is the difference between LAN Free and Serverless backups?
Professor Poelker: LAN free backup usually involves the ability to share a SAN connected tape library among all the nodes connected to the SAN. The backup server simply coordinates access to the tape resources. Each server in the SAN actually runs a copy of the backup engine, and moves its own data to tape. Some backup vendors call this "SSO," or the shared storage option. The backup server becomes the traffic cop for the SAN connected tape resources, and allows each server in the SAN to back up its own data. This removes the need to "PULL" data over the LAN via backup agents to a tape resource connected to the backup server.
Serverless backup is accomplished by giving the backup server the ability to connect to storage on behalf of other hosts connected to the SAN, and back up each host's storage on its behalf. This usually involves the use of snapshot or image copies of the production LUNs in the SAN. The snapshot is used as the source media for backup, so that the production application can continue running during backup. LUN security in the SAN is used to give the backup server access to the snapshot, and the backup server sends the data to tape. Another method is to use the SCSI extended copy command, called E-Copy, which takes even the backup server out of the data path. E-Copy allows data to move directly from disk to tape via a "data router", which provides the E-Copy intelligence.
Question: Is Serverless backup the only solution for backup on a SAN?
Professor Poelker: Not at all. All the traditional methods for backing up data are still available for SAN connected servers. Using the SAN rather than the LAN as the data path is what makes SAN based backup a better solution. You will find the Fibre Channel protocol provides a faster data path with lower overhead and latency (up to 200MB per second) for backup streams. This is because Fibre Channel transmits data in larger SCSI blocks, rather than needing to format it into IP packets.
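The throughput difference translates directly into the length of the backup window. This sketch takes the 200MB per second figure above as the SAN rate and compares it with a hypothetical 10MB per second LAN backup stream, using decimal units (1 GB = 1000 MB). Real backups rarely sustain the full link rate, so the optional efficiency factor is there to plug in your own measured numbers.

```python
def backup_hours(data_gb, throughput_mb_s, efficiency=1.0):
    """Hours to stream data_gb at throughput_mb_s (decimal units)."""
    seconds = data_gb * 1000 / (throughput_mb_s * efficiency)
    return seconds / 3600

data_gb = 1000  # hypothetical 1 TB nightly backup
print(round(backup_hours(data_gb, 200), 2))  # 1.39 hours over the SAN
print(round(backup_hours(data_gb, 10), 2))   # 27.78 hours at the assumed LAN rate
```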
Question: In LAN free backups, could you explain the datapath from disk to tape?
Professor Poelker: During LAN free backup, the data path to tape runs through each server, which backs up its own data to tape over the SAN. Data is read from the server's LUNs into server memory over one SAN HBA connection, then sent out to the tape library through (hopefully) another HBA dedicated to tape backup through the SAN.
Question: What is the "best practice" in building a network for iFCP? VLAN 100Mbps, VLAN 1Gbps? Is latency an issue?
Professor Poelker: The best practice is to provide the iFCP WAN connection with the correct amount of bandwidth to handle PEAK loads across the connection. The idea is to balance WAN network costs against data throughput requirements. If you have a good understanding of the timeframes for peak workloads, and if your WAN vendor provides the ability to lease bandwidth on demand, you should be able to maximize the utilization of your links vs. the monthly costs for those links.
Let's say you know that the last Tuesday of every month is used for month end batch processing, and your bandwidth requirements increase during those periods. If a T1 connection handles the normal daily traffic across the link, but the month end processing requires a T3 link, your WAN vendor may be able to charge you for the bandwidth you actually use for the entire month, providing T1 bandwidth normally, and then increasing that bandwidth to T3 speeds during month end. Check with your WAN vendor to see if they can provide this "sliding scale" access to the links for your lease. Hey, if you have the budget, then Gig-E is always a great thing to have!
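To see why month-end traffic might need a T3, it helps to compare how long the same data takes over each link. The line rates are the standard T1 (1.544 Mbit/s) and T3 (44.736 Mbit/s) figures plus Gigabit Ethernet; the 10 GB month-end volume and the 80% usable utilization are hypothetical assumptions, since protocol overhead and competing traffic vary by network.

```python
LINK_MBPS = {"T1": 1.544, "T3": 44.736, "Gig-E": 1000.0}  # line rates in Mbit/s

def transfer_hours(data_gb, link, utilization=0.8):
    """Hours to move data_gb (decimal GB) over a link at the given utilization."""
    bits = data_gb * 8e9
    return bits / (LINK_MBPS[link] * 1e6 * utilization) / 3600

for link in ("T1", "T3", "Gig-E"):
    print(f"{link}: {transfer_hours(10, link):.2f} h")  # 17.99, 0.62, 0.03
```

If the month-end batch has to replicate within a few hours, the T1 clearly cannot keep up, while a temporary boost to T3 speed covers it with room to spare.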
Question: I believe that the technology to route IP over a Fibre Channel network was mentioned. Can you provide any more details on this technology? Are there specific vendors who do this?
Professor Poelker: Sure, this works like Microsoft's redirector layer (NDIS?) You can assign multiple protocols to the HBA connection. Both IP and FC traffic can then pass over the links. JNI and Emulex provide this functionality, and most of the SAN switch vendors support IP traffic and IP routing over the fabric. This is a great way to provide MSCS cluster heartbeat connections over the same link as the data traffic for wide area clustering. I know the 5-2.21a8 version of the Emulex port driver for Windows 2000 supports this. Go to http://www.emulex.com/ts/docfc/frame92l.htm for more info.
Question: If iSCSI can map FC to IP, why should I use iFCP?
Professor Poelker: iSCSI is used for block access to disk resources over IP by hosts that do not have Fibre Channel HBA connections to a SAN. This requires an iSCSI driver for the LAN NIC on each server, and either native iSCSI storage arrays or an iSCSI bridge that converts iSCSI traffic to FC traffic for traditional FC based SAN storage arrays.
iFCP is a protocol used between SAN islands to create a SWAN (storage wide area network). Both iSCSI and iFCP use IP as the transport mechanism, but they are used for different applications. iFCP is used for connecting fabrics together, and iSCSI is used for IP hosts to access disks.
Question: Will most iSCSI deployments, in your opinion, be able to create SWANs or greenfield iSCSI SANs?
Professor Poelker: iSCSI devices can be used in place of fabric switches, so that the core of the SAN fabric can be built using traditional Gig-E switches, which not only saves money but also lets a company leverage its existing technical expertise in data network environments. SWANs are not created via iSCSI; they are created by two other protocols, FCIP and iFCP, which are used to connect SAN based switched fabrics together over IP links.
About Christopher Poelker:
Aside from being an author and a SearchStorage.com SAN expert, Christopher Poelker is a storage architect at Hitachi Data Systems. Prior to Hitachi, Chris was a lead storage architect/senior systems architect for Compaq Computer, Inc., in New York. While at Compaq, Chris built the sales/service engagement model for Compaq StorageWorks, and trained most of the company's VARs, channel partners and Compaq ES/PS contacts on StorageWorks. Chris's certifications include: MCSE, MCT (Microsoft Trainer), MASE (Compaq Master ASE Storage Architect), and A+ (PC Technician).