Database and storage administrators have carefully selected RAID configurations (for example, RAID 0+1, which combines mirroring and striping) in order to gain the greatest read performance from their hardware without losing redundancy. While RAID 5 provides redundancy at a somewhat lower cost, most database vendors today recommend RAID 0+1 for tables as well as logs and recommend RAID 5 only for less performance-critical backup volumes. Caribou Lake's Leo says, "Performance is highly dependent upon how you configure the SAN. We have seen very good performance for databases with RAID 1+0 type configurations. We have also seen decent performance for RAID 5, unless the RAID 5 is in rebuild. If the RAID 5 is in rebuild, write performance is so bad, the system may as well be down."
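The cost difference between the two layouts comes down to usable capacity. As a rough sketch in Python (the function names and the eight-drive, 146GB array are illustrative assumptions, not figures from anyone quoted here), a mirrored-and-striped set gives up half of its raw capacity, while RAID 5 gives up one drive's worth to parity:

```python
def usable_mirrored_striped(disks: int, disk_gb: float) -> float:
    """Usable capacity of RAID 0+1 / 1+0: every block is stored twice."""
    if disks < 4 or disks % 2:
        raise ValueError("mirrored stripes need an even number of disks, at least 4")
    return disks * disk_gb / 2


def usable_raid5(disks: int, disk_gb: float) -> float:
    """Usable capacity of RAID 5: one disk's worth of space holds parity."""
    if disks < 3:
        raise ValueError("RAID 5 needs at least 3 disks")
    return (disks - 1) * disk_gb


if __name__ == "__main__":
    # Hypothetical array: eight 146GB drives.
    n, size = 8, 146
    print("RAID 0+1 usable GB:", usable_mirrored_striped(n, size))
    print("RAID 5 usable GB:", usable_raid5(n, size))
```

The arithmetic covers capacity only; it says nothing about the rebuild penalty Leo describes, which is the main reason the cheaper layout is kept away from busy tables and logs.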
To further optimize access times and data throughput, database administrators also have to make sure that tables and logs never share the same spindles. John Eisenschmidt, a DBA at the American Association for the Advancement of Science, Washington, D.C., says of his last Oracle install, "We made sure not to put the index tablespaces with the data tablespaces to improve read performance, and put the archive logs and redo logs in different places to balance write operations." Eisenschmidt manages about 2.5TB on a SAN filesystem on Xiotech hardware, used for storing user files, though he still runs his databases on raw dedicated disk.
However, with the advent of SANs and virtualized storage, administrators no longer have that level of control over data placement. According to Bob Rogers, chief technology officer of BMC Software, "Most DBAs are absolutely convinced that file placement is an absolutely critical issue. However, most disks today are virtualized, and users don't have a clue where things are physically located."
Luckily, the speed and manageability of storage have improved the situation, despite the loss in control. Rogers says, "Today, you can get a much more consistent level of performance by going to very large arrays such as those from EMC, IBM, and HDS than you could going to direct-attach disk."
According to Hoke, a good database administrator will try to spread those tables across multiple devices, both to avoid creating hotspots and to improve availability. However, with virtualization, those two tables may be inadvertently placed on the same disk, because virtualization masks physical locations. "Virtualization is great for provisioning storage because it's easy to move data and add disks. But with a database, because of performance issues, there are potential drawbacks," Hoke says.
CreekPath's Booth agrees: "File placement is still absolutely important with a SAN, although disks have gotten faster. It still comes down to laying down data on a spinning disk, and not putting redo logs and indexes on the same physical drives. That's why it's important for tools to see all the way down to the spindle level, so that when they're laying down data it's not stepping on its own toes."
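The kind of spindle-level check Booth describes can be sketched in a few lines of Python. Everything here is hypothetical: the file names, the spindle IDs, and the `LAYOUT` mapping stand in for whatever a real tool would discover from the array, and the separation rules are simply the ones the DBAs quoted above mention.

```python
from itertools import combinations

# Hypothetical layout: database file -> (role, set of physical spindle IDs).
LAYOUT = {
    "data01.dbf": ("data", {"d1", "d2"}),
    "index01.dbf": ("index", {"d3", "d4"}),
    "redo01.log": ("redo", {"d4", "d5"}),
    "arch01.log": ("archive", {"d6"}),
}

# Pairs of roles that should not share a spindle (assumed from the quotes above).
MUST_SEPARATE = {
    frozenset(pair)
    for pair in [("data", "index"), ("data", "redo"),
                 ("redo", "index"), ("redo", "archive")]
}


def placement_conflicts(layout):
    """Return (file_a, file_b, shared_spindles) for every conflicting pair."""
    conflicts = []
    for file_a, file_b in combinations(sorted(layout), 2):
        role_a, disks_a = layout[file_a]
        role_b, disks_b = layout[file_b]
        shared = disks_a & disks_b
        if shared and frozenset((role_a, role_b)) in MUST_SEPARATE:
            conflicts.append((file_a, file_b, sorted(shared)))
    return conflicts


if __name__ == "__main__":
    for file_a, file_b, shared in placement_conflicts(LAYOUT):
        print(f"{file_a} and {file_b} share spindles {shared}")
```

In this made-up layout the index file and the redo log both land on spindle d4, which is exactly the overlap a virtualized volume can hide from the DBA.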
Marty Ward, director of product marketing at Veritas, concurs, although he doesn't think administrators need to be directly concerned with where data is: "DBAs don't need to be concerned about which spindle disk data lies on. We take care of that for them. Our logical volumes allow them to move data around on-the-fly to adjust hotspots, even while running." Other vendors are also announcing tools in this area.
Durnford sees things the same way. "Using JBOD, you're much more sensitive to placements [hotspots]. However, using storage arrays, we use their technology to move storage around while running to eliminate hotspots, so that the server doesn't really need to know the physical storage location," he says. Durnford uses built-in tools on his Hitachi, IBM, and EMC arrays to deal with hotspots on his volumes.
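What counts as a hotspot depends on the tool, and the array vendors' own methods are not described here. As a minimal illustration only, assuming per-spindle I/O rates are available and using an arbitrary threshold of twice the average, a Python sketch might flag candidates like this (the function name, sample figures, and `factor` parameter are all hypothetical):

```python
from statistics import mean


def find_hotspots(iops_by_spindle, factor=2.0):
    """Return spindles whose I/O rate exceeds `factor` times the average."""
    if not iops_by_spindle:
        return []
    average = mean(iops_by_spindle.values())
    return sorted(
        spindle
        for spindle, iops in iops_by_spindle.items()
        if iops > factor * average
    )


if __name__ == "__main__":
    # Made-up I/O rates (operations per second) for five spindles.
    sample = {"d1": 120, "d2": 110, "d3": 900, "d4": 130, "d5": 140}
    print("Hot spindles:", find_hotspots(sample))
```

A real array would weigh far more than a single average, but the idea is the same: find the drives doing a disproportionate share of the work, then move data off them.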
SAN issues
"In the old days of NFS [network file systems], due to lack of availability capabilities for Oracle, our customers saw storage corruption," says Oracle's Hoke. "Because of those issues, Oracle's validation program for network-attached storage arose out of concern about the nature of NFS," he says.
In a SAN, storage just looks like a SCSI disk. Because the interface is identical to that of SCSI, a number of the headaches that came with putting databases on network-attached storage go away. Unlike with NAS, with storage networks Oracle has left it to its storage partners to qualify HBAs, switches, and front-end storage FC adapters. Hoke says that with a SAN, "We have not identified anything needing validation."
However, it still takes some work to get SANs performing right for databases. DBA Eisenschmidt, despite running a SAN for his applications and file systems, is skeptical of using SANs for his database storage. "Our SAN, like many I've seen, is optimized for storage capacity and file serving. While Oracle uses Xiotech's Magnitude in their test labs, I was pretty unhappy with the transaction performance we were able to get out of it."
CreekPath's Booth says, "The biggest issue with databases on a SAN is maintaining consistent throughput and access on a database while running other applications sharing the same SAN."
Caribou Lake's Leo agrees, saying, "SANs often save an enterprise money by sharing I/O capacity across departments. But sharing I/O resources between the user files of one department and the DBMS storage for another is awful. The database management system's I/O characteristics are completely different than the user data files, and cache coherency becomes an issue."
Royal Bank of Canada's Durnford has opted for separate storage networks to solve some of these problems. He says, "For tape, we're looking at a separate tape SAN. The issue there is database traffic is very time-sensitive, where tape traffic is very backup-oriented. Though you might think it's time shifted, we've found there's overlap."
The other issue most DBAs mention is the complexity of managing storage networks, which are typically managed by separate storage infrastructure staff. "A lot of progress has been made in the last couple years, but most system administrators want SANs that are easier to administrate. I want a more reliable SAN whose caching I'm not afraid to turn on," Eisenschmidt says.
Leo says that when you put a database and storage together, "You realize that your SAN management team must work closely with the DBMS team. We have seen situations where SAN managers have decided to change the SAN configuration because it seems wasteful or odd, not realizing how carefully tuned the setup was."
CreekPath's Booth also agrees, saying that while careful tuning of how storage is configured can eliminate sharing issues, "Both DBAs and storage administrators need to be on the same page."
Despite the issues, some users have had great success running SANs with their databases. John Brooks, vice president of storage engineering at Cleveland, OH-based National City Bank, runs an 18TB SAN, which is based on Brocade switches with IBM and Hitachi arrays. "We have no issues with running on a SAN. Not even performance issues. It just seems to run," he says. Asked if he does anything special for databases, he says, "For SQL and Oracle databases, we throw it on the SAN." Brooks attributes the lack of issues to the relationship he has with his storage vendors. He says, "We do a lot of research, primarily in the interoperability area that might account for the lack of issues. We work very closely with our storage vendors to ensure certification and supportability of the environment, in case issues do arise."
Tools and techniques
Because of SANs' new role as the primary storage for databases, there's now a need for improved tools and software to manage the combination of database applications and storage. "It would be nice if vendors could provide some sort of management tool to track why a SAN partition was configured in a certain way, thus preventing the loss of historically significant details when SAN management changes hands," says DBA Leo.
"If I could have anything from a SAN to make it better for a database, I'd ask the vendors to write better drivers. Most of the drivers I've seen and worked with just tell the OS the drive is gone when the SAN disconnects, instead of perhaps telling it to wait or trying to free the process," Eisenschmidt says.
Some vendors seem to be listening and are increasingly focusing their efforts on getting their databases to run better on SANs, and their storage and infrastructure to better support databases. Take, for example, Oracle's Hardware-Assisted Resilient Data (HARD) initiative (see "Oracle's storage push"). Oracle's Hoke explains, "We're working with hardware partners to verify that when Oracle data is transferred to a disk it can be read. This sounds simple, but with storage management and storage virtualization in storage networks, there are lots of cooks in the broth."
As part of Oracle's HARD initiative, EMC's Oracle Double Checksum tool verifies Oracle data in array microcode as it is written, ensuring that corrupted data is never written to disk. Competitor Hitachi Data Systems has also announced support for the initiative, with a microcode-based verification of Oracle data in its Lightning 9900V product.
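The general idea of verifying a block before it reaches disk can be shown with a short Python sketch. This is not Oracle's or EMC's actual algorithm or block format, neither of which is described here; CRC32 from Python's standard `zlib` module is swapped in as a stand-in checksum, and the function names and the in-memory "disk" are hypothetical.

```python
import zlib


class CorruptBlockError(Exception):
    """Raised when a block fails verification and must not be written."""


def seal_block(payload: bytes) -> bytes:
    """Host side: append a 4-byte CRC32 (stand-in checksum) to the payload."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def verify_and_write(block: bytes, disk: list) -> None:
    """Array side: recompute the checksum and write only if it matches."""
    if len(block) < 4:
        raise CorruptBlockError("block too short to carry a checksum")
    payload, stored = block[:-4], int.from_bytes(block[-4:], "big")
    if zlib.crc32(payload) != stored:
        raise CorruptBlockError("checksum mismatch; write rejected")
    disk.append(payload)


if __name__ == "__main__":
    disk = []  # a plain list standing in for the array's media
    good = seal_block(b"row data")
    verify_and_write(good, disk)

    damaged = b"X" + good[1:]  # simulate corruption somewhere in the I/O path
    try:
        verify_and_write(damaged, disk)
    except CorruptBlockError as err:
        print("Rejected:", err)
    print("Blocks on disk:", len(disk))
```

The point of checking at the array rather than only in the database is that the data crosses several layers on the way down, and a block damaged in any of them is refused instead of being stored.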
EMC also offers its Symmetrix Optimizer, which makes recommendations to administrators on where to rebalance data, a must for administrators looking for hot spindles. EMC has a close partnership with Oracle, including a dedicated engineering team at Oracle Redwood Shores. Chuck Hollis, EMC's vice president of products and markets, says, "We have 12 to 15 engineers inside Oracle, some dedicated to interoperability, and the rest to feature exploitation such as our Double Checksum product and integration between Oracle and our replication products."
Veritas also offers a host of products designed to work with databases and SANs. The company's Database Edition products for Oracle9i, IBM DB2, and Microsoft Exchange include mirroring, backup, replication, and clustering features designed for those databases. According to Veritas' Ward, an upcoming release of the company's storage resource management product "will map data all the way from a database environment in Oracle or Exchange, through files, all the way down to disks to show you which spindles and extents the data is written to."
BMC offers its PATROL for Storage Management product. BMC's Rogers says, "When you're working with Oracle or Sybase or Microsoft SQL Server and even Exchange, we can expose that from a database perspective. For example, we can expose an Oracle instance that is on a Solaris box and attached to a SAN, and see where it is spread about. We can expose that through a topology manager: all of the connections, single points of failure, and the way the database is laid out on disk."
Despite the complexities of managing databases running on storage networks, the benefits are clear. For large databases, there's really no better option than a SAN because of the level of performance, consistency, redundancy, and scalability that a storage network provides. Storage and database administrators alike are finding that the only way to support their mission-critical applications is to deploy their databases on a storage network. With so much at stake, storage and database vendors will continue to work more in tandem to make their products work better together (see "Vendor independence: Are the trade-offs worth it?").