This article can also be found in the Premium Editorial Download "Storage magazine: Optimizing your enterprise database storage."
Download it now to read this article plus other related content.
One of the driving forces behind the move to storage area networks (SANs) has been the increasing importance and size of enterprise databases. As databases grew, administrators found that the only way to efficiently manage and expand database storage is to tap into the flexibility of SANs. However, with the expanded use of databases on storage networks, storage administrators now must deal with new performance, backup, restore and management issues-not to mention how to get along with the database administrators (DBAs). Here's what storage administrators should know about marrying a database to a SAN.
|Oracle's Storage Push|
In a sign that database vendors are increasingly aware of how important the storage behind their databases is, Oracle recently announced that it is launching a storage initiative, which seeks to combine the company's various individual storage initiatives into a strategic push to focus on how Oracle runs on storage. Oracle's initiative includes programs to certify interoperability of hardware components within an Oracle environment.
According to Allen Hoke, technical director, system platforms division of Oracle, "This program focuses on storage technology at a high level vs. our former focus on specific areas such as backup or particular storage hardware or storage management technology." In addition, Hoke says, "In order to provide a good customer experience, we are teaming with our key OS, server hardware and storage partners to provide a remote support strategy, aligned between us and our partners."
At this time, Oracle's partners include EMC, HP, Hitachi, Network Appliance, Legato, Sun, and Veritas.
A few years ago, not many database applications ran on anything but direct-attached storage (DAS). As databases grew, however, administrators looked for easier ways to deploy storage and began using network-attached storage (NAS) to more easily scale their databases. With the advent of SANs, administrators are now finding that storage networks can provide even better performance for the most demanding transaction-oriented, high volume databases.
Today, database applications are starting to move from DAS to SANs. The reasons: increased ease and speed of backup, better scalability, as well as enhanced reliability and performance. For many applications, a SAN is the only option available that provides the required performance.
Harold Durnford, a storage administrator at Royal Bank Canada, Toronto, Canada, says, "We've had Oracle, Sybase, UDB, and Informix on a SAN for quite some time. The reason we looked at storage area networks was the challenge of sharing large storage subsystems." In addition, he says, "It wasn't just a distance limitation, but also real estate. With SCSI, we found we quickly ran out of SCSI ports on our arrays before we ran out of storage. We used the SAN to address fan-out."
Marty Ward, vice president of product management at Veritas has also seen this, and says, "Database users are very focused on scaling up and scaling out, which you can't do with dedicated storage."
Bob Rogers, chief storage architect at BMC says, "Consistent level of performance is driving people to storage area networks, as are better quality backup and recovery tools. Those are all very strong motivators to go to a SAN environment, which is why very large customers have embraced SANs and put databases on them quite early."
Jim Booth, director of systems engineering at CreekPath, agrees, saying, "People are moving to SANs primarily to get the performance. You can certainly implement a database on either NAS or a SAN. However, if you're doing it on a SAN, you get consistent throughput. Though it might be harder to manage, a SAN provides better performance and availability."
A typical SAN configured for a database application includes database servers clustered for reliability and attached to a central network. At a minimum, two separate database servers are clustered together for redundancy, with dual redundant Fibre Channel (FC) fabrics connected to redundant RAID controllers. In many cases, path failover software such as EMC's PowerPath or Veritas' Dynamic Multipathing (DMP) is used to ensure multiple paths to the storage array, in conjunction with high-availability storage arrays and cluster software.
Michael Leo, a database administrator at Minneapolis, MN-based Caribou Lake Software, a consulting firm which helps companies manage their systems, works with a large medical devices manufacturer replacing legacy Alpha VMS systems with Solaris machines, connected to an EMC Symmetrix SAN. "Everything is striped and mirrored [1+0]. We are using the SAN for development, production online transaction processing [OLTP] and production reporting-a data warehouse of the production database used for reporting. Each database is in the 100GB to 200GB range. We have two copies of the OLTP database-one is a replicated hot standby, " says Leo. The systems run Computer Associates' Ingres database.
"To back up the database, we are using the SAN for disk-to-disk backups. We then use Veritas' NetBackup and a separate backup LAN to backup the SAN. To manage the disk space on the Sun, we use Veritas in combination with EMC's SAN management tools," says Leo.
Optimizing database performance has traditionally been done by configuring how and where tables and logs are stored, as well as configuring disks and storage for database access patterns. Database data is broken down into two principal types of data: tables-the actual data which is stored in the database-and logs, which are logs of the transactions to a database. Transaction logs are critical for recovering a database if there's a failure and are also used for replicating databases. Many databases can also use their own native file systems to maximize performance, maintain control over tables and avoid potential performance penalties imposed by file system buffering. Aside from fast storage, CPU and memory, administrators have paid a lot of attention to balancing speed vs. redundancy of data.
Database and storage administrators have carefully selected RAID configurations-for example, using RAID 0+1-both mirroring and striping-in order to gain the greatest read performance from their hardware without losing redundancy. While RAID 5 provides for redundancy at a somewhat lower cost, most database vendors today recommend RAID 0+1 for tables as well as logs and only recommend RAID 5 for less performance-critical backup volumes. Caribou Lake's Leo says, "Performance is highly dependent upon how you configure the SAN. We have seen very good performance for databases with RAID 1+0 type configurations. We have also seen decent performance for RAID 5, unless the RAID 5 is in rebuild. If the RAID 5 is in rebuild, write performance is so bad, the system may as well be down."
To further optimize access times and data throughput, database administrators have to also make sure that tables and logs never share the same spindles on a disk. John Eisenschmidt, a DBA at the American Association for the Advancement of Science, Washington, D.C., says for his last install of Oracle, "We made sure not to put the index tablespaces with the data tablespaces to improve read performance, and put the archive logs and redo logs in different places to balance write operations." Eisenschmidt manages about 2.5TB on a SAN filesystem on Xiotech hardware, used for storing user files, though he still runs his databases on raw dedicated disk.
However, with the advent of SANs and virtualized storage, administrators no longer have that level of control of data placement. According to Bob Rogers, chief technology officer of BMC Software, "Most DBAs are absolutely convinced that file placement is an absolutely critical issue, however, most disks today are virtualized-and users don't have a clue where things are physically located."
Luckily, the speed and manageability of storage have improved the situation, despite the loss in control. Rogers says, "Today, you can get a much more consistent level of performance by going to very large arrays such as those from EMC, IBM, and HDS than you could going to direct-attach disk."
This was first published in November 2002