Published: 08 Aug 2002
A company is only as strong as the data in its databases. And the care, feeding and protection of the company's databases are mission-critical IT services. Question: What's the best way to design network storage for databases? Don't answer too quickly. It's one of those sneaky SAT questions: The most obvious answer is usually wrong.
Tablespaces can be mapped directly to LUNs using either raw partition or file system storage.
Raw partitions vs. file system
As database and storage administrators well know, databases can provide their own filing system - called raw partitions - or use the filing capabilities of a file system. Historically, raw partitions have been perceived to have the best performance characteristics because they involve the least software overhead. Raw partitions function at the storing level and use block I/O commands and protocols such as SCSI.
But database performance is usually not the most important requirement - database availability and recovery is. There's little question that a database design using easy-to-locate storage and file system objects provides more options for protecting and recovering database data than black-box raw partitions do. As a storage or database administrator, you need to ask yourself the classic Clint Eastwood question: "Are you feeling lucky, punk?" If you would like to add an element of control to your luck quotient, then put your databases on a file system.
Using a file system, the database makes byte-range I/O requests of the file system, which in turn manages the block locations of those byte ranges within the files. When using a file system, configure it to match the I/O characteristics of the database. If the byte range I/O size from the database and the block size of the file system are the same, you'll have an efficient 1:1 relationship between byte range database file I/O requests and file system block I/O requests to storage. The block size configuration of the file system is typically done when the file system is first installed, so your planning process should include research to determine the database I/O size you'll use and which file system block sizes are supported by the file system. As a rule of thumb, if you aren't sure the sizes match, then use a slightly larger file system block size.
The file system can be on the same system as the database system, in which case there are no network communications to consider. The file system can also reside on a NAS system or file server, in which case, you want to know precisely how system-to-system communications work. Both NFS and Common Internet File System (CIFS) filing protocols can be used to run a database on a network-mounted file system. If you use NFS, make sure you use Transmission Control Protocol (TCP) as the transport protocol as opposed to User Datagram Protocol (UDP). Unlike TCP, UDP doesn't divide a message into packets (datagrams) that are reassembled at the end of the transmission. Specifically, UDP doesn't provide sequencing of the packets that the data arrives in. This means that the application program that uses UDP must be able to make sure that the entire message has arrived and is in the right order. On the other hand, TCP provides error checking, reordering and retransmission of data, which makes things far less confusing for error recovery.
Of course, when using NFS or CIFS to support databases, make sure that networking equipment is qualified and well-tested. You don't want database I/Os to be dropped by an overloaded TCP/IP switch or router. Although TCP will reorder data - you want to avoid it - if possible. Performance will probably be better if the database system and the system where the file system resides are connected through the same switch. Using subnets and virtual networks to segregate database traffic from other traffic is also a good idea.
Another approach that incorporates aspects of both raw partitions performance and file system management is Quick I/O from Veritas software. In a nutshell, Quick I/O establishes files within the Veritas file system that can be used by databases using raw partition block I/O. In other words, a Quick I/O file is a virtual raw partition in a file system that is used by the database.
Mapping tablespaces to storage
The size of the database and the hours it needs to be live can decide what storage and recovery methods to use. As the size of a database grows, its backup window decreases proportionately. It's not possible on many databases to conduct a cold backup copy; as a result a number of technologies have to be implemented to ensure a hot backup. While the database vendors and backup companies have done a lot of work to make database backup and recovery more reliable, generally speaking it can still be a difficult, dicey process.
Database systems are logically organized as tablespaces. It's well beyond the scope of this article to discuss database organization other than to highlight the fact that tablespaces are the logical containers that hold the information in a database. Storage designs that map tablespaces to unique storage locations or SAN addresses allow storage management tools to be used and processes to be built that don't have to deal with the internal complexity of the database, but instead can work on large pieces of the databases as discrete entities.
Tablespaces can be mapped directly to LUNs using either raw partition or file system storage. To map a tablespace to a LUN using a file system, the file system will be created to manage the LUN's address space, even if it only has a single user file in it (the tablespace file). This way, if each tablespace in a database occupies its own dedicated LUN, block-level storage management tools such as backup and mirroring can be applied to create redundant copies of the tablespace without impacting the performance or operations of other tablespaces in the database.
In general, it's a good idea to leave plenty of storage capacity available for growth of the tablespaces. Even so, there will come a time when a database tablespace runs out of storage capacity and you will need to add more space. This is a situation where you may be better off using larger storage subsystems that supports many LUNs. With larger subsystems, it's easy to assign additional capacity simply by addressing the new LUN through an existing connection in the SAN. If you don't have the ability to add a LUN in an existing subsystem port, then you'll need to use additional switch and subsystem ports. For this reason, it's always a good idea to have capacity expansion ports reserved in your switches.
After the network addressing and connectivity issues are settled, the new storage space can be integrated with the database in one of two basic ways. The first way is to virtualize the storage by adding the new capacity to the old and then grow the size of the file system. Second, you can create an empty, new, larger raw partition or file system and copy all the data into it, turning off the old partition or file system when you are done.
Managing database storage through the log files
Whether or not the database uses a file system, there may still be log files that are written to a file system. These log files are used by database administrators to re-assemble the database should some disaster occur, which could cause the database to stop suddenly without a proper system shutdown process. Log files are also used to update a remote copy of the database by playing back the contents of the log file and updating the remote copy of the database. The key is ensuring that the log file contains all the changes made since the last version was completed and that all log files can be safely stored in case they are needed.
There are usually multiple log files that are used in a regular rotation. When the first one fills, the next one starts to fill and when that one fills, the next one starts, and so on. Eventually, the cycle wraps and the first log file is overwritten. This approach gives administrators time to backup and copy the log files on a regular schedule.
Because log files contain complete update information about the database and because they can be used so many different ways, it's always a good idea to give database log files special attention. One way to do this is to dedicate a LUN to the log files. This makes the log file data much easier to locate and it also allows specialized file and storage management techniques, including special backup processes and mirroring. For instance, the log files can be mirrored to a remote location where they can be applied to a remote copy of the database. If the mirroring process fails, the log files can be copied from backup tapes or replicated through another process to the remote site.
The point is that the detailed tablespace I/O doesn't have to be mirrored as long as the log files can be copied consistently without errors. This can have a huge impact on remote mirroring operations. The time involved in mirroring all tablespace I/O operations can significantly slow down the performance of the database. Instead, mirroring and replicating the log files significantly cuts down on the communications costs as well as maintaining expected performance levels.