Demystifying Unix dump
dump is a powerful tool for backing up Unix files, but it isn't intuitive and can produce some unexpected results, especially during a restore. The following excerpt from W. Curtis Preston's new book, Backup & Recovery: Inexpensive Backup Solutions for Open Systems, explains how dump works, when to use it, and exactly what can go wrong at various stages of the dump backup process.
cpio, ntbackup, and tar are filesystem-based utilities; they access files through the filesystem. If a file is changed, deleted, or added during a backup, usually the worst thing that can happen is that the contents of that individual file will be corrupt. Unfortunately, there is one huge disadvantage to backing up files through the filesystem: the backup affects inode times (atime or ctime). (In a Unix-based operating system, an inode is a stored description of an individual file.)
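This side effect is easy to observe. The following Python sketch (not from the book) shows that simply reading a file through the filesystem can advance its atime; on filesystems mounted with options such as noatime the timestamp may not change, so treat it as illustrative:

```python
import os
import tempfile
import time

# Create a scratch file, then read it the way a filesystem-based
# backup tool (tar, cpio) would. Assumes the filesystem is not
# mounted noatime; otherwise atime simply stays the same.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"payload")
    path = f.name

atime_before = os.stat(path).st_atime
time.sleep(1.1)                  # let the clock move visibly
with open(path, "rb") as f:      # a read through the filesystem
    f.read()
atime_after = os.stat(path).st_atime
os.unlink(path)
```

On a default mount the read bumps atime forward; at worst (noatime) it is unchanged, never earlier.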
Backup & Recovery: Inexpensive Backup Solutions for Open Systems by W. Curtis Preston. ISBN: 0-596-10246-1. Copyright 2007 O'Reilly Media Inc. Used with the permission of O'Reilly Media Inc. Available from booksellers or direct from O'Reilly Media at www.oreilly.com/catalog.
dump, on the other hand, does not access files through the Unix filesystem, so it doesn't have this limitation. It backs up files by accessing the data through the raw device driver. Exactly how dump works is not well known. The dump manpage doesn't help matters either, since it creates FUD (fear, uncertainty, and doubt). For example, Sun's ufsdump manpage says:
When running ufsdump, the filesystem must be inactive; otherwise, the output of ufsdump may be inconsistent and restoring files correctly may be impossible. A filesystem is inactive when it is unmounted [sic] or the system is in single user mode.
From this warning, it isn't clear what the extent of the problem is if the advice isn't heeded. Is it individual files in the dump that may be corrupted? Is it entire directories? Is it everything beyond a certain point in the dump or the entire dump? Do we really have to dismount the filesystem to get a consistent dump?
Dumpster diving
The dump utility is very filesystem-specific, so there may be slight variations in how it works on various Unix platforms. For the most part, however, the following description should cover how it works because most versions of dump are generally derived from the same code base. Let's first look at the output from a real dump. We're going to look at an incremental backup because it has more interesting messages than a level-0 backup:
# /usr/sbin/ufsdump 9bdsfnu 64 80000 150000 /dev/null /
DUMP: Writing 32 Kilobyte records
DUMP: Date of this level 9 dump: Mon Feb 15 22:41:57 2006
DUMP: Date of last level 0 dump: Sat Aug 15 23:18:45 2005
DUMP: Dumping /dev/rdsk/c0t3d0s0 (sun:/) to /dev/null.
DUMP: Mapping (Pass I) [regular files]
DUMP: Mapping (Pass II) [directories]
DUMP: Mapping (Pass II) [directories]
DUMP: Mapping (Pass II) [directories]
DUMP: Estimated 56728 blocks (27.70MB) on 0.00 tapes.
DUMP: Dumping (Pass III) [directories]
DUMP: Dumping (Pass IV) [regular files]
DUMP: 56638 blocks (27.66MB) on 1 volume at 719 KB/sec
DUMP: DUMP IS DONE
DUMP: Level 9 dump on Mon Feb 15 22:41:57 2006
In this example, ufsdump makes four main passes to back up a filesystem; note that Pass II was performed three times. Here's what dump did during each pass.
Pass I
Based on the entries in the dumpdates file (usually /etc/dumpdates) and the dump level specified on the command line, an internal variable named DUMP_SINCE is calculated. Any file modified after the DUMP_SINCE time is a candidate for the current dump. dump then scans the disk and looks at all inodes in the filesystem. Note that dump "understands" the layout of the Unix filesystem and reads all of its data through the raw disk device driver.
Unallocated inodes are skipped. The modification times of allocated inodes are compared to DUMP_SINCE; files whose modification times are greater than or equal to DUMP_SINCE are candidates for backup, and the rest are skipped. While looking at the inodes, dump builds:
- A list of file inodes to back up
- A list of directory inodes seen
- A list of used (allocated) inodes
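A rough Python sketch may make this bookkeeping concrete. The Inode class and the dumpdates representation here are hypothetical stand-ins; real dump reads raw inode tables through the disk device driver:

```python
from dataclasses import dataclass

# Hypothetical in-memory model of an on-disk inode.
@dataclass
class Inode:
    number: int
    allocated: bool
    is_directory: bool
    mtime: int   # modification time, seconds since the epoch

def dump_since(dumpdates, filesystem, level):
    # DUMP_SINCE: date of the newest prior dump of this filesystem at a
    # lower level; 0 (the epoch) when there is none, e.g. for a level 0.
    prior = [date for fs, lvl, date in dumpdates
             if fs == filesystem and lvl < level]
    return max(prior, default=0)

def pass_one(inodes, since):
    files_to_dump, dirs_seen, used = [], [], []
    for ino in inodes:
        if not ino.allocated:
            continue                      # unallocated inodes are skipped
        used.append(ino.number)           # list of used (allocated) inodes
        if ino.is_directory:
            dirs_seen.append(ino.number)  # list of directory inodes seen
        elif ino.mtime >= since:
            files_to_dump.append(ino.number)  # candidate for this dump
    return files_to_dump, dirs_seen, used
```

For example, with DUMP_SINCE = 150, a file inode with mtime 200 is selected while one with mtime 50 is skipped.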
Pass IIa
dump rescans all the inodes and specifically looks at directory inodes that were found in Pass I to determine whether they contain any of the files targeted for backup. If not, the directory's inode is dropped from the list of directories that need to be backed up.
Pass IIb
Because Pass IIa drops directories that do not need to be backed up, a parent directory whose only contents were dropped may now itself qualify for the same treatment. Pass IIb is therefore a rescan of all directories to see whether any of the remaining directories in the directory inode list now qualify for removal.
Pass IIc
Because directories were dropped in Pass IIb, dump performs another scan to check for additional directory removals. In this example, it is the final Pass II scan because no more directories can be dropped from the directory inode list; had more droppable directories been found, another Pass II scan would have occurred.
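These repeated Pass II scans amount to a fixed-point computation: keep dropping directories that no longer contain anything worth backing up until a full scan drops nothing. A sketch, using a hypothetical model in which directory contents are sets of inode numbers:

```python
def prune_directories(dirs_seen, children, files_to_dump):
    """Repeatedly drop directories that contain neither a file marked
    for backup nor a directory that is still being kept; stop when a
    full scan drops nothing, mirroring dump's repeated Pass II scans.

    dirs_seen:     directory inodes found in Pass I
    children:      dir inode -> set of inode numbers directly inside it
    files_to_dump: file inodes marked for backup in Pass I
    """
    keep = set(dirs_seen)
    while True:
        droppable = {d for d in keep
                     if not (children[d] & files_to_dump)
                     and not (children[d] & keep)}
        if not droppable:
            break          # the final Pass II scan: nothing left to drop
        keep -= droppable
    return keep
```

With a tree where only directory 1 holds a marked file, a leaf directory is dropped on the first scan, its now-empty parent on the second, and the third scan drops nothing, matching the three "Pass II" lines in the sample output.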
Pre-Pass III
This is when dump actually starts to write data. Just before Pass III officially starts, dump writes information about the backup. dump writes all data in a very structured manner. Typically, dump writes a header to describe the data that is about to follow, and then the data is written. Another header is written and then more data. During the Pre-Pass III phase, dump writes a dump header and two inode maps. Logically, the information would be written sequentially, like this:
header (TS_TAPE): the dump header
header (TS_CLRI): usedinomap, a map of inodes deleted since the last dump
header (TS_BITS): dumpinomap, a map of inodes in this dump
The map usedinomap is a list of inodes that have been deleted since the last dump. restore uses this map to delete files before doing a restore of files in this dump. The map dumpinomap is a list of all inodes contained in this dump. Each header contains a lot of information:
- Record type
- Dump date
- Volume number
- Logical block of record
- Inode number
- Magic number
- Record checksum
- Inode
- Number of records to follow
- Dump label
- Dump level
- Name of dumped filesystem
- Name of dumped device
- Name of dumped host
- First record on volume

The record type field can be one of the following:

- TS_TAPE: dump header
- TS_CLRI: map of inodes deleted since last dump
- TS_BITS: map of inodes in dump
- TS_INODE: beginning of file record
- TS_ADDR: continuation of file record
- TS_END: end of volume marker
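As a reading aid, the header fields can be modeled like this. The dataclass is purely illustrative and does not reflect the on-tape byte layout; the TS_* numeric values follow those defined in BSD's protocols/dumprestore.h:

```python
from dataclasses import dataclass

# Record-type values as defined in BSD's protocols/dumprestore.h.
TS_TAPE, TS_INODE, TS_BITS, TS_ADDR, TS_END, TS_CLRI = 1, 2, 3, 4, 5, 6

# Illustrative model of the header fields listed above; this is a
# reading aid, not the actual on-tape structure.
@dataclass
class DumpHeader:
    record_type: int      # one of the TS_* constants above
    dump_date: int
    volume: int
    logical_block: int
    inode_number: int
    magic: int            # magic number identifying a dump record
    checksum: int
    inode: bytes          # copy of the inode that follows (old BSD format)
    records_following: int
    label: str
    level: int
    filesystem: str
    device: str
    host: str
    first_record: int
```

The sample backup shown earlier would open with a header of type TS_TAPE carrying level 9, filesystem /, device /dev/rdsk/c0t3d0s0, and host sun.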
When dump writes the header, it includes a copy of the inode for the file or directory that immediately follows the header. Inode data structures have changed over the years, and different filesystems use slightly different inode data structures, which would create a portability problem. So dump normalizes its output by converting the current filesystem's inode data structure into the old BSD inode data structure, and it is this BSD data structure that is written to the backup volume.
As long as all dump programs do this, you should be able to restore the data on any Unix system that expects the inode data structure to be in the old BSD format. It is for this reason that you can interchange dump volumes written on Solaris, HP-UX, and AIX systems.
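The normalization idea itself is independent of the actual BSD dinode layout: pack the native inode fields into one fixed, portable record that any reader can decode. The field layout below is hypothetical, chosen only to illustrate the technique:

```python
import os
import struct
import tempfile

# Hypothetical fixed big-endian layout standing in for the old BSD
# dinode: mode, nlink, uid, gid, size, atime, mtime, ctime.
# The real on-tape format differs; only the normalization idea matters.
BSD_LIKE = struct.Struct(">HHIIQIII")

def normalize_inode(st):
    """Pack a native POSIX stat result into the fixed portable record."""
    return BSD_LIKE.pack(
        st.st_mode & 0xFFFF, st.st_nlink & 0xFFFF,
        st.st_uid, st.st_gid, st.st_size,
        int(st.st_atime), int(st.st_mtime), int(st.st_ctime),
    )
```

Every inode, regardless of the source filesystem, comes out as the same 32-byte record, which is what makes the volumes interchangeable.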
Five key questions about dump

Q: If we dump an active filesystem, will data corruption affect directories?
A: Possibly. Most of the details outlined for files also apply to directories. The one exception is that directories are dumped in Pass III instead of Pass IV, so the time frames for changes to directories differ. This also means that changes to directories are less susceptible to corruption, because less time elapses between the generation of the directory list and the dump of that list. However, changes to files that would normally cause corresponding changes to the directory information will still create inconsistencies in the dump.
Pass III
This is when real disk data starts to get dumped. During Pass III, dump writes only those directories that contain files that have been marked for backup. As in the Pre-Pass III phase, during Pass III, dump logically writes data something like this:
Header (TS_INODE)
Disk blocks (directory block[s])
Header (TS_ADDR)
Disk blocks (more directory block[s])
...
Header (TS_ADDR)
Disk blocks (more directory block[s])
Repeat the previous four steps for each directory in the list of directory inodes to back up.
Pass IV
Finally, file data is dumped. During Pass IV, dump writes only those files that were marked for backup. dump logically writes data during this pass as it did in Pass III for directory data:
Header (TS_INODE)
Disk blocks (file block[s])
Header (TS_ADDR)
Disk blocks (more file block[s])
...
Header (TS_ADDR)
Disk blocks (more file block[s])
Repeat the previous four steps for each file in the list of file inodes to back up.
To mark the end of the backup, dump writes a final header using the TS_END record type. This header officially marks the end of the dump.
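Putting Pass III, Pass IV, and the final TS_END header together, the write loop can be sketched as follows; the number of blocks grouped under each header is illustrative, not dump's real grouping:

```python
TS_INODE, TS_ADDR, TS_END = 2, 4, 5   # record-type values from BSD dump
BLOCKS_PER_HEADER = 4                 # illustrative grouping only

def write_object(out, blocks):
    # The first header for an object is TS_INODE; each continuation
    # header for the same object is TS_ADDR.
    for i in range(0, len(blocks), BLOCKS_PER_HEADER):
        rtype = TS_INODE if i == 0 else TS_ADDR
        out.append((rtype, blocks[i:i + BLOCKS_PER_HEADER]))

def dump_passes(directories, files):
    out = []
    for d in directories:      # Pass III: directory data blocks
        write_object(out, d)
    for f in files:            # Pass IV: file data blocks
        write_object(out, f)
    out.append((TS_END, []))   # final header: end of the dump
    return out
```

One small directory plus one five-block file thus yields a TS_INODE record for the directory, a TS_INODE plus a TS_ADDR record for the file, and a closing TS_END.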
Summary of dump steps
The following is a summary of each of dump's steps:
- Pass I: dump builds a list of the files it is going to back up.
- Pass II: dump scans the disk multiple times to determine the list of directories it needs to back up.
- Pre-Pass III: dump writes a dump header and two inode maps.
- Pass III: dump writes a header (which includes the directory inode) and the directory data blocks for each directory in the directory backup list.
- Pass IV: dump writes a header (which includes the file inode) and the file data blocks for each file in the file backup list.
- Post-Pass IV: dump writes a final header to mark the end of the dump.
As described earlier, using dump to back up a mounted filesystem can produce files that turn out to be corrupt when restored, and the likelihood of that rises as filesystem activity increases. There are also situations where the data is backed up safely but the information in the dump is inconsistent. For these inconsistencies to occur, certain events have to happen at just the right time during the dump. It is even possible for the wrong file to be dumped during the backup; if that file is later restored, the administrator will wonder how it got there!
The potential for data corruption is low, but it is still a possibility. For most people, dumping live filesystems that are fairly idle produces a good backup. Generally, you'll have similar success or failure performing a backup with dump as with tar or cpio.