Demystifying Unix dump

dump is a powerful tool to back up Unix files. In this excerpt from W. Curtis Preston's new book, Backup & Recovery: Inexpensive Backup Solutions for Open Systems, the dump utility is described in detail, including how it works, when to use it and exactly what can go wrong at various stages of the dump backup process.


dump is powerful, but it isn't intuitive and can produce some unexpected results, especially during a restore. The following excerpt explains how dump works and what can go wrong at various stages of the dump backup process.

cpio, ntbackup, and tar are filesystem-based utilities; that is, they access files through the filesystem. If a file being backed up is changed, deleted, or added during the backup, usually the worst thing that can happen is that the contents of that individual file will be corrupt. Unfortunately, there is one huge disadvantage to backing up files through the filesystem: the backup affects inode times (atime or ctime). In a Unix-based operating system, an inode is a stored description of an individual file.
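
A quick way to see this effect for yourself (the filename here is made up, and this assumes the filesystem isn't mounted with an option such as noatime): read a file with tar, a filesystem-based utility, and compare the access time that ls -lu reports before and after:

    $ ls -lu /home/fred/report.txt
    $ tar cf /dev/null /home/fred/report.txt
    $ ls -lu /home/fred/report.txt

The second ls -lu shows the time of the tar run rather than the last time anyone actually read the file, which is exactly the problem described above.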




Backup & Recovery: Inexpensive Backup Solutions for Open Systems
by W. Curtis Preston

ISBN: 0-596-10246-1
Copyright 2007 O'Reilly Media Inc. Used with the permission of O'Reilly Media Inc.

Available from booksellers or direct from O'Reilly Media at www.oreilly.com/catalog


dump, on the other hand, does not access files through the Unix filesystem, so it doesn't have this limitation. It backs up files by accessing the data through the raw device driver. Exactly how dump works is not well known. The dump manpage doesn't help matters either, since it creates FUD (fear, uncertainty, and doubt). For example, Sun's ufsdump manpage says:

When running ufsdump, the filesystem must be inactive; otherwise, the output of ufsdump may be inconsistent and restoring files correctly may be impossible. A filesystem is inactive when it is unmounted [sic] or the system is in single user mode.

From this warning, it isn't clear what the extent of the problem is if the advice isn't heeded. Is it individual files in the dump that may be corrupted? Is it entire directories? Is it everything beyond a certain point in the dump or the entire dump? Do we really have to dismount the filesystem to get a consistent dump?


Dumpster diving
The dump utility is very filesystem-specific, so there may be slight variations in how it works on various Unix platforms. For the most part, however, the following description should cover how it works because most versions of dump are generally derived from the same code base. Let's first look at the output from a real dump. We're going to look at an incremental backup because it has more interesting messages than a level-0 backup:

# /usr/sbin/ufsdump 9bdsfnu 64 80000 150000 /dev/null /

    DUMP: Writing 32 Kilobyte records
    DUMP: Date of this level 9 dump: Mon Feb 15 22:41:57 2006
    DUMP: Date of last level 0 dump: Sat Aug 15 23:18:45 2005
    DUMP: Dumping /dev/rdsk/c0t3d0s0 (sun:/) to /dev/null.
    DUMP: Mapping (Pass I) [regular files]
    DUMP: Mapping (Pass II) [directories]
    DUMP: Mapping (Pass II) [directories]
    DUMP: Mapping (Pass II) [directories]
    DUMP: Estimated 56728 blocks (27.70MB) on 0.00 tapes.
    DUMP: Dumping (Pass III) [directories]
    DUMP: Dumping (Pass IV) [regular files]
    DUMP: 56638 blocks (27.66MB) on 1 volume at 719 KB/sec
    DUMP: DUMP IS DONE
    DUMP: Level 9 dump on Mon Feb 15 22:41:57 2006

In this example, ufsdump backs up the root filesystem at level 9; the b, d, s, and f options supply the blocking factor, tape density, tape length, and output file (here /dev/null), n notifies the operators if ufsdump needs attention, and u tells ufsdump to update the dumpdates file when it finishes. ufsdump makes four main passes to back up a filesystem; note that Pass II was performed three times. Here's what dump did during each pass.

Pass I
Based on the entries in the dumpdates file (usually /etc/dumpdates) and the dump level specified on the command line, an internal variable named DUMP_SINCE is calculated. Any file modified after the DUMP_SINCE time is a candidate for the current dump. dump then scans the disk and looks at all inodes in the filesystem. Note that dump "understands" the layout of the Unix filesystem and reads all of its data through the raw disk device driver.
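
The dumpdates file itself is just a text file, one line per filesystem and dump level. On the system in the example above, it might contain entries like the following (the exact spacing and format vary slightly between platforms):

    /dev/rdsk/c0t3d0s0               0 Sat Aug 15 23:18:45 2005
    /dev/rdsk/c0t3d0s0               9 Mon Feb 15 22:41:57 2006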

Unallocated inodes are skipped. The modification times of allocated inodes are compared to DUMP_SINCE. Files whose modification times are greater than or equal to DUMP_SINCE are candidates for backup; the rest are skipped. While looking at the inodes, dump builds:

  • A list of file inodes to back up
  • A list of directory inodes seen
  • A list of used (allocated) inodes

Pass IIa
dump rescans all the inodes and specifically looks at directory inodes that were found in Pass I to determine whether they contain any of the files targeted for backup. If not, the directory's inode is dropped from the list of directories that need to be backed up.

Pass IIb
Because Pass IIa dropped directories that do not need to be backed up, their parent directories may now qualify for the same treatment on this or a later pass. Pass IIb is therefore a rescan of all directories to see whether the remaining directories in the directory inode list now qualify for removal.

Pass IIc
Because directories were dropped in Pass IIb, dump performs another scan to check for additional directory removals. In this example, that is the final Pass II scan because no more directories can be dropped from the directory inode list. (If additional directories had been found that could be dropped, another Pass II scan would have occurred.)

Pre-Pass III
This is when dump actually starts to write data. Just before Pass III officially starts, dump writes information about the backup. dump writes all data in a very structured manner. Typically, dump writes a header to describe the data that is about to follow, and then the data is written. Another header is written and then more data. During the Pre-Pass III phase, dump writes a dump header and two inode maps. Logically, the information would be written sequentially, like this:

header
     TS_TAPE dump header
header
     TS_CLRI
usedinomap
     A map of inodes deleted since the last dump
header
     TS_BITS
dumpinomap
     A map of inodes in the dump


The map usedinomap is a list of inodes that have been deleted since the last dump. restore uses this map to delete files before doing a restore of files in this dump. The map dumpinomap is a list of all inodes contained in this dump. Each header contains a lot of information:

    Record type
    Dump date
    Volume number
    Logical block of record
    Inode number
    Magic number
    Record checksum
    Inode
    Number of records to follow
    Dump label
    Dump level
    Name of dumped filesystem
    Name of dumped device
    Name of dumped host
    First record on volume
The record type field describes the information following the header. There are six basic record types:

TS_TAPE
     dump header
TS_CLRI
     Map of inodes deleted since last dump
TS_BITS
     Map of inodes in dump
TS_INODE
     Beginning of file record
TS_ADDR
     Continuation of file record
TS_END
     End of volume marker

When dump writes the header, it includes a copy of the inode for the file or directory that immediately follows the header. Since inode data structures have changed over the years, and different filesystems use slightly different inode data structures, this could create a portability problem. So dump normalizes its output by converting the current filesystem's inode data structure into the old BSD inode data structure; it is this BSD data structure that is written to the backup volume.

As long as all dump programs do this, you should be able to restore the data on any Unix system that expects the inode data structure to be in the old BSD format. It is for this reason that you can interchange dump volumes written on Solaris, HP-UX, and AIX systems.
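
Incidentally, restore's table-of-contents mode reads these same headers, so you can list the inode numbers and names contained in a dump volume. For example (the tape device name is only an illustration; on non-Solaris systems the command is usually restore rather than ufsrestore):

    # ufsrestore tf /dev/rmt/0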


Five key questions about dump

The following are the most frequently asked questions about the dump Unix backup utility.

Question 1
If we dump an active filesystem, will data corruption affect individual directories/files in the dump?
A: Yes.

The following three scenarios can occur if your filesystem is changing during a dump:

A file is deleted before Pass I. The file is not included in the backup list because it doesn't exist when Pass I occurs.

A file is deleted after Pass I but before Pass IV. The file may be included in the backup list, but during Pass IV, dump checks to make sure the file still exists and is a file. If either condition is false, dump skips backing it up. However, the inode map written in Pre-Pass III will be incorrect. This inconsistency does not affect the dump, but restore will be unable to recover the file even though it is in the restore list.

The contents of a file marked for backup change (inode number stays the same); there are really two scenarios here. Changing the file at a time when dump is not backing it up does not affect the backup of the file. dump keeps a list of the inode numbers, so changing the file may affect the contents of the inode but not the inode number itself.

Changing the file when dump is backing up the file probably will corrupt the data dumped for the current file. dump reads the inode and follows the disk block pointers to read and then write the file blocks. If the address or contents of just one block changes, the file dumped will be corrupt.

The inode number of a file changes. If the inode number of a file changes after it was put on the backup list (inode changes after Pass I, but before Pass IV), when the time comes to back up the file, one of three scenarios occurs:

  • The inode is not being used by the filesystem, so dump will skip backing up this file. The inode map written in Pre-Pass III will be incorrect. This inconsistency will not affect the dump but will confuse you during a restore (a file is listed but can't be restored).


  • The inode is reallocated by the filesystem and is now a directory, pipe, or socket. dump will see that the inode is no longer a regular file and will not back it up. Again, the inode map written in Pre-Pass III will be inconsistent.


  • The inode is reallocated by the filesystem and is now used by another file; dump will back up the new file. Even worse, the name that was recorded for that inode number when the directories were dumped in Pass III no longer matches the data being dumped; the data may actually belong to a different file somewhere else in the filesystem. It's like dump trying to back up /etc/hosts but really getting /bin/ls. Although the file is not corrupt in the true sense of the word, if this file were restored, it would not be the correct file.


A file is moved in the filesystem. Again, there are a few scenarios:

  • The file is renamed before the directory is dumped in Pass III. When the directory is dumped in Pass III, the new name of the file will be dumped. The backup then proceeds as if the file was never renamed.


  • The file is renamed after the directory is dumped in Pass III. The inode doesn't change, so dump will back up the file. However, the name of the file dumped in Pass III will not be the current filename in the filesystem. This scenario should be harmless.


  • The file is moved to another directory in the same filesystem before the directory was dumped in Pass III. If the inode didn't change, then this is the same as the first scenario.


  • The file is moved to another directory in the same filesystem after the directory was dumped in Pass III. If the inode didn't change, then the file will be backed up, but during a restore it would be seen in the old directory with the old name.


  • The file's inode changes. The file would not be backed up, or another file may be backed up in its place (if another file has assumed this file's old inode).

Question 2
If we dump an active filesystem, will data corruption affect directories?
A: Possibly.

Most of the details outlined for files also apply to directories. The one exception is that directories are dumped in Pass III instead of Pass IV, so the time frames for changes to directories will change.

This also implies that changes to directories are less susceptible to corruption because the time that elapses between the generation of the directory list and the dump of that list is less. However, changes to files that would normally cause corresponding changes to the directory information will still create inconsistencies in the dump.

Question 3
If we dump an active filesystem, will data corruption affect the entire dump or everything beyond a certain point in the dump?
A: No.

Even though dump backs up files through the raw device driver, it is in effect backing up data inode by inode, still working its way through the filesystem one file at a time. Corruption in one file will not affect other files in the dump.


Question 4
Do we really have to dismount the filesystem to get a consistent dump?
A: No.

There is a high likelihood that dumps of an idle, mounted filesystem will be fine. The more active the filesystem, the higher the risk that corrupt files will be dumped. That risk is about the same as it is for a utility that accesses files through the filesystem.

Question 5
If we dump an essentially idle, mounted filesystem, will we learn (after it's too late) that the dump is corrupt?
A: No.

It's possible that individual files in that dump are corrupt, but it is highly unlikely that the entire dump is corrupt. Since dump backs up data inode by inode, this is similar to backing up through the filesystem file by file.


Pass III
This is when real disk data starts to get dumped. During Pass III, dump writes only those directories that contain files that have been marked for backup. As in the Pre-Pass III phase, during Pass III, dump logically writes data something like this:

    Header (TS_INODE)
    Disk blocks (directory block[s])
    Header (TS_ADDR)
    Disk blocks (more directory block[s])
    .
    .
    .
    Header (TS_ADDR)
    Disk blocks (more directory block[s])
    Repeat the previous four steps for each directory in the list of directory inodes to back up.

Pass IV
Finally, file data is dumped. During Pass IV, dump writes only those files that were marked for backup. dump logically writes data during this pass as it did in Pass III for directory data:

    Header (TS_INODE)
    Disk blocks (file block[s])
    Header (TS_ADDR)
    Disk blocks (more file block[s])
    .
    .
    .
    Header (TS_ADDR)
    Disk blocks (more file block[s])
    Repeat the previous four steps for each file in the list of file inodes to back up.

Post-Pass IV
To mark the end of the backup, dump writes a final header using the TS_END record type. This header officially marks the end of the dump.
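
With the on-volume structure in mind, here is a sketch of how a single file is typically pulled back off a dump volume using restore's interactive mode (ufsrestore on Solaris; the device name and the file chosen are only examples). The add command marks a file for extraction, and extract then reads the volume and recreates the file under the current directory, prompting for the volume number (1 for a single-volume dump):

    # cd /var/tmp
    # ufsrestore if /dev/rmt/0
    ufsrestore > add etc/hosts
    ufsrestore > extract
    ufsrestore > quit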

Summary of dump steps
The following is a summary of each of dump's steps:

    Pass I
    dump builds a list of the files it is going to back up.
    Pass II
    dump scans the disk multiple times to determine a list of the directories it needs to back up.
    Pre-Pass III
    dump writes a dump header and two inode maps.
    Pass III
    dump writes a header (which includes the directory inode) and the directory data blocks for each directory in the directory backup list.
    Pass IV
    dump writes a header (which includes the file inode), and the file data blocks for each file in the file backup list.
    Post-Pass IV
    dump writes a final header to mark the end of the dump.

A final analysis of dump
As described earlier, using dump to back up a mounted filesystem can dump files that are found to be corrupt when restored. The likelihood of that occurring rises as the activity of the filesystem increases. There are also situations where data is backed up safely, but the information in the dump is inconsistent. For these inconsistencies to occur, certain events have to occur at the right time during the dump. And it is possible that the wrong file is dumped during the backup; if that file is restored, the admin will wonder how that happened!

The potential for data corruption to occur is low but still a possibility. For most people, dumping live filesystems that are fairly idle produces a good backup. Generally, you'll have similar success or failure performing a backup with dump as with tar or cpio.
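
For reference, a more typical use of dump than the /dev/null example earlier is a level-0 backup of a filesystem to tape, later restored into a freshly created filesystem with restore's r (rebuild) option. The device names and mount point below are illustrative only; restoresymtable is a working file that restore leaves behind and that can be removed once the restore is complete:

    # ufsdump 0uf /dev/rmt/0 /export/home

    # newfs /dev/rdsk/c0t1d0s7
    # mount /dev/dsk/c0t1d0s7 /export/home
    # cd /export/home
    # ufsrestore rf /dev/rmt/0
    # rm restoresymtable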

This was first published in August 2007