bin/157244: dump/restore: unknown tape header type -230747966
Gene Stark
gene at home.starkeffect.com
Sun May 22 13:10:22 UTC 2011
I wrote a program to compare another copy of one of the large files
in the dump with the version extracted by restore after applying my
header reordering program. The program reads each file in blocks of
TP_BSIZE bytes, computes the SHA1 hash of each block, and stores the
resulting <hash, offset> pairs in a hash map, one per file. It then
takes the union of the key sets of the two hash maps as a single
master list of block hashes, and traverses that list to construct a
map <offset, <offset0, offset1>> giving the correspondence between
the blocks in the two files. Finally, it prints the contents of that
map in increasing order of offset, showing the differences between
the two files. Here is the initial part of the result:
Lectures.zip.bad: 52469795 bytes
Lectures.zip.good: 52469795 bytes
11612 11622 10
11613 11623 10
11614 11624 10
11615 11625 10
11616 11626 10
11617 11627 10
11618 11628 10
11619 11629 10
11620 11630 10
11621 11631 10
11622 11632 10
11623 11633 10
11624 11634 10
11625 11635 10
11626 11636 10
11627 11637 10
11628 11638 10
11629 11639 10
11630 11640 10
11631 11641 10
11632 11612 -20
11633 11613 -20
11634 11614 -20
11635 11615 -20
11636 11616 -20
11637 11617 -20
11638 11618 -20
11639 11619 -20
11640 11620 -20
11641 11621 -20
11642 11652 10
11643 11653 10
11644 11654 10
11645 11655 10
11646 11656 10
11647 11657 10
11648 11658 10
11649 11659 10
11650 11660 10
11651 11661 10
11652 11662 10
11653 11663 10
11654 11664 10
11655 11665 10
11656 11666 10
11657 11667 10
11658 11668 10
11659 11669 10
11660 11670 10
11661 11671 10
11662 11642 -20
11663 11643 -20
11664 11644 -20
11665 11645 -20
11666 11646 -20
11667 11647 -20
11668 11648 -20
11669 11649 -20
11670 11650 -20
11671 11651 -20
11672 11682 10
11673 11683 10
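
For readers who want to try the same kind of comparison, here is a
minimal Python sketch of the procedure described before the listing.
It is not the program that produced the output above. The value 1024
for TP_BSIZE, the function names, and the handling of duplicate blocks
(only blocks that occur exactly once in each file are paired; blocks
found in just one file are skipped) are assumptions of this sketch.
It also assumes the third column of the listing is the second offset
minus the first.

```python
import hashlib
from collections import defaultdict

# Assumed value of dump's tape block size; check <protocols/dumprestore.h>.
TP_BSIZE = 1024


def block_offsets(path, bsize=TP_BSIZE):
    """Map the SHA-1 digest of each block to the block offsets where it occurs."""
    offsets = defaultdict(list)
    with open(path, "rb") as f:
        index = 0
        while True:
            block = f.read(bsize)
            if not block:
                break
            offsets[hashlib.sha1(block).digest()].append(index)
            index += 1
    return offsets


def correspondence(path0, path1, bsize=TP_BSIZE):
    """Return sorted (offset0, offset1, offset1 - offset0) rows for moved blocks."""
    map0 = block_offsets(path0, bsize)
    map1 = block_offsets(path1, bsize)
    rows = []
    # Master list of hashes: the union of the two key sets.
    for digest in map0.keys() | map1.keys():
        occ0 = map0.get(digest, [])
        occ1 = map1.get(digest, [])
        # Pair only unambiguous blocks, and report only those that moved.
        if len(occ0) == 1 and len(occ1) == 1 and occ0[0] != occ1[0]:
            rows.append((occ0[0], occ1[0], occ1[0] - occ0[0]))
    return sorted(rows)


if __name__ == "__main__":
    import sys
    for off0, off1, delta in correspondence(sys.argv[1], sys.argv[2]):
        print(off0, off1, delta)
```

Run as a script with the two file names as arguments, this prints one
row per displaced block, in increasing order of the first offset.
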

The pattern repeats this way for *almost* the entire file: a run of
20 blocks offset by +10 relative to the corresponding blocks in the
other file, followed by a run of 10 blocks offset by -20. There are
occasional values of 9 and 19 for the differences. I don't have a
ready explanation for those, except that my header reordering program
relied on the magic number to identify the header blocks, so a few
data blocks may have been misidentified as headers. At the end of the
files there are a few blocks that do not correspond; these are
probably due to alignment at the end, which caused some of the last
data blocks to be used as the first blocks of the next file in the
dump.
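
The +10/-20 pattern is what one would see if, within each window of
30 blocks, the first 10 blocks had been moved behind the following 20.
The short Python sketch below models that on synthetic data. It is
only an illustration: the window size, the shift, and the function
name are assumptions read off the listing, not anything taken from
dump itself. The period of 30 would be consistent with three 10-block
records being written out of order, but that reading is an assumption
of this sketch and not something established above.

```python
WINDOW = 30  # assumed period, read off the listing (20 + 10 blocks)
SHIFT = 10   # assumed rotation within each window


def rotate_windows(blocks, window=WINDOW, shift=SHIFT):
    """Rotate each full window left by `shift` blocks (a model, not dump's code)."""
    out = []
    for start in range(0, len(blocks), window):
        chunk = blocks[start:start + window]
        if len(chunk) == window:
            chunk = chunk[shift:] + chunk[:shift]
        out.extend(chunk)
    return out


good = list(range(90))      # block i of the "good" file holds the value i
bad = rotate_windows(good)  # hypothetical misordered copy

# The value stored at each bad offset is that block's offset in the good file.
deltas = [good_off - bad_off for bad_off, good_off in enumerate(bad)]
print(deltas[:30])
```

The first 30 differences printed are twenty 10s followed by ten -20s,
matching the shape of the listing.
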

To test my suspicion that this is a concurrency issue in dump,
I recompiled dump after setting #define SLAVES 1 in tape.c
(rather than the value 3 it had before). I was then able to
complete two rounds of "dump 0f - /mail | restore rfN -"
without any errors, whereas /sbin/dump fails very quickly,
as indicated in the original PR.

I am not familiar with the locking features, etc. being used in
dump, so I don't know if I will be able to go further than this
with a reasonable expenditure of time. However, I strongly
suggest that the "concurrency modifications" in dump be turned
off (perhaps by setting SLAVES to 1 as I did) until somebody
can get to the bottom of this. If this is happening to me,
then I suspect there are *massive* numbers of bad dumps out there
that people think are actually good. It will be a rude awakening
when people try to read them back. Since the data blocks don't
contain any tape address information, it is not possible to
recover from this.