Problems with dump and restore

Tue Aug 12 19:34:24 UTC 2014

On Tue, Aug 12, 2014 at 01:07:06PM -0300, Andrew Hamilton-Wright wrote:
> 
> I was attempting to restore my /usr partition today, and have encountered
> some rather terrifying issues using restore.
> 
> 
> Some background ...
> 
> I have used dump/restore for several years, very happily, to maintain
> backups on my machine.
> 
> I have a level 0 dump of each file system, and then a cron-based script
> that does higher level dumps on a regular basis.  I therefore have dumps
> at the following levels for this filesystem at the moment:  0, 2, 3, 5
> 
> These were created using snapshots, so the level 0 was created via
>  	dump 0uLCf 32 - /usr
> and higher level dumps were created similarly.

In 2011, a problem was found where taking snapshots on a filesystem using
soft updates *and* journaling (SU+J) could hang the machine. At that time the
recommendation was to switch off journaling.
According to https://wiki.freebsd.org/NewFAQs:

    If you want to use snapshot (dump -L) then disable the soft updates
    journal for that filesystem.

This bug was fixed toward the end of 2011:
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=160662

Personally I make dumps *only* from filesystems that are unmounted or mounted
read-only, so never from a “live” filesystem, just to be on the safe side.
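A minimal sketch of that approach, assuming the filesystem can be remounted
read-only (nothing may hold files open for writing, which for /usr usually
means single-user mode). The helper name dump_readonly and the output path
are made up for illustration; the dump flags mirror the command quoted above,
minus -L because no snapshot is needed.

```shell
# dump_readonly MOUNTPOINT LEVEL OUTFILE
# Hypothetical helper: remount read-only, dump, then remount read-write.
dump_readonly() {
    mnt=$1
    level=$2
    out=$3

    # Downgrade the mount to read-only; this fails if files are open for writing.
    mount -u -o ro "$mnt" || return 1

    # -u records the dump in /etc/dumpdates, -C 32 sets a 32 MB cache,
    # -f names the output file. No -L, since the filesystem is not live.
    dump "-$level" -u -C 32 -f "$out" "$mnt"
    status=$?

    # Go back to read-write whether or not the dump succeeded.
    mount -u -o rw "$mnt" || return 1
    return "$status"
}

# Example (paths are placeholders):
#   dump_readonly /usr 0 /backup/dumps/current/usr.dump
```

Note that -u needs to write /etc/dumpdates, so this exact form would not work
for dumping / itself while it is mounted read-only.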

> My uname info is:
>  	FreeBSD qemg.org 10.0-RELEASE-p7 FreeBSD 10.0-RELEASE-p7 #0: Tue Jul  8 06:37:44 UTC 2014     root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC  amd64
> I wanted to restore the /usr partition to the state it was in at the last
> (level 5) backup.  My expected steps to achieve this are:
>      o go to single user (I did this through a full reboot)
>  	o create a replacement filesystem on the drive:
>  		newfs -O 2 -U -a 4 -b 32768 -d 32768 -e 4096 -f 4096 \
>  				-g 16384 -h 64 -i 8192 -k 0 -m 8 -o time \
>  				-s 415236096 /dev/ada0e
>  	o mount the drive as /usr, and change directory to the mount point
>  	o restore the level 0 dump
>  		restore ruf /backup/dumps/current/usr.dump
>  	* this is the first sign of trouble, as restore output the warning
>  		expected next file 19266003, got 19100935

This is mentioned in restore's manpage:

     expected next file <inumber>, got <inumber>
             A file that was not listed in the directory showed up.  This can
             occur when using a dump created on an active file system.

>  	o restore the level 2 dump
>  		restore ruf /backup/dumps/current/l1d0/l2d0/usr.dump
>  	* this failed, indicating that the restore was corrupt (unfortunately
>  	  I do not have the full text of the errors received, but a complaint
>  	  that an entry was "not a leaf" was in the first message)
> 
> Frankly, this terrifies me.  If dump and restore cannot be trusted
> as a robust backup solution, I don't know where to turn to.
> 
> Some questions then:
> - is anyone else using dump/restore as their main backup method?

Yes, for operating system filesystems like /, /usr and /var, which can contain
file flags, hard links and such. These filesystems aren't all that big, so
dumps are relatively quick.

For my large /home filesystem I prefer rsync, because it copies only what has
changed and so is much faster.

>    Are you using snapshots?

No, because of the aforementioned bug that surfaced in 2011.

>  If so, have you seen anything like this when running restore?

I've had hangs and corrupted dumps when dumping live filesystems.

> - is there any means of validating the dump file, other than the -N
>    option (which returns no warnings on any of these files)?

Not that I know of. I generally make and verify checksums when copying dumps
to other machines or external hard drives.
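Roughly like this; a sketch only, with helper names (checksum_file,
record_checksum, verify_checksum) invented for the example. It assumes either
FreeBSD's sha256(1) or a sha256sum utility is installed. Keep in mind that this
only shows the copy matches the original file; it says nothing about whether
the dump itself is internally consistent.

```shell
# checksum_file FILE: print the SHA-256 hex digest of FILE.
checksum_file() {
    if command -v sha256 >/dev/null 2>&1; then
        sha256 -q "$1"                      # FreeBSD base system
    else
        sha256sum "$1" | cut -d' ' -f1      # e.g. GNU coreutils
    fi
}

# record_checksum FILE: store the digest next to the dump as FILE.sha256.
record_checksum() {
    checksum_file "$1" > "$1.sha256"
}

# verify_checksum FILE: succeed only if FILE still matches FILE.sha256.
verify_checksum() {
    [ "$(checksum_file "$1")" = "$(cat "$1.sha256")" ]
}

# Example (paths are placeholders):
#   record_checksum /backup/dumps/current/usr.dump
#   ... copy usr.dump and usr.dump.sha256 to the other disk ...
#   verify_checksum /mnt/external/usr.dump || echo "copy is damaged"
```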

> - does anyone have any advice that may help determine what may have
>    gone wrong?

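Try restore's “degraded” mode (the ‘-D’ option), which tries to recover as much
as possible from a damaged dump, together with the ‘-y’ option, which makes
restore skip over bad blocks and continue instead of asking whether to abort.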
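For the level 2 dump that failed, that could look something like the
following. This is a sketch, not something I have run against your dumps; the
helper name restore_degraded is made up, and it has to be run from the root of
the freshly restored filesystem, just like your original restore commands.

```shell
# restore_degraded DUMPFILE
# Hypothetical helper: replay one dump, salvaging what can be salvaged.
restore_degraded() {
    # -r  rebuild the filesystem from the dump
    # -u  unlink existing entries before creating new ones
    # -D  degraded mode: recover as much as possible from a damaged dump
    # -y  do not ask whether to abort on errors; skip bad blocks and continue
    restore -r -u -D -y -f "$1"
}

# Example, using the path from the original message:
#   cd /usr && restore_degraded /backup/dumps/current/l1d0/l2d0/usr.dump
```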

Roland
-- 
R.F.Smith                                   http://rsmith.home.xs4all.nl/
[plain text _non-HTML_ PGP/GnuPG encrypted/signed email much appreciated]
pgp: 5753 3324 1661 B0FE 8D93  FCED 40F6 D5DC A38A 33E0 (keyID: A38A33E0)