When a System Dies; Getting back in operation again.

Wed May 6 13:30:45 UTC 2009

> ... What is the best way to restore the full system?
> Can I use the FreeBSD installation disk in rescue mode?

I experienced such a situation just 2 weeks ago. My primary problem
was that I had to do the restore over the network (no attached tape
drives, no external HDDs). I wanted to use ssh to grab the dump from
the backup server, but ended up using netcat, which worked great.

Here's basically what I did, including the backup from the
not-yet-dead machine (note: I used an intermediate backup server, but
it should be possible to pipe dump directly to restore):

1. dump -0Laf - / | ssh backup-server "cat > dump.root"
2. boot the new machine from CD disc1 (FreeBSD <7) or livefs disc (FreeBSD >7)
3. create and newfs partitions as explained in this thread (at least
the size of the backup; they can be larger)
4. go into rescue (fixit) mode, create mount points for the new
partitions (mkdir /mnt.root), mount the partitions (e.g. mount
/dev/da0s1a /mnt.root), change directory to the mount point (cd
/mnt.root), and configure the NIC (ifconfig)
5. on the new machine: start netcat listening and feed it to restore
(nc -l 55555 | restore -rvf -)
6. on backup-server: cat dump.root | nc new-machine 55555
7. repeat for usr and var partitions
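
The intermediate file is not strictly required. A minimal, untested
sketch of the direct variant follows, assuming the old machine is still
up and can reach the new one. The function names receive_fs and send_fs
are made up for illustration; new-machine and port 55555 are the same
placeholders used in the steps above.

```shell
# Sketch only: stream dump straight into restore, no dump.root file.

# On the new machine (fixit shell). Start this side first.
receive_fs() {
    # $1 = mount point of the freshly created partition, e.g. /mnt.root
    cd "$1" && nc -l 55555 | restore -rvf -
}

# On the old, still-running machine.
send_fs() {
    # $1 = file system to dump, e.g. / or /usr or /var
    dump -0Laf - "$1" | nc new-machine 55555
}
```

Run receive_fs /mnt.root on the new machine, then send_fs / on the old
one, and repeat the pair for each partition. The trade-off is that no
copy of the dump is kept anywhere, so a failed transfer means dumping
again.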

Notes:
1. if security is an issue, ssh out from the new machine to the backup
server with remote port forwarding (ssh -R 55555:localhost:55555
backup-server) and, on the backup server, pipe the dump to localhost
(cat dump.root | nc localhost 55555).
My initial idea was to start sshd in fixit mode (see my post to the
list, "fixit console with sshd"), which turned out to be too much
trouble.
2. restore uses TMPDIR to store temporary files during the restore
process. Fixit mode has limited free space, and when it is exhausted
the restore fails, so it is a good idea to point TMPDIR at an
available partition (e.g. export TMPDIR=/mnt.var while restoring the
usr partition, and later use a subdirectory of usr as TMPDIR to
restore the var partition)
3. [IMPORTANT!] after the restore is over, manually check the restored
etc/fstab and etc/rc.conf (currently mounted as /mnt.root/...) and
fix:
a) partition names (e.g. /dev/da0s1a might become /dev/amrd0s1a)
b) ethernet interface names (e.g. em0 might become bge0)
c) IP addresses, to avoid an IP conflict in case the old box is still
running
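
Notes 2 and 3 can be sketched as two small shell helpers. This is an
illustration under assumptions, not a tested recipe: the function names
use_tmpdir and rename_in_file are made up, the device and interface
names are just the examples from the notes, and it assumes an sh-style
shell with cp and sed available in the fixit environment.

```shell
# Note 2: point restore's temporary files at a partition with free space.
use_tmpdir() {
    # $1 = existing, writable directory, e.g. /mnt.var
    [ -d "$1" ] && [ -w "$1" ] || return 1
    TMPDIR=$1
    export TMPDIR
}

# Note 3: replace an old device or interface name in a restored file.
# The untouched original is kept next to it as FILE.orig.
rename_in_file() {
    old=$1 new=$2 file=$3
    cp "$file" "$file.orig" || return 1
    # '|' is the sed delimiter so that /dev/... paths need no escaping
    sed "s|$old|$new|g" "$file.orig" > "$file"
}

# Possible use (nothing below runs unless uncommented):
# use_tmpdir /mnt.var
# rename_in_file /dev/da0s1 /dev/amrd0s1 /mnt.root/etc/fstab
# rename_in_file ifconfig_em0 ifconfig_bge0 /mnt.root/etc/rc.conf
```

The substitution is deliberately dumb, so still read both files by eye
afterwards; the .orig copies make it easy to compare or revert.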

You should now be able to safely reboot and log into your new machine.

Regards,
-- 
Nino