"New" ZFS crash on FS (pool?) unmount/export
Thomas Backman
serenity at exscape.org
Thu Jul 9 12:36:28 UTC 2009
On Jul 9, 2009, at 11:01, Thomas Backman wrote:
> On Jun 20, 2009, at 09:11, Thomas Backman wrote:
>
>> I just ran into this tonight. Not sure exactly what triggered it -
>> the box stopped responding to pings at 02:07AM and it has a cron
>> backup job using zfs send/recv at 02:00, so I'm guessing it's
>> related, even though the backup probably should have finished
>> before then... Hmm. Anyway.
>>
>> r194478.
>>
>> kernel trap 12 with interrupts disabled
>>
>> Fatal trap 12: page fault while in kernel mode
>> cpuid = 0; apic id = 00
>> fault virtual address = 0x288
>> fault code = supervisor read data, page not present
>> instruction pointer = 0x20:0xffffffff805a4989
>> stack pointer = 0x28:0xffffff803e8b57e0
>> frame pointer = 0x28:0xffffff803e8b5840
>> code segment = base 0x0, limit 0xfffff, type 0x1b
>> = DPL 0, pres 1, long 1, def32 0, gran 1
>> processor eflags = resume, IOPL = 0
>> current process = 57514 (zpool)
>> panic: from debugger
>> cpuid = 0
>> Uptime: 10h22m13s
>> Physical memory: 2027 MB
>>
>> (kgdb) bt
>> #0  doadump () at pcpu.h:223
>> #1  0xffffffff8059c409 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:419
>> #2  0xffffffff8059c85c in panic (fmt=Variable "fmt" is not available.) at /usr/src/sys/kern/kern_shutdown.c:575
>> #3  0xffffffff801f1377 in db_panic (addr=Variable "addr" is not available.) at /usr/src/sys/ddb/db_command.c:478
>> #4  0xffffffff801f1781 in db_command (last_cmdp=0xffffffff80c38620, cmd_table=Variable "cmd_table" is not available.) at /usr/src/sys/ddb/db_command.c:445
>> #5  0xffffffff801f19d0 in db_command_loop () at /usr/src/sys/ddb/db_command.c:498
>> #6  0xffffffff801f3969 in db_trap (type=Variable "type" is not available.) at /usr/src/sys/ddb/db_main.c:229
>> #7  0xffffffff805ce465 in kdb_trap (type=12, code=0, tf=0xffffff803e8b5730) at /usr/src/sys/kern/subr_kdb.c:534
>> #8  0xffffffff8088715d in trap_fatal (frame=0xffffff803e8b5730, eva=Variable "eva" is not available.) at /usr/src/sys/amd64/amd64/trap.c:847
>> #9  0xffffffff80887fb2 in trap (frame=0xffffff803e8b5730) at /usr/src/sys/amd64/amd64/trap.c:345
>> #10 0xffffffff8086e007 in calltrap () at /usr/src/sys/amd64/amd64/exception.S:223
>> #11 0xffffffff805a4989 in _sx_xlock_hard (sx=0xffffff0043557d50, tid=18446742975830720512, opts=Variable "opts" is not available.) at /usr/src/sys/kern/kern_sx.c:575
>> #12 0xffffffff805a52fe in _sx_xlock (sx=Variable "sx" is not available.) at sx.h:155
>> #13 0xffffffff80fe2995 in zfs_freebsd_reclaim () from /boot/kernel/zfs.ko
>> #14 0xffffffff808cefca in VOP_RECLAIM_APV (vop=0xffffff0043557d38, a=0xffffff0043557d50) at vnode_if.c:1926
>> #15 0xffffffff80626f6e in vgonel (vp=0xffffff00437a7938) at vnode_if.h:830
>> #16 0xffffffff8062b528 in vflush (mp=0xffffff0060f2a000, rootrefs=0, flags=0, td=0xffffff0061528000) at /usr/src/sys/kern/vfs_subr.c:2450
>> #17 0xffffffff80fdd3a8 in zfs_umount () from /boot/kernel/zfs.ko
>> #18 0xffffffff8062420a in dounmount (mp=0xffffff0060f2a000, flags=1626513408, td=Variable "td" is not available.) at /usr/src/sys/kern/vfs_mount.c:1287
>> #19 0xffffffff80624975 in unmount (td=0xffffff0061528000, uap=0xffffff803e8b5c00) at /usr/src/sys/kern/vfs_mount.c:1172
>> #20 0xffffffff8088783f in syscall (frame=0xffffff803e8b5c90) at /usr/src/sys/amd64/amd64/trap.c:984
>> #21 0xffffffff8086e290 in Xfast_syscall () at /usr/src/sys/amd64/amd64/exception.S:364
>> #22 0x000000080104e49c in ?? ()
>> Previous frame inner to this frame (corrupt stack?)
>
>
> Here's the script in its relevant entirety [...]
>
> #!/bin/bash
>
> PATH="$PATH:/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin"
>
> function die() {
>     echo "$@" >&2
>     zpool export slave > /dev/null 2>&1
>     ggatel destroy -u 666 > /dev/null 2>&1
>     exit 1
> }
>
> function mount_unmount {
>     if [ -z "$1" ]; then
>         die 'Invalid argument given to mount_unmount'
>     elif [[ "$1" == "mount" ]]; then
>         if zpool list | grep -q slave; then
>             echo Already mounted
>             return 0
>         fi
>         echo Creating ggate device
>         ggatel create -u 666 /mnt/backup/chaos/slavefile || die 'Unable to create GEOM provider from file'
>         echo 'Sleeping for 5 seconds...'
>         sleep 5
>         echo Importing pool
>         zpool import -R /slave slave || die 'Unable to import slave pool'
>     elif [[ "$1" == "unmount" ]]; then
>         echo Exporting pool
>         zpool export slave || die 'Unable to export slave pool'
>         ggatel destroy -u 666 || die 'Unable to destroy GEOM provider'
>     fi
> }
>
> if [ -n "$1" ]; then
>     case "$1" in
>         mount) mount_unmount mount; exit 0;;
>         unmount) mount_unmount unmount; exit 0;;
>         initial) initial; exit 0;;
>         backup) ;;
>         *) help;;
>     esac
> else
>     help
> fi
>
> if [ ! -f "/mnt/backup/chaos/slavefile" ]; then
>     echo 'Backup error! slavefile does not exist!' | mail -s "Backup error" root
>     echo 'Slavefile does not exist!'
>     exit 1
> fi
>
> mount_unmount mount
>
> CURR=$(date +"backup-%F-%H%M")
>
> echo Taking snapshots
> zfs snapshot -r tank@$CURR || die "Unable to create $CURR snapshot"
>
> echo Starting backup...
> LAST=$(cat /root/.last-backup)
> zfs send -R -I $LAST tank@$CURR | zfs recv -Fvd slave
>
> echo $CURR > /root/.last-backup
>
> mount_unmount unmount
>
> echo Running rsync
> rsync -av --delete /bootdir/boot exscape::backup-freebsd/chaos
> rsync -av --delete /root exscape::backup-freebsd/chaos
> rsync -av --delete ~serenity exscape::backup-freebsd/chaos
>
> echo 'All done!'
>
> -------------------
Sorry for the monologue, but I ran into this again, this time with another panic, similar but not identical to the old one.
OK, so I figured I would update my "untouched" src clone (used to save bandwidth from the FreeBSD SVN server when I feel I need to start with a *clean* source tree), now that there have been quite a few changes since that revision.
This is pretty much what I did (if other shells are different: !$ in bash expands to the last argument of the previous command):
1) Clean up /usr/src from "my" stuff
2) svn update
3) svn diff and svn status, to make sure it was clean
4) zfs promote tank/usr/src  ## usr/src was a clone of the untouched, read-only fs "tank/usr/src_r194478-UNTOUCHED"
5) zfs destroy -r tank/usr/src_r194478-UNTOUCHED  ## The old one, obviously
6) zfs snapshot tank/usr/src@r195488_UNTOUCHED
7) zfs clone !$ tank/usr/src_r195488-UNTOUCHED
8) zfs promote !$
9) zfs set readonly=on !$
10) And, in case it may matter, I slightly modified the contents of /usr/src afterwards (applied two patches that aren't merged into HEAD (yet?)).
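(As an aside, since !$ relies on interactive history expansion: in a script, the special parameter $_, which holds the last argument of the previously executed command, gives much the same shorthand. A trivial sketch, not taken from the actual session:)

```shell
#!/bin/bash
# $_ expands to the last argument of the previously executed command,
# roughly what !$ gives you interactively. The snapshot name is copied
# from step 6 above; no zfs commands are actually run here.
echo tank/usr/src@r195488_UNTOUCHED
SNAP="$_"   # SNAP now holds tank/usr/src@r195488_UNTOUCHED
echo "next step would be: zfs clone $SNAP tank/usr/src_r195488-UNTOUCHED"
```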
... I THINK that's it. Since my bash_history got killed in the panic, I may be slightly wrong. In any case, usr/src is now a clone of the read-only UNTOUCHED fs:

[root@chaos ~]# zfs get origin tank/usr/src
NAME          PROPERTY  VALUE                                              SOURCE
tank/usr/src  origin    tank/usr/src_r195488_UNTOUCHED@r195488_UNTOUCHED  -
I then ran the backup script just posted in this thread:
[root@chaos ~]# bash backup.sh backup
Creating ggate device
Sleeping for 5 seconds...
Importing pool
Taking snapshots
Starting backup...
attempting destroy slave/usr/src_r194478-UNTOUCHED@backup-20090709-1250
success
attempting destroy slave/usr/src_r194478-UNTOUCHED@r194478-UNTOUCHED
failed - trying rename to slave/usr/src_r194478-UNTOUCHED@recv-38883-1
failed (2) - will try again on next pass
attempting destroy slave/usr/src_r194478-UNTOUCHED@backup-20090709-1235
success
attempting destroy slave/usr/src_r194478-UNTOUCHED
failed - trying rename to slave/recv-38883-2
failed (2) - will try again on next pass
promoting slave/usr/src
another pass:
attempting destroy slave/usr/src_r194478-UNTOUCHED
success
attempting destroy slave/usr/src@r194478-UNTOUCHED
success
attempting rename slave/usr/src to slave/usr/src_r195488_UNTOUCHED
success
receiving incremental stream of tank@backup-20090709-1328 into slave@backup-20090709-1328
received 312B stream in 1 seconds (312B/sec)
receiving incremental stream of tank/tmp@backup-20090709-1328 into slave/tmp@backup-20090709-1328
received 119KB stream in 1 seconds (119KB/sec)
receiving incremental stream of tank/var@backup-20090709-1328 into slave/var@backup-20090709-1328
received 211KB stream in 1 seconds (211KB/sec)
receiving incremental stream of tank/var/log@backup-20090709-1328 into slave/var/log@backup-20090709-1328
received 468KB stream in 1 seconds (468KB/sec)
receiving incremental stream of tank/var/crash@backup-20090709-1328 into slave/var/crash@backup-20090709-1328
received 312B stream in 1 seconds (312B/sec)
receiving incremental stream of tank/root@backup-20090709-1328 into slave/root@backup-20090709-1328
received 156KB stream in 1 seconds (156KB/sec)
receiving incremental stream of tank/usr@backup-20090709-1328 into slave/usr@backup-20090709-1328
received 302KB stream in 1 seconds (302KB/sec)
receiving incremental stream of tank/usr/obj@backup-20090709-1328 into slave/usr/obj@backup-20090709-1328
received 8.52MB stream in 8 seconds (1.07MB/sec)
receiving incremental stream of tank/usr/src_r195488_UNTOUCHED@r195488_UNTOUCHED into slave/usr/src_r195488_UNTOUCHED@r195488_UNTOUCHED
received 112MB stream in 43 seconds (2.60MB/sec)
receiving incremental stream of tank/usr/src_r195488_UNTOUCHED@backup-20090709-1328 into slave/usr/src_r195488_UNTOUCHED@backup-20090709-1328
received 312B stream in 1 seconds (312B/sec)
receiving incremental stream of tank/usr/ports@backup-20090709-1328 into slave/usr/ports@backup-20090709-1328
received 312B stream in 1 seconds (312B/sec)
receiving incremental stream of tank/usr/ports/distfiles@backup-20090709-1328 into slave/usr/ports/distfiles@backup-20090709-1328
received 312B stream in 1 seconds (312B/sec)
found clone origin slave/usr/src_r195488_UNTOUCHED@r195488_UNTOUCHED
receiving incremental stream of tank/usr/src@backup-20090709-1328 into slave/usr/src@backup-20090709-1328
received 216KB stream in 1 seconds (216KB/sec)
local fs slave/usr/src does not have fromsnap (backup-20090709-1250 in stream); must have been deleted locally; ignoring
Exporting pool
Read from remote host 192.168.1.10: Operation timed out
... and the DDB output (first part copy/paste from kgdb, second part
handwritten since the kgdb BT was totally broken, as I expected.)
Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address = 0x70
fault code = supervisor write data, page not present
instruction pointer = 0x20:0xffffffff8036e855
stack pointer = 0x28:0xffffff803ea637d0
frame pointer = 0x28:0xffffff803ea637e0
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 38905 (zpool)
_sx_slock()
dmu_buf_update_user()+0x47
zfs_znode_dmu_fini()
zfs_freebsd_reclaim()
VOP_RECLAIM_APV()
vgonel()
vflush()
zfs_umount()
dounmount()
unmount()
syscall()
Xfast_syscall()
BTW, what's with the "local fs slave/usr/src does not have fromsnap
(backup-20090709-1250 in stream); must have been deleted locally;
ignoring" part?
This is what I get when I try an incremental backup now:
[root@chaos ~]# bash backup.sh backup
Creating ggate device
Sleeping for 5 seconds...
Importing pool
Taking snapshots
Starting backup...
local fs slave/usr/src does not have fromsnap (backup-20090709-1250 in stream); must have been deleted locally; ignoring
receiving incremental stream of tank@backup-20090709-1328 into slave@backup-20090709-1328
snap slave@backup-20090709-1328 already exists; ignoring
local fs slave/usr/src does not have fromsnap (backup-20090709-1250 in stream); must have been deleted locally; ignoring
warning: cannot send 'tank/tmp@backup-20090709-1328': Broken pipe
warning: cannot send 'tank/tmp@backup-20090709-1406': Broken pipe
Exporting pool
Running rsync
... rsync runs, no panic, but no ZFS backup either. Guess it's time for another "initial" backup, i.e. start all over with 0 snapshots...
The initial backup worked just fine; it found the clone/origin etc. and made it work.
Stripped of comments and echo statements, the function is simply:
function initial {
    for BACK in $(zfs list -t snapshot -H -r tank | awk '{print $1}'); do zfs destroy $BACK; done
    zpool destroy slave 2>/dev/null; sleep 3; ggatel destroy -u 666 2>/dev/null; sleep 3  # Clean up if needed
    ggatel create -u 666 /mnt/backup/chaos/slavefile; sleep 3
    zpool create -f -R /slave slave ggate666 && NOW=$(date +"backup-%Y%m%d-%H%M") || die 'Unable to create pool'
    zfs snapshot -r tank@$NOW || die 'Unable to snapshot'
    zfs send -R tank@$NOW | zfs recv -vFd slave
    mount_unmount unmount
    echo $NOW > /root/.last-backup
}
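The fixed sleep 3/sleep 5 pauses assume the ggate device settles within a few seconds. A more robust (hypothetical) variant polls for the device node instead; the /dev/ggate666 path is my inference from the -u 666 unit number, not something confirmed by the script:

```shell
#!/bin/bash
# Sketch: poll for a device node instead of sleeping a fixed interval.
# Usage: wait_for_dev <path> <max_tries>; returns 0 once the node
# exists, 1 if it never appears within max_tries one-second polls.
wait_for_dev() {
    local dev="$1" max="$2" tries=0
    while [ ! -e "$dev" ]; do
        tries=$((tries + 1))
        if [ "$tries" -ge "$max" ]; then
            return 1
        fi
        sleep 1
    done
    return 0
}

# Hypothetical use in place of the bare sleeps:
#   ggatel create -u 666 /mnt/backup/chaos/slavefile
#   wait_for_dev /dev/ggate666 10 || die 'ggate device never appeared'
```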
After that, incrementals are fine again.
Regards,
Thomas
More information about the freebsd-current
mailing list