ZFS panic under extreme circumstances (2/3 disks corrupted)
Thomas Backman
serenity at exscape.org
Sun May 24 19:02:34 UTC 2009
So, I was playing around with RAID-Z and self-healing, when I decided
to take it another step and corrupt the data on *two* disks (well,
files via ggate) and see what happened. I obviously expected the pool
to go offline, but I didn't expect a kernel panic to follow!
What I did was something resembling:
1) create three 100MB files, then use ggatel create to turn them into
GEOM providers
2) zpool create test raidz ggate{1..3}
3) create a 100MB file inside the pool, md5 the file
4) overwrite 10~20MB (IIRC) of disk2 with /dev/random, using something like
   dd if=/dev/random of=./disk2 bs=1000k count=20 skip=40
(I now know that I wanted *seek*, not *skip*, but it still shouldn't panic!)
5) Check the md5 of the file: everything OK, zpool status shows a
degraded pool.
6) Repeat step #4, but with disk 3.
7) zpool scrub test
8) Panic!
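As an aside on step 4, the difference between skip and seek can be checked on small scratch files. The sketch below is not the original test: the file names, the 100 KiB size and the use of /dev/zero and /dev/urandom are assumptions for illustration only. skip= discards input blocks, so the write still lands at the start of the output file, and without conv=notrunc dd also truncates the output to what it wrote. seek= together with conv=notrunc writes at the intended offset and leaves the rest of the file alone.

```shell
#!/bin/sh
# Sketch: compare dd's skip= (input offset) with seek= (output offset).
# Scratch file names and sizes are hypothetical, not from the original test.
set -eu
work=$(mktemp -d)

# Two 100 KiB zero-filled stand-ins for a "disk" backing file.
dd if=/dev/zero of="$work/skip.img" bs=1024 count=100 2>/dev/null
dd if=/dev/zero of="$work/seek.img" bs=1024 count=100 2>/dev/null

# skip=40 discards 40 input blocks; the output is still written from
# offset 0, and without conv=notrunc the file is cut down to 20 blocks.
dd if=/dev/urandom of="$work/skip.img" bs=1024 count=20 skip=40 2>/dev/null

# seek=40 starts writing 40 blocks into the output; conv=notrunc keeps
# everything before and after the overwritten range.
dd if=/dev/urandom of="$work/seek.img" bs=1024 count=20 seek=40 \
    conv=notrunc 2>/dev/null

skip_size=$(wc -c < "$work/skip.img" | tr -d ' ')
seek_size=$(wc -c < "$work/seek.img" | tr -d ' ')
echo "skip.img: $skip_size bytes"
echo "seek.img: $seek_size bytes"

rm -rf "$work"
```

If the same thing happened to disk2 and disk3 in the steps above, the random data would have landed on the front of each backing file, where ZFS keeps two of its four vdev labels, and the files themselves would have shrunk to roughly 20MB. That might explain the vdev.bad_label messages in the log, though it is only a guess.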
FreeBSD chaos.exscape.org 8.0-CURRENT FreeBSD 8.0-CURRENT #2: Thu May 21 22:42:42 CEST 2009 root@chaos.exscape.org:/usr/obj/usr/src/sys/DTRACE amd64
May 24 09:13:12 chaos root: ZFS: vdev failure, zpool=test type=vdev.bad_label
May 24 09:13:15 chaos last message repeated 2 times
panic: solaris assert: 0 == zap_add(dp->dp_meta_objset,
DMU_POOL_DIRECTORY_OBJECT, DMU_POOL_SCRUB_FUNC, sizeof (uint32_t), 1,
&dp->dp_scrub_func, tx), file:
/usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_scrub.c,
line: 122
cpuid = 0
KDB: enter: panic
panic: from debugger
cpuid = 0
Uptime: 22h47m41s
Physical memory: 2028 MB
Dumping 1754 MB: ...
#0 doadump () at pcpu.h:223
223 pcpu.h: No such file or directory.
in pcpu.h
(kgdb) #0 doadump () at pcpu.h:223
#1 0xffffffff80576039 in boot (howto=260)
at /usr/src/sys/kern/kern_shutdown.c:420
#2 0xffffffff8057648c in panic (fmt=Variable "fmt" is not available.
)
at /usr/src/sys/kern/kern_shutdown.c:576
#3 0xffffffff801d5b07 in db_panic (addr=Variable "addr" is not available.
)
at /usr/src/sys/ddb/db_command.c:478
#4 0xffffffff801d5f11 in db_command (last_cmdp=0xffffffff80bd8820,
cmd_table=Variable "cmd_table" is not available.
) at /usr/src/sys/ddb/db_command.c:445
#5 0xffffffff801d6160 in db_command_loop ()
at /usr/src/sys/ddb/db_command.c:498
#6 0xffffffff801d80f9 in db_trap (type=Variable "type" is not available.
) at /usr/src/sys/ddb/db_main.c:229
#7 0xffffffff805a6ad5 in kdb_trap (type=3, code=0,
tf=0xffffff803ea9e700)
at /usr/src/sys/kern/subr_kdb.c:534
#8 0xffffffff808610e8 in trap (frame=0xffffff803ea9e700)
at /usr/src/sys/amd64/amd64/trap.c:613
#9 0xffffffff8083af97 in calltrap ()
at /usr/src/sys/amd64/amd64/exception.S:223
#10 0xffffffff805a6cad in kdb_enter (why=0xffffffff8095e234 "panic",
msg=0xa <Address 0xa out of bounds>) at cpufunc.h:63
#11 0xffffffff8057649b in panic (fmt=Variable "fmt" is not available.
)
at /usr/src/sys/kern/kern_shutdown.c:559
#12 0xffffffff80eaa157 in dsl_pool_scrub_setup_sync ()
from /boot/kernel/zfs.ko
#13 0xffffffff80ea562b in dsl_sync_task_group_sync ()
from /boot/kernel/zfs.ko
#14 0xffffff00560fb298 in ?? ()
#15 0xffffff803ea9e980 in ?? ()
#16 0x0000000000000000 in ?? ()
#17 0xffffff001ef49b48 in ?? ()
#18 0x0000000000000029 in ?? ()
#19 0xffffff00384c4b00 in ?? ()
#20 0xffffff803ea9ea00 in ?? ()
#21 0xffffff803ea9ea40 in ?? ()
#22 0xffffffff80ea5153 in dsl_pool_sync () from /boot/kernel/zfs.ko
Previous frame inner to this frame (corrupt stack?)
Full core.txt: http://pastebin.com/f546fefdf
Regards,
Thomas
PS. Should I file PRs regarding 8-CURRENT or not?