[Bug 223699] ZFS drive loss during write operation causes kernel panic
bugzilla-noreply at freebsd.org
Thu Nov 16 06:04:26 UTC 2017
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=223699
Bug ID: 223699
Summary: ZFS drive loss during write operation causes kernel panic
Product: Base System
Version: 11.1-RELEASE
Hardware: amd64
OS: Any
Status: New
Severity: Affects Only Me
Priority: ---
Component: kern
Assignee: freebsd-bugs at FreeBSD.org
Reporter: abrahamd at cat.pdx.edu
Environment:
OS: FreeBSD 11.1-RELEASE-p4
Board: Supermicro X10DRH-i
Manufacturer: Silicon Mechanics
RAID Controller: LSI 9341-8i HBA
Description:
The server has an attached storage zpool consisting of 8 disks, separate from
the root pool, which is on a different controller. The storage pool is
configured with RAIDZ2 fault tolerance, and zpool status reports it as healthy
before the crash. During routine ZFS setup testing (pulling a disk to verify
pool integrity) while a write to the storage pool is in progress, the kernel
panics. The panic is consistently repeatable.
How-to-repeat:
Boot the server with the storage zpool attached. Begin a write operation to
the storage zpool (we use 'yes > file'). Pull a disk from the storage zpool to
simulate drive loss. A kernel panic follows. (We repeated this multiple times
in succession while diagnosing the issue.)
Trace follows:
flows01# kgdb kernel.debug /var/crash/vmcore.0
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...
Unread portion of the kernel message buffer:
mfi0: I/O error, cmd=0xfffffe000148d760, status=0xc, scsi_status=0
mfi0: sense error 0, sense_key 0, asc 0, ascq 0
mfisyspd0: hard error cmd=write 927680-927765
Fatal trap 12: page fault while in kernel mode
cpuid = 11; apic id = 0b
fault virtual address = 0x8
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff809b9f74
stack pointer = 0x28:0xfffffe0f84318930
frame pointer = 0x28:0xfffffe0f84318970
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 12 (irq264: mfi0)
trap number = 12
panic: page fault
cpuid = 11
KDB: stack backtrace:
#0 0xffffffff80aadac7 at kdb_backtrace+0x67
#1 0xffffffff80a6bba6 at vpanic+0x186
#2 0xffffffff80a6ba13 at panic+0x43
#3 0xffffffff80edf832 at trap_fatal+0x322
#4 0xffffffff80edf889 at trap_pfault+0x49
#5 0xffffffff80edf0c6 at trap+0x286
#6 0xffffffff80ec36d1 at calltrap+0x8
#7 0xffffffff80620f2c at mfi_tbolt_complete_cmd+0x13c
#8 0xffffffff80620d94 at mfi_intr_tbolt+0x54
#9 0xffffffff80a321ec at intr_event_execute_handlers+0xec
#10 0xffffffff80a324d6 at ithread_loop+0xd6
#11 0xffffffff80a2f845 at fork_exit+0x85
#12 0xffffffff80ec3c0e at fork_trampoline+0xe
Uptime: 1m1s
Dumping 2498 out of 65230 MB:mfi0: cmd_tbolt 0xfffff8000fa0f880 has invalid
sync_cmd_idx=128 - skipping
..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%
Reading symbols from /boot/kernel/zfs.ko...Reading symbols from
/usr/lib/debug//boot/kernel/zfs.ko.debug...done.
done.
Loaded symbols for /boot/kernel/zfs.ko
Reading symbols from /boot/kernel/opensolaris.ko...Reading symbols from
/usr/lib/debug//boot/kernel/opensolaris.ko.debug...done.
done.
Loaded symbols for /boot/kernel/opensolaris.ko
Reading symbols from /boot/kernel/ums.ko...Reading symbols from
/usr/lib/debug//boot/kernel/ums.ko.debug...done.
done.
Loaded symbols for /boot/kernel/ums.ko
#0 0xffffffff80a6b98a in doadump (textdump=<value optimized out>) at
/usr/src/sys/kern/kern_shutdown.c:311
311 dumping--;
(kgdb) list *0xffffffff809b9f74
0xffffffff809b9f74 is in g_disk_done (/usr/src/sys/geom/geom_disk.c:252).
247 default:
248 break;
249 }
250 bp2->bio_inbed++;
251 if (bp2->bio_children == bp2->bio_inbed) {
252 mtx_unlock(&sc->done_mtx);
253 bp2->bio_resid = bp2->bio_bcount - bp2->bio_completed;
254 g_io_deliver(bp2, bp2->bio_error);
255 } else
256 mtx_unlock(&sc->done_mtx);
Current language: auto; currently minimal