[Bug 223699] ZFS drive loss during write operation causes kernel panic

Thu Nov 16 06:04:26 UTC 2017

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=223699

            Bug ID: 223699
           Summary: ZFS drive loss during write operation causes kernel
                    panic
           Product: Base System
           Version: 11.1-RELEASE
          Hardware: amd64
                OS: Any
            Status: New
          Severity: Affects Only Me
          Priority: ---
         Component: kern
          Assignee: freebsd-bugs at FreeBSD.org
          Reporter: abrahamd at cat.pdx.edu

Environment:
OS: FreeBSD 11.1-RELEASE-p4
Board: Supermicro X10DRH-i
Manufacturer: Silicon Mechanics
RAID Controller: LSI 9341-8i HBA

Description:
The server has an attached storage zpool of 8 disks, separate from the root
pool, which is on a different controller. The storage pool is configured as
RAIDZ2, and zpool status reports it as healthy before the crash. During routine
ZFS setup testing (pulling a disk to verify pool integrity) while a write to
the storage pool is in progress, the kernel panics. The panic is consistently
reproducible.

How-to-repeat:
Boot the server with the storage zpool attached. Begin a write to the storage
zpool (we use 'yes > file'). Pull a disk from the storage zpool to simulate
drive loss. A kernel panic follows. (We reproduced this multiple times in
succession while diagnosing the issue.)
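
The steps above can be sketched as a short script. This is only an
illustration, not the exact procedure used in our testing: the pool name
"storage" and the output file path are assumptions, and by default the script
prints each command instead of running it. The disk pull is a physical step
and cannot be scripted.

```sh
#!/bin/sh
# Sketch of the reproduction steps from the report.
# Assumptions (not from the report): pool name "storage" and the output path.
POOL="${POOL:-storage}"
OUTFILE="${OUTFILE:-/$POOL/yes.out}"
DRY_RUN="${DRY_RUN:-1}"    # default: print commands instead of running them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

repro_steps() {
    # 1. Confirm the pool is healthy before the test.
    run zpool status "$POOL"
    # 2. Start a continuous write to the pool in the background.
    run sh -c "yes > '$OUTFILE' &"
    # 3. Physical step: pull one disk from the pool while the write runs.
    echo "# now pull one disk from pool $POOL, then press Enter"
    [ "$DRY_RUN" = "1" ] || read -r reply
    # 4. If the machine is still up, check how the pool reacted.
    run zpool status "$POOL"
}

repro_steps
```

Set DRY_RUN=0 only on a test machine: 'yes' keeps writing until the pool is
full or the process is killed.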

Trace follows:
flows01# kgdb kernel.debug /var/crash/vmcore.0
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...

Unread portion of the kernel message buffer:
mfi0: I/O error, cmd=0xfffffe000148d760, status=0xc, scsi_status=0
mfi0: sense error 0, sense_key 0, asc 0, ascq 0
mfisyspd0: hard error cmd=write 927680-927765

Fatal trap 12: page fault while in kernel mode
cpuid = 11; apic id = 0b
fault virtual address   = 0x8
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff809b9f74
stack pointer           = 0x28:0xfffffe0f84318930
frame pointer           = 0x28:0xfffffe0f84318970
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 12 (irq264: mfi0)
trap number             = 12
panic: page fault
cpuid = 11
KDB: stack backtrace:
#0 0xffffffff80aadac7 at kdb_backtrace+0x67
#1 0xffffffff80a6bba6 at vpanic+0x186
#2 0xffffffff80a6ba13 at panic+0x43
#3 0xffffffff80edf832 at trap_fatal+0x322
#4 0xffffffff80edf889 at trap_pfault+0x49
#5 0xffffffff80edf0c6 at trap+0x286
#6 0xffffffff80ec36d1 at calltrap+0x8
#7 0xffffffff80620f2c at mfi_tbolt_complete_cmd+0x13c
#8 0xffffffff80620d94 at mfi_intr_tbolt+0x54
#9 0xffffffff80a321ec at intr_event_execute_handlers+0xec
#10 0xffffffff80a324d6 at ithread_loop+0xd6
#11 0xffffffff80a2f845 at fork_exit+0x85
#12 0xffffffff80ec3c0e at fork_trampoline+0xe
Uptime: 1m1s
Dumping 2498 out of 65230 MB:mfi0: cmd_tbolt 0xfffff8000fa0f880 has invalid
sync_cmd_idx=128 - skipping
..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%

Reading symbols from /boot/kernel/zfs.ko...Reading symbols from
/usr/lib/debug//boot/kernel/zfs.ko.debug...done.
done.
Loaded symbols for /boot/kernel/zfs.ko
Reading symbols from /boot/kernel/opensolaris.ko...Reading symbols from
/usr/lib/debug//boot/kernel/opensolaris.ko.debug...done.
done.
Loaded symbols for /boot/kernel/opensolaris.ko
Reading symbols from /boot/kernel/ums.ko...Reading symbols from
/usr/lib/debug//boot/kernel/ums.ko.debug...done.
done.
Loaded symbols for /boot/kernel/ums.ko
#0  0xffffffff80a6b98a in doadump (textdump=<value optimized out>) at
/usr/src/sys/kern/kern_shutdown.c:311
311             dumping--;
(kgdb) list *0xffffffff809b9f74
0xffffffff809b9f74 is in g_disk_done (/usr/src/sys/geom/geom_disk.c:252).
247             default:
248                     break;
249             }
250             bp2->bio_inbed++;
251             if (bp2->bio_children == bp2->bio_inbed) {
252                     mtx_unlock(&sc->done_mtx);
253                     bp2->bio_resid = bp2->bio_bcount - bp2->bio_completed;
254                     g_io_deliver(bp2, bp2->bio_error);
255             } else
256                     mtx_unlock(&sc->done_mtx);
Current language:  auto; currently minimal
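
In the session above, the address passed to 'list *' is the offset half of the
"instruction pointer" line in the trap message. A minimal sketch of pulling
that address out of a saved copy of the panic text follows. The helper name
fault_ip is made up for illustration, and the parsing assumes the line format
shown in this report.

```sh
#!/bin/sh
# Sketch: extract the faulting address from a FreeBSD "Fatal trap" message
# and print the matching kgdb command. fault_ip is a hypothetical helper.

# Reads panic text on stdin and prints the address that follows the segment
# selector (the part after the colon in "0x20:0xffffffff809b9f74").
fault_ip() {
    sed -n 's/^instruction pointer[[:space:]]*=[[:space:]]*0x[0-9a-fA-F]*:\(0x[0-9a-fA-F]*\).*$/\1/p'
}

# Example input: the line as it appears in this report.
addr="$(printf '%s\n' 'instruction pointer     = 0x20:0xffffffff809b9f74' | fault_ip)"

# The command to type at the (kgdb) prompt.
echo "list *$addr"
```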
