[Bug 206109] zpool import of corrupt pool causes system to reboot
bugzilla-noreply at freebsd.org
Sun Jan 10 18:38:10 UTC 2016
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=206109
Bug ID: 206109
Summary: zpool import of corrupt pool causes system to reboot
Product: Base System
Version: 10.2-RELEASE
Hardware: Any
OS: Any
Status: New
Severity: Affects Only Me
Priority: ---
Component: kern
Assignee: freebsd-bugs at FreeBSD.org
Reporter: emilec at clarotech.co.za
I recently set up a new RAIDZ2 pool with 5 x 4TB Seagate NAS drives using
NAS4Free 10.2.0.2 (revision 2235). After copying data from an existing NAS to
the new pool, I discovered that some corruption had been detected. I attempted
to run a scrub, but partway through the system crashed and went into a boot
loop.
I reloaded NAS4Free and tried to import the pool, but each attempt rebooted the
system. I then tried FreeBSD-10.2-RELEASE-amd64-mini-memstick, and importing
the pool caused that system to reboot as well. I could, however, import the
pool read-only and access the data.
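For reference, the read-only recovery import described above (the exact command also appears with the status output further below) takes this form; the pool name "pool0" and the altroot are from this report, and the flag meanings are per zpool(8):

```shell
# Import the damaged pool without ever writing to it:
#   -f              force import (pool was last used by another system)
#   -F              rewind to an earlier transaction group if needed
#   -o readonly=on  disallow all writes to the damaged pool
#   -R /pool0       mount datasets under an alternate root
zpool import -F -f -o readonly=on -R /pool0 pool0
zpool status -v pool0   # -v lists the individual files with errors
```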
From the NAS4Free logs I was able to obtain the following from the crash after
attempting an import:
Jan 1 16:21:28 nas4free syslogd: kernel boot file is /boot/kernel/kernel
Jan 1 16:21:28 nas4free kernel: Solaris: WARNING: blkptr at 0xfffffe0003a5fa40 DVA 1 has invalid VDEV 16384
Jan 1 16:21:28 nas4free kernel:
Jan 1 16:21:28 nas4free kernel:
Jan 1 16:21:28 nas4free kernel: Fatal trap 12: page fault while in kernel mode
Jan 1 16:21:28 nas4free kernel: cpuid = 1; apic id = 01
Jan 1 16:21:28 nas4free kernel: fault virtual address = 0x50
Jan 1 16:21:28 nas4free kernel: fault code = supervisor read data, page not present
Jan 1 16:21:28 nas4free kernel: instruction pointer = 0x20:0xffffffff81e79f94
Jan 1 16:21:28 nas4free kernel: stack pointer = 0x28:0xfffffe0169ef5740
Jan 1 16:21:28 nas4free kernel: frame pointer = 0x28:0xfffffe0169ef5750
Jan 1 16:21:28 nas4free kernel: code segment = base 0x0, limit 0xfffff, type 0x1b
Jan 1 16:21:28 nas4free kernel: = DPL 0, pres 1, long 1, def32 0, gran 1
Jan 1 16:21:28 nas4free kernel: processor eflags = interrupt enabled, resume, IOPL = 0
Jan 1 16:21:28 nas4free kernel: current process = 6 (txg_thread_enter)
Jan 1 16:21:28 nas4free kernel: trap number = 12
Jan 1 16:21:28 nas4free kernel: panic: page fault
Jan 1 16:21:28 nas4free kernel: cpuid = 1
Jan 1 16:21:28 nas4free kernel: KDB: stack backtrace:
Jan 1 16:21:28 nas4free kernel: #0 0xffffffff80a86a70 at kdb_backtrace+0x60
Jan 1 16:21:28 nas4free kernel: #1 0xffffffff80a4a1d6 at vpanic+0x126
Jan 1 16:21:28 nas4free kernel: #2 0xffffffff80a4a0a3 at panic+0x43
Jan 1 16:21:28 nas4free kernel: #3 0xffffffff80ecaedb at trap_fatal+0x36b
Jan 1 16:21:28 nas4free kernel: #4 0xffffffff80ecb1dd at trap_pfault+0x2ed
Jan 1 16:21:28 nas4free kernel: #5 0xffffffff80eca87a at trap+0x47a
Jan 1 16:21:28 nas4free kernel: #6 0xffffffff80eb0c72 at calltrap+0x8
Jan 1 16:21:28 nas4free kernel: #7 0xffffffff81e8071f at vdev_mirror_child_select+0x6f
Jan 1 16:21:28 nas4free kernel: #8 0xffffffff81e802d0 at vdev_mirror_io_start+0x270
Jan 1 16:21:28 nas4free kernel: #9 0xffffffff81e9cd86 at zio_vdev_io_start+0x1d6
Jan 1 16:21:28 nas4free kernel: #10 0xffffffff81e998b2 at zio_execute+0x162
Jan 1 16:21:28 nas4free kernel: #11 0xffffffff81e991b9 at zio_nowait+0x49
Jan 1 16:21:28 nas4free kernel: #12 0xffffffff81e1c91e at arc_read+0x8fe
Jan 1 16:21:28 nas4free kernel: #13 0xffffffff81e577b2 at dsl_scan_prefetch+0xc2
Jan 1 16:21:28 nas4free kernel: #14 0xffffffff81e574a3 at dsl_scan_visitbp+0x583
Jan 1 16:21:28 nas4free kernel: #15 0xffffffff81e5722f at dsl_scan_visitbp+0x30f
Jan 1 16:21:28 nas4free kernel: #16 0xffffffff81e5722f at dsl_scan_visitbp+0x30f
Jan 1 16:21:28 nas4free kernel: Copyright (c) 1992-2015 The FreeBSD Project.
Status of the pool after a read-only import:
zpool import -F -f -o readonly=on -R /pool0 pool0
zpool status
pool: pool0
state: ONLINE
status: One or more devices has experienced an error resulting in data
corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
entire pool from backup.
see: http://illumos.org/msg/ZFS-8000-8A
scan: scrub in progress since Wed Dec 30 13:34:03 2015
1.06T scanned out of 8.53T at 1/s, (scan is slow, no estimated time)
0 repaired, 12.45% done
config:
        NAME        STATE     READ WRITE CKSUM
        pool0       ONLINE       0     0     0
          raidz2-0  ONLINE       0     0     0
            ada0    ONLINE       0     0     0
            ada1    ONLINE       0     0     0
            ada2    ONLINE       0     0     0
            ada3    ONLINE       0     0     0
            ada4    ONLINE       0     0     0
errors: 1 data errors, use '-v' for a list
I eventually discovered that the corruption was caused by faulty RAM (it fails
memtest), so I accept that the pool is corrupt.
Since NAS4Free is based on FreeBSD and the behaviour is the same, I thought
this would be the best place to log a bug, but feel free to point me back to
NAS4Free. Their forums suggested that ZFS is enterprise software and that an
enterprise would simply restore from backup. It would be better to catch the
error and report it rather than reboot the system.