kernel panic loop after zpool remove
Andrew Chanler
achanler at gmail.com
Thu Jun 18 17:55:34 UTC 2020
I managed to recover the data from this zpool. You can read about it on
the forum if you are curious:
https://forums.FreeBSD.org/threads/kernel-panic-loop-after-zpool-remove.75632/post-466625
Andrew
On Thu, Jun 4, 2020 at 10:10 AM Andrew Chanler <achanler at gmail.com> wrote:
> Hi,
>
> Is there any way to get ZFS and the zpools into a recovery 'safe mode'
> where it stops trying to do any manipulation of the zpools? I'm stuck
> in a kernel panic loop after 'zpool remove'. With the recent discussions
> about OpenZFS I was also considering switching to OpenZFS, but I don't
> want to get myself into more trouble! I just need to stop the remove-vdev
> operation and this 'metaslab free' code path from running on the new vdev
> so I can keep the system up long enough to read the data out.
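> 
> The closest thing I have come up with (untested, so only a sketch of
> what I mean) is to stop the pool from importing automatically and then
> import it read-only, on the theory that a read-only import never issues
> the frees that trigger the panic:
> -------
> mv /boot/zfs/zpool.cache /boot/zfs/zpool.cache.bak   (stop the auto-import)
> zpool import -N -o readonly=on storage
> -------
> I also found the vfs.zfs.recover loader tunable, but as far as I can
> tell it only relaxes zfs_panic_recover() checks, not a hard VERIFY0()
> like this one.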
>
> Thank you,
> Andrew
>
>
> The system is running 'FreeBSD 12.1-RELEASE-p5 GENERIC amd64'.
> -------
> panic: solaris assert: ((offset) & ((1ULL << vd->vdev_ashift) - 1)) == 0 (0x400 == 0x0), file: /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/metaslab.c, line: 3593
> cpuid = 1
> time = 1590769420
> KDB: stack backtrace:
> #0 0xffffffff80c1d307 at kdb_backtrace+0x67
> #1 0xffffffff80bd063d at vpanic+0x19d
> #2 0xffffffff80bd0493 at panic+0x43
> #3 0xffffffff82a6922c at assfail3+0x2c
> #4 0xffffffff828a3b83 at metaslab_free_concrete+0x103
> #5 0xffffffff828a4dd8 at metaslab_free+0x128
> #6 0xffffffff8290217c at zio_dva_free+0x1c
> #7 0xffffffff828feb7c at zio_execute+0xac
> #8 0xffffffff80c2fae4 at taskqueue_run_locked+0x154
> #9 0xffffffff80c30e18 at taskqueue_thread_loop+0x98
> #10 0xffffffff80b90c53 at fork_exit+0x83
> #11 0xffffffff81082c2e at fork_trampoline+0xe
> Uptime: 7s
> -------
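> 
> The assertion is a pure alignment check: P2PHASE(x, align) is just
> x & (align - 1) in the OpenSolaris macros. A minimal stand-alone C
> sketch, using the offset later recovered from the core dump below
> (0x2000000400), shows why the same offset is fine on the old vdev and
> fatal on the new one:
> -------
> #include <stdint.h>
> #include <stdio.h>
> 
> /* Same macro the VERIFY0() in metaslab_free_concrete() uses:
>  * the remainder of x modulo a power-of-two alignment. */
> #define P2PHASE(x, align)       ((x) & ((align) - 1))
> 
> int
> main(void)
> {
>         uint64_t offset = 0x2000000400ULL;      /* offset from the panic */
> 
>         /* ashift 9 (old mirror-0): 512-byte alignment, remainder 0 */
>         printf("ashift 9:  0x%jx\n", (uintmax_t)P2PHASE(offset, 1ULL << 9));
>         /* ashift 12 (new mirror-1): 4K alignment, remainder 0x400 */
>         printf("ashift 12: 0x%jx\n", (uintmax_t)P2PHASE(offset, 1ULL << 12));
>         return (0);
> }
> -------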
>
> I have a core dump and can provide more details to help debug the
> issue, and I can open a bug report too:
> -------
> (kgdb) bt
> #0  __curthread () at /usr/src/sys/amd64/include/pcpu.h:234
> #1  doadump (textdump=<optimized out>) at /usr/src/sys/kern/kern_shutdown.c:371
> #2  0xffffffff80bd0238 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:451
> #3  0xffffffff80bd0699 in vpanic (fmt=<optimized out>, ap=<optimized out>) at /usr/src/sys/kern/kern_shutdown.c:877
> #4  0xffffffff80bd0493 in panic (fmt=<unavailable>) at /usr/src/sys/kern/kern_shutdown.c:804
> #5  0xffffffff82a6922c in assfail3 (a=<unavailable>, lv=<unavailable>, op=<unavailable>, rv=<unavailable>, f=<unavailable>, l=<optimized out>) at /usr/src/sys/cddl/compat/opensolaris/kern/opensolaris_cmn_err.c:91
> #6  0xffffffff828a3b83 in metaslab_free_concrete (vd=0xfffff80004623000, offset=137438954496, asize=<optimized out>, checkpoint=0) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/metaslab.c:3593
> #7  0xffffffff828a4dd8 in metaslab_free_dva (spa=<optimized out>, checkpoint=0, dva=<optimized out>) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/metaslab.c:3863
> #8  metaslab_free (spa=<optimized out>, bp=0xfffff800043788a0, txg=41924766, now=<optimized out>) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/metaslab.c:4145
> #9  0xffffffff8290217c in zio_dva_free (zio=0xfffff80004378830) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:3070
> #10 0xffffffff828feb7c in zio_execute (zio=0xfffff80004378830) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:1786
> #11 0xffffffff80c2fae4 in taskqueue_run_locked (queue=0xfffff80004222800) at /usr/src/sys/kern/subr_taskqueue.c:467
> #12 0xffffffff80c30e18 in taskqueue_thread_loop (arg=<optimized out>) at /usr/src/sys/kern/subr_taskqueue.c:773
> #13 0xffffffff80b90c53 in fork_exit (callout=0xffffffff80c30d80 <taskqueue_thread_loop>, arg=0xfffff800041d90b0, frame=0xfffffe004dcf9bc0) at /usr/src/sys/kern/kern_fork.c:1065
> #14 <signal handler called>
> (kgdb) frame 6
> #6  0xffffffff828a3b83 in metaslab_free_concrete (vd=0xfffff80004623000, offset=137438954496, asize=<optimized out>, checkpoint=0) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/metaslab.c:3593
> 3593            VERIFY0(P2PHASE(offset, 1ULL << vd->vdev_ashift));
> (kgdb) p /x offset
> $1 = 0x2000000400
> (kgdb) p /x vd->vdev_ashift
> $2 = 0xc
> (kgdb) frame 9
> #9  0xffffffff8290217c in zio_dva_free (zio=0xfffff80004378830) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:3070
> 3070            metaslab_free(zio->io_spa, zio->io_bp, zio->io_txg, B_FALSE);
> (kgdb) p /x zio->io_spa->spa_root_vdev->vdev_child[0]->vdev_removing
> $3 = 0x1
> (kgdb) p /x zio->io_spa->spa_root_vdev->vdev_child[0]->vdev_ashift
> $4 = 0x9
> (kgdb) p /x zio->io_spa->spa_root_vdev->vdev_child[1]->vdev_ashift
> $5 = 0xc
> (kgdb) frame 8
> #8  metaslab_free (spa=<optimized out>, bp=0xfffff800043788a0, txg=41924766, now=<optimized out>) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/metaslab.c:4145
> 4145                            metaslab_free_dva(spa, &dva[d], checkpoint);
> (kgdb) p /x *bp
> $6 = {blk_dva = {{dva_word = {0x100000002, 0x10000002}}, {dva_word = {0x0, 0x0}},
>       {dva_word = {0x0, 0x0}}}, blk_prop = 0x8000020200010001, blk_pad = {0x0, 0x0},
>       blk_phys_birth = 0x0, blk_birth = 0x4, blk_fill = 0x0,
>       blk_cksum = {zc_word = {0x0, 0x0, 0x0, 0x0}}}
> -------
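> 
> Decoding blk_dva[0] by hand helps make sense of this. Below is a
> stand-alone sketch using my reading of the DVA layout in spa.h (the
> vdev id in the high 32 bits of word 0, the allocated size in its low
> 24 bits in 512-byte sectors, and the offset in word 1 in 512-byte
> sectors), so double-check it against the real DVA_GET_* macros:
> -------
> #include <stdint.h>
> #include <stdio.h>
> 
> /* Hand-rolled equivalents of the DVA_GET_* accessors from spa.h;
>  * just enough to decode the dva_word pair kgdb printed above. */
> static uint64_t dva_vdev(const uint64_t w[2])   { return (w[0] >> 32); }
> static uint64_t dva_asize(const uint64_t w[2])  { return ((w[0] & 0xffffff) << 9); }
> static uint64_t dva_offset(const uint64_t w[2]) { return ((w[1] & ~(1ULL << 63)) << 9); }
> 
> int
> main(void)
> {
>         uint64_t dva[2] = { 0x100000002ULL, 0x10000002ULL };    /* from $6 */
> 
>         printf("vdev   %ju\n", (uintmax_t)dva_vdev(dva));       /* 1 (mirror-1) */
>         printf("asize  %ju\n", (uintmax_t)dva_asize(dva));      /* 1024 bytes */
>         printf("offset 0x%jx\n", (uintmax_t)dva_offset(dva));   /* 0x2000000400 */
>         return (0);
> }
> -------
> So the free is for a 1 KB block at an offset that is 512-byte aligned
> but not 4 KB aligned, pointed at vdev 1 (the new ashift=12 mirror-1),
> which is exactly what the VERIFY0() refuses. I believe newer OpenZFS
> simply rejects 'zpool remove' when the top-level vdevs have different
> ashift values.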
>
> Some of the notes on how I got in this state from the forum post (
> https://forums.freebsd.org/threads/kernel-panic-loop-after-zpool-remove.75632/
> ):
>
> I had two 2TB drives in a mirror configuration for the last 7 years,
> upgrading FreeBSD and ZFS as time went by. I finally needed more
> storage, so I tried to add two 4TB drives as a second mirror vdev:
> 'zpool add storage mirror /dev/ada3 /dev/ada4'
>
> Next, 'zpool status' showed:
> ---
>   pool: storage
>  state: ONLINE
> status: One or more devices are configured to use a non-native block size.
>         Expect reduced performance.
> action: Replace affected devices with devices that support the
>         configured block size, or migrate data to a properly configured pool.
>   scan: scrub repaired 0 in 0 days 08:39:28 with 0 errors on Sat May 9 01:19:54 2020
> config:
> 
>         NAME          STATE     READ WRITE CKSUM
>         storage       ONLINE       0     0     0
>           mirror-0    ONLINE       0     0     0
>             ada1      ONLINE       0     0     0  block size: 512B configured, 4096B native
>             ada2      ONLINE       0     0     0  block size: 512B configured, 4096B native
>           mirror-1    ONLINE       0     0     0
>             ada3      ONLINE       0     0     0
>             ada4      ONLINE       0     0     0
> 
> errors: No known data errors
> ---
>
> I should have stopped there, but I saw the block size warning and
> thought I would try to fix it. zdb showed mirror-0 with ashift 9
> (512-byte alignment) and mirror-1 with ashift 12 (4096-byte alignment).
> I issued 'zpool remove storage mirror-0' and quickly went into a
> panic/reboot loop. Rebooting into single-user mode, the first zfs or
> zpool command loads the driver and the system panics again. With the
> new drives powered off, rebooting does not panic, but the pool fails
> to import because it is missing a top-level vdev (mirror-1).
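> 
> For reference, the ashift values can be read without importing the
> pool, either from the cached config or straight from a disk label
> (device names are from my system, of course); I think both work even
> while the pool cannot be imported:
> -------
> zdb -C storage | grep ashift
> zdb -l /dev/ada1 | grep ashift
> -------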
>
>
>